Data Management Research Center for Human-centered, Efficient, and Scalable Systems
Welcome to UtahDB, the Data Management Lab at the Kahlert School of Computing, the University of Utah. We are located in beautiful Salt Lake City. Our research focuses on designing and developing human-centered, efficient, and scalable data-management systems. Check out this overview presentation to learn more about our lab's current projects and vision. Our lab consists of an amazing group of people, and we are growing!
We are looking for a number of motivated PhD students for Fall 2026. Apply today!
Algorithms for Big Data Analytics: Geometric Data Analysis, Computational Geometry, Coresets and Sketches, Handling Uncertainty, Data Mining, Databases, Machine Learning, Spatial Statistics.
Data preparation, data discovery, data debugging, data integration, user interfaces, information extraction, data quality, data cleaning, and database usability
Data systems usability, Data summarization, Trusted machine learning, Explainable AI, Data exploration and user interfaces, Data quality, Data cleaning, Data debugging, Responsible data management, Data fairness
Democratizing data-driven systems: This project focuses on three key aspects of data system democratization: enhancing usability of data systems for non-experts and experts, providing explanation frameworks to enable understanding of system behavior, and achieving trust and fairness in machine learning.
Data structures for scalable computing: This project focuses on advancing the theory and practice of compact, dynamic, and scalable data structures to tackle the challenges of modern data analysis pipelines. We work on filters, hash tables, trees, and succinct and write-optimized data structures.
Large-scale indexing of raw genomic data: This project focuses on building scalable data processing pipelines for quickly indexing and searching through terabytes of raw genomic, transcriptomic, and metagenomic data.
Efficient parallel graph processing: This project focuses on building highly parallel data structures and algorithms for efficiently processing static, streaming, and dynamic graphs. This project further explores using hardware accelerators such as GPUs for massively parallel processing of dynamic graphs.
Persistent Data Summaries: This project builds summaries for massive data arriving over time. The summaries use little space, are efficient to build and query, and are amenable to data analysis. Moreover, they can be queried with respect to a time window for retrospective analysis.
Data Sketching: We design and implement sketch data structures, which are compressed representations of data with guaranteed trade-offs between space and query accuracy. Our group has designed sketches for quantiles, multi-dimensional data, frequent items, shape-fitting, trajectory data, and many more.
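To give a feel for what a space/accuracy guarantee looks like, here is a minimal sketch of the classic Misra-Gries frequent-items summary. It is a textbook technique chosen purely for illustration, not a description of our group's own designs; the function name `misra_gries` and the parameter `k` are assumptions made for this example.

```python
def misra_gries(stream, k):
    """Illustrative Misra-Gries summary (hypothetical helper, not lab code).

    Keeps at most k - 1 counters. For a stream of n items, every estimate
    undercounts the true frequency by at most n / k and never overcounts.
    Items absent from the result have an estimate of 0.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: count this occurrence.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop any
            # that reach zero. The incoming item is discarded.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

A larger `k` spends more space (more counters) to buy a smaller worst-case error of n / k, which is the kind of trade-off the paragraph above refers to.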
Spatial Exposome Data: CEDaR is an open exposomic data resource that researchers across disciplines can use to increase understanding of the environment and health. Sources of environmental exposure data are sparse, inconsistent, and rarely linked to individuals, which makes research complicated and difficult. Through CEDaR, we provide a single platform containing cleaned and standardized environmental exposure measures that can be used independently or to create holistic measures of the exposome.
Data Systems on Modern Hardware: This project exploits modern compute hardware such as GPUs and FPGAs, and storage hardware such as PMEM and HBM, to accelerate data systems. Our group designs new algorithmic techniques to model the performance of new hardware and then analyzes data systems in light of these models to accelerate them.
© 2025 University of Utah. All Rights Reserved