Data Management Research Center for Human-centered, Efficient, and Scalable Systems

Welcome to the Data Management Lab at the School of Computing, University of Utah. We are located in beautiful Salt Lake City. Our research focuses on designing and developing human-centered, efficient, and scalable data-management systems. Our lab consists of an amazing group of people, and we are growing!

We are looking for a number of motivated PhD students for Fall 2023. Apply today!

Faculty

Jeff M Phillips

Algorithms for Big Data Analytics: Geometric Data Analysis, Computational Geometry, Coresets and Sketches, Handling Uncertainty, Data Mining, Databases, Machine Learning, Spatial Statistics.

Prashant Pandey

Storage, indexing, and physical database design; streams and complex event processing; data platforms for emerging hardware and emerging hardware for data management; uncertain, probabilistic, and approximate databases.

El Kindi Rezig

Data exploration, visualization, query languages, and user interfaces; data integration, information extraction, and schema matching; data quality, data cleaning, and database usability.

New Faculty (Fall 23)

Data systems usability, Data summarization, Trusted machine learning, Explainable AI, Data exploration and user interfaces, Data quality, Data cleaning, Responsible data management, Data fairness

Research

Democratizing data-driven systems: This project focuses on three key aspects of data system democratization: enhancing the usability of data systems for both experts and non-experts, providing explanation frameworks that help users understand system behavior, and achieving trust and fairness in machine learning.

Data structures for scalable computing: This project focuses on advancing the theory and practice of compact, dynamic, and scalable data structures to tackle the challenges of modern data analysis pipelines. We work on filters, hash tables, trees, and succinct and write-optimized data structures.
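To give a flavor of what a "filter" is in this context, here is a minimal sketch of a textbook Bloom filter in Python. It is an illustrative example only, not one of the lab's data structures: the class name, the default sizes, and the use of SHA-256 with a one-byte seed prefix as the hash family are all assumptions made for this sketch.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: compact set membership with false positives
    but no false negatives. Illustrative sketch, not a tuned implementation."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into a bytearray.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing a seed byte
        # (an assumption of this sketch; any independent hash family works).
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = BloomFilter()
for word in ["genome", "graph", "sketch"]:
    bf.add(word)
print("graph" in bf)  # an inserted item is always reported as present
```

The appeal of such a structure is that it answers membership queries in a fixed, small amount of memory, at the cost of occasionally reporting an absent item as present.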

Large-scale indexing of raw genomics data: This project focuses on building scalable data processing pipelines for quickly indexing and searching through terabytes of raw genomic, transcriptomic, and metagenomic data.

Efficient parallel graph processing: This project focuses on building highly parallel data structures and algorithms for efficiently processing static, streaming, and dynamic graphs. It further explores using hardware accelerators such as GPUs for massively parallel processing of dynamic graphs.

Persistent Data Summaries: This project builds summaries for massive data arriving over time. The summaries use little space, are efficient to build and query, and are amenable to data analysis. Moreover, they can be queried with respect to a time window for retrospective analysis.

Data Sketching: We design and implement sketch data structures, which are compressed representations of data with guaranteed trade-offs between space and query accuracy. Our group has designed sketches for quantiles, multi-dimensional data, frequent items, shape-fitting, trajectory data, and many more.
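As a concrete illustration of the space/accuracy trade-off, below is a minimal Python sketch of the classic Misra-Gries frequent-items summary. It is a well-known textbook algorithm shown here only as an example, not the group's own design; the function name and the parameter `k` are choices made for this sketch. With at most k - 1 counters, each reported count undercounts the true frequency by at most n/k, where n is the stream length.

```python
def misra_gries(stream, k):
    """Summarize item frequencies using at most k - 1 counters.

    Every returned count is a lower bound on the true frequency and is
    off by at most len(stream) / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all counters, dropping zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = list("abacabadabaea")
summary = misra_gries(stream, k=3)
print(summary)
```

Increasing `k` uses more counters and tightens the error bound, which is the kind of guaranteed trade-off the paragraph above refers to.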

Spatial Exposome Data: CEDaR is an open exposomic data resource that can be used by researchers across disciplines to increase understanding of the environment and health. Sources of environmental exposure data are sparse, inconsistent, and rarely linked to individuals, making research complicated and difficult. Through CEDaR, we provide a single platform containing cleaned and standardized environmental exposure measures that can be used independently or to create holistic measures of the exposome.

Data Systems on Modern Hardware: This project exploits modern compute hardware such as GPUs and FPGAs, and storage hardware such as PMEM and HBM, to accelerate data systems. Our group designs new algorithmic techniques to model the performance of new hardware and then analyzes data systems in light of these algorithmic models to accelerate them.

Students

© 2022 University of Utah. All Rights Reserved