I'm a Rackham Scholar and Computer Science & Engineering PhD student at the University of Michigan, advised by Professor Lin Ma. My research sits at the intersection of database management systems and machine learning.
Originally from the Bay Area, California.
I've been fortunate to work with Professor Lin Ma ever since my undergraduate years at Michigan. Working in his lab is what first drew me to the questions I care about most, and when it came time to choose a doctoral path, staying on for a PhD under his guidance was an easy decision.
Today my work focuses on making database systems more robust and explainable as they become increasingly driven by machine learning, especially in the noisy, unpredictable environments that real systems actually run in.
Learned knob tuning and configuration for self-optimizing data systems.
Resilient methodology and benchmarks for noisy, real-world conditions.
Optimizing index structures for LLMs, including hierarchical indexing for faster, more scalable retrieval.
Before my PhD, I was fortunate to work on systems and machine learning across industry, from deep-learning libraries to autonomy and high-frequency data infrastructure.
Built autonomous-driving simulations, applying computer vision to model vehicle behavior.
Built conversational-AI models and improved CUDA-X AI libraries, boosting performance and reducing reported bugs across projects.
Developed autonomy algorithms in C++ and Unreal Engine and analyzed self-driving case data with deep-learning techniques.
Built and optimized large-scale data pipelines on Spark and Kafka, handling terabytes of market data daily with real-time streaming.
Built and maintained model pipelines and sampling for large-scale ML modeling infrastructure.
Researching robustness and explainability of ML-driven database tuning, with benchmarks built for noisy environments.
Modern data systems increasingly tune themselves with machine learning. My research asks what happens when those systems meet the messiness of the real world, and how we can keep them robust, explainable, and reliable.
Methodology and benchmarks that test the resilience of ML-based database tuning, improving throughput and cutting variance in noisy environments.
Identifying weaknesses in existing methods and raising system reliability through better analytics and prioritization strategies.
Enhanced ETL pipelines with GPU acceleration that reduce processing times and streamline evaluation over large-scale datasets.