Generative AI Systems
End-to-end orchestration of speech-to-text (STT), retrieval-augmented generation (RAG), and text-to-speech (TTS) workflows for real-time conversational assistants.
Exploring AI Systems, Stories, and Art.
I design and ship robust software systems for generative AI, LLM tooling, and AI infrastructure. My work blends research depth with production discipline.
I work across the stack, from model behavior and retrieval pipelines to compiler and systems-level optimization. My interests include LLM quality, latency-aware architecture, and deployment-ready experimentation.
Designing practical evaluation signals and constrained decoding strategies for safer, higher-quality outputs.
Production-minded architecture built around deterministic testing, observability, and low-latency delivery.
End-to-end STT -> RAG -> TTS assistant designed for low-latency, context-aware dialogue, using phoneme-aware constrained decoding to improve speech quality.
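As a rough illustration of the STT -> RAG -> TTS flow, here is a minimal sketch of one conversational turn with per-stage latency tracking. It is not the actual implementation: `run_turn`, `TurnResult`, and every `fake_*` function are hypothetical stand-ins, the retriever is a toy keyword match, and the phoneme-aware constrained decoding step is left out entirely.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TurnResult:
    """Outputs of one dialogue turn plus per-stage latency in milliseconds."""
    transcript: str
    reply: str
    audio: bytes
    latency_ms: Dict[str, float] = field(default_factory=dict)


def run_turn(
    audio_in: bytes,
    stt: Callable[[bytes], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    tts: Callable[[str], bytes],
) -> TurnResult:
    """Run STT -> retrieval -> generation -> TTS, timing each stage."""
    timings: Dict[str, float] = {}

    def timed(name, fn, *args):
        start = time.perf_counter()
        out = fn(*args)
        timings[name] = (time.perf_counter() - start) * 1000.0
        return out

    transcript = timed("stt", stt, audio_in)
    passages = timed("retrieve", retrieve, transcript)
    reply = timed("generate", generate, transcript, passages)
    audio_out = timed("tts", tts, reply)
    return TurnResult(transcript, reply, audio_out, timings)


# Toy stand-ins (assumptions): a real system would call actual STT, LLM,
# and TTS models here.
DOCS = ["The store opens at 9 am.", "Returns are accepted within 30 days."]


def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_retrieve(query: str) -> List[str]:
    # Keep documents that share at least one word with the query.
    words = set(query.lower().split())
    return [d for d in DOCS if words & set(d.lower().rstrip(".").split())]


def fake_generate(query: str, passages: List[str]) -> str:
    return passages[0] if passages else "I don't know."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    result = run_turn(b"when does the store open",
                      fake_stt, fake_retrieve, fake_generate, fake_tts)
    print(result.reply)
    print(result.latency_ms)
```

Passing each stage in as a plain callable keeps the orchestration deterministic under test: the same stubs always produce the same reply, and the recorded timings give a simple per-stage signal for spotting latency regressions.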