Tag
Production
2 posts
DeepEval: how I measure the quality of my medical agent with objective metrics
How I built an evaluation layer with DeepEval to measure the quality of Shuri, Examya's medical agent. Covers real data (from 20% to 70% on E2E), custom metrics for Chile's FONASA system, and why gpt-5-nano doesn't work for structured output.
pgvector + Embeddings in Production: The Foundation of Medical Reasoning in Examya
Architecture for semantic search and text similarity in production with pgvector, pg_trgm, and real MINSAL data.