End-to-end UX for a custom LLM training workflow — letting contact center operators evaluate calls to train proprietary AI models.
Invoca's platform analyzes phone calls using AI to surface business intelligence for contact centers. To improve model accuracy, operators needed a way to review call transcripts and provide structured feedback — but no such tooling existed.
I was tasked with designing an end-to-end workflow that let non-technical operators evaluate AI-classified calls, correct labels, and submit training data back to the model pipeline — without requiring any ML knowledge.
The core design tension: make a complex machine learning feedback loop feel like a simple review queue. Operators needed confidence in what they were doing without being exposed to the underlying model mechanics.
The project required close collaboration with ML engineers, product managers, and contact center operators across four enterprise customers.
I started with five contextual inquiry sessions embedded with contact center supervisors at two Invoca customers. This surfaced a critical insight: operators had deep domain intuition about call quality but zero shared vocabulary with the ML team defining model categories.
"I know a bad call when I hear it. But when you ask me to label it, I don't know what bucket you want me to put it in."
— Contact Center Supervisor, Discovery Interview

I initially designed a multi-step wizard that walked operators through each label dimension independently. Testing revealed this fragmented their natural listening flow — they evaluate calls holistically, not attribute-by-attribute.
I redesigned the experience around a single-screen review panel with a persistent audio player. All labels were visible at once, with progressive disclosure reserved for edge cases, so operators could listen and label simultaneously.
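One way to picture the single-screen submission is as one structured record per call: every label dimension travels together, mirroring how operators evaluate holistically. This is a minimal TypeScript sketch of that shape — the type and field names (`CallEvaluation`, `hasCorrections`, etc.) are hypothetical illustrations, not Invoca's actual schema or API.

```typescript
// Illustrative sketch only — names and fields are assumptions, not Invoca's schema.

type ReviewStatus = "confirmed" | "corrected" | "unsure";

interface LabelReview {
  dimension: string;        // e.g. "call_outcome"
  modelPrediction: string;  // what the AI classified
  operatorValue: string;    // what the operator selected
  status: ReviewStatus;
}

interface CallEvaluation {
  callId: string;
  reviewedAt: string;       // ISO timestamp
  labels: LabelReview[];    // all dimensions submitted together,
                            // matching the single-screen review
}

// Disagreements between operator and model are the highest-value
// training signal, so flag any evaluation containing a correction.
function hasCorrections(evaluation: CallEvaluation): boolean {
  return evaluation.labels.some((l) => l.status === "corrected");
}
```

Submitting all labels as one record (rather than one request per wizard step) also meant a correction could be interpreted in the context of the operator's other judgments on the same call.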
This project taught me that AI product design is fundamentally translation work — between the model's abstractions and the human's lived experience. The UX isn't just the interface, it's the conceptual scaffolding that lets non-technical users act with confidence inside a machine learning system.
If I were to do it over, I'd push earlier for operator involvement in defining the label taxonomy itself, rather than having the ML team set it and then adapting the UI to fit. Shared ownership of the schema would have reduced the conceptual gap we had to bridge in the interface.