01 — Featured Case Study
AI Product · LLM · Enterprise

Invoca AI Model Training Interface

End-to-end UX for a custom LLM training workflow — letting contact center operators evaluate calls to train proprietary AI models.

Company Invoca
Role Senior UX Designer
Year 2024
Platform Web App · Enterprise
[ Hero product screenshot ]
Overview

The challenge

Invoca's platform analyzes phone calls using AI to surface business intelligence for contact centers. To improve model accuracy, operators needed a way to review call transcripts and provide structured feedback — but no such tooling existed.

I was tasked with designing an end-to-end workflow that let non-technical operators evaluate AI-classified calls, correct labels, and submit training data back to the model pipeline — without requiring any ML knowledge.

The core design tension: make a complex machine learning feedback loop feel like a simple review queue. Operators needed confidence in what they were doing without being exposed to the underlying model mechanics.

The project required close collaboration with ML engineers, product managers, and contact center operators across four enterprise customers.

Outcomes

What we achieved

Faster model retraining cycle through structured operator feedback
91%
Task completion rate in usability testing with contact center operators
0
Training errors — operators submitting correctly labeled data from day one
Discovery

Understanding the problem space

I started by embedding with contact center supervisors at two Invoca customers for five contextual inquiry sessions. This surfaced a critical insight: operators had deep domain intuition about call quality but zero shared vocabulary with the ML team defining model categories.

"I know a bad call when I hear it. But when you ask me to label it, I don't know what bucket you want me to put it in."

— Contact Center Supervisor, Discovery Interview
[ Affinity diagram / research synthesis ]
[ Journey map ]
Process

How I designed it

01
Vocabulary alignment workshop
Facilitated a two-hour card-sorting session with ML engineers and three operators to build a shared taxonomy. That taxonomy became the label schema in the UI.
02
Lo-fi exploration
Sketched 12 different review flow patterns — from form-based to conversational — and pressure-tested each against the operators' mental model.
03
Prototype + usability rounds
Built a Figma prototype and ran three rounds of moderated usability testing. Each round tightened the label hierarchy and reduced cognitive load.
04
Phased delivery
Worked with engineering across four sprints. Defined component specs, edge cases, and empty states. Joined sprint reviews to QA against designs.
[ Key screens / prototype walkthrough ]
Design decisions

What I got wrong first

First approach

Initially designed a multi-step wizard that walked operators through each label dimension independently. Testing revealed this fragmented their natural listening flow — they evaluate calls holistically, not attribute-by-attribute.

What worked

Redesigned around a single-screen review panel with a persistent audio player. All labels visible at once, with progressive disclosure for edge cases. Operators could listen and label simultaneously.

Reflection

What I learned

This project taught me that AI product design is fundamentally translation work — between the model's abstractions and the human's lived experience. The UX isn't just the interface; it's the conceptual scaffolding that lets non-technical users act with confidence inside a machine learning system.

If I were to do it over, I'd push earlier for operator involvement in defining the label taxonomy itself, rather than having the ML team set it and then adapting the UI to fit. Shared ownership of the schema would have reduced the conceptual gap we had to bridge in the interface.
