Case Study — Invoca AI Signal Discovery  ·  6 min read
Enterprise B2B AI Product Design Concept

Teaching Machines to Listen

How I led research and concept design for an AI workflow redesign that reframed Invoca's signal training as a guided exploration rather than a manual labeling task.

4 · Internal SME interviews that shaped the concept
4 · Workflow stages designed end-to-end
8 hrs · Of white-glove service Invoca bundled with every new AI customer to compensate for workflow gaps
13 · Rounds one customer trained a single signal without acceptable accuracy
Company: Invoca (B2B SaaS, conversation intelligence)
Role: Senior UX Designer
Timeline: 2025 · ~4 months (concept phase)
Platform: Web App · Enterprise SaaS
Led: Research effort, problem framing, end-to-end concept design across 4 workflow stages, interactive prototype
Team: Solo design lead, partnered with PM Zach Cohen and 4 analytics services SMEs as research participants

The bookends problem

Invoca's AI Studio let customers train custom LLMs on their call data, building signals that could automatically detect outcomes specific to their business: appointments booked, sales converted, frustrations expressed. The feature shipped in early 2025. Accuracy was high. Customers who got it working loved it.

But getting it working was the problem.

The v1 workflow dropped users straight into manual training: review calls one at a time, mark them as true or false examples. Each round required evaluating 20 true and 20 false calls. It was meticulous, manual, and demanding. And it assumed users already knew exactly what they wanted to find. Many of them thought they did. Then they started reviewing calls and realized their understanding shifted with every edge case they hit. The workflow was simultaneously a training tool and a discovery tool, and it had only been designed to be the first.

Conceiving the research, naming the problem

What I owned
  • Conceived and led the research effort: designed protocol, conducted 4 SME interviews, synthesized findings into the problem framework
  • Identified the structural framing (the "bookends problem") that connected disconnected symptoms into a coherent product strategy
  • Designed the end-to-end concept across all 4 workflow stages: Topic Explorer, Signal Blueprint, LLM-as-a-Judge Training, Automated Validation
  • Built the interactive Figma Make prototype as the primary stakeholder communication artifact
What I partnered on
  • PM Zach Cohen on product strategy, phasing, and technical feasibility
  • Analytics services team (Sofia Contreras, Radhika Sharma, Brian Coulombe, Mitch Williams) as research participants and ongoing sounding boards

Finding the human layer underneath

[Figure: three stages of the workflow. Front door missing: Discovery (no guided entry). Works: Training (manual true/false review). Back door duct-taped: Validation (no live accuracy check). Two failures at opposite ends. The middle worked.]
The bookends problem: discovery and validation both broken, training in the middle worked

I worked with PM Zach Cohen to scope a research effort. We interviewed four Invoca team members who spent their days guiding customers through AI signal creation. They lived inside the workflow's gaps every day, building signals on behalf of customers, interpreting ambiguous definitions, coaching people through a tool that was supposed to be self-service.

The pattern was immediate and unanimous. Invoca had quietly built an entire human infrastructure around the workflow's weaknesses. Every new AI Gold customer received 8 hours of dedicated time from analytics services. For larger enterprise accounts like T-Mobile, the arrangement was a yearly block of contracted hours, with a dedicated analyst handling all signal creation. The product worked, but only with a person standing next to it explaining what to do.

Sofia Contreras · Sr. Analytics Services Mgr
Pre-round-zero scaffolding
Mapped every call scenario for telco clients before training ever started. None of that lived in the product.
Mitch Williams · Sr. Business Value Lead
Insights first, signals second
Customers shouldn't have to build a signal to discover whether it's worth building. Flip the order.
Brian Coulombe · Analytics Services
Definition, not labeling
Wanted to describe the signal in his own words and have the system understand intent, not hunt for proof.
Radhika Sharma · Analytics Manager
Round-zero lock-in
"After round zero, you're sort of locked in to what you've fed it." Bad early scoping became unrecoverable.

The reframe: the workflow had two distinct failure points, at opposite ends.

The front door was missing. What made a good use case for an AI signal? What didn't? The workflow had no answer. One customer trained for 13 rounds and still couldn't get acceptable accuracy. When an analyst stepped in and scoped the signal properly, it reached 90 to 95% accuracy by round four. The difference wasn't the technology. It was knowing what to aim at.

The back door was duct-taped shut. After launching a signal, the product gave an "estimated accuracy" score but couldn't tell users how it actually performed in production. Real-world accuracy could swing 5 to 10% from the training prediction. Analytics team members ran accuracy tests by hand, pulling random call samples, listening to them, recording results in spreadsheets.

"Flipping that on its head: providing insights first that inform what signals to build. I think would be kind of cool."
Mitch Williams, Senior Business Value Lead, Invoca
"I wish I could just tell it my definition. I wish it could see my definition to just get it."
Brian Coulombe, Analytics Services, Invoca

The middle worked. Without guided entry and automated validation, the whole system depended on human scaffolding at both ends.


Stage 1: Topic Explorer

Instead of dropping users into call review, surface topics automatically from their call data. Cards show themes the system has already identified, with references to specific call snippets as evidence. Two buckets: anticipated use cases that every customer in a vertical would likely need, and unknown unknowns (anomalies and patterns the customer wouldn't have thought to look for). Don't make me hunt. Show me what's there.

Stage 2: Signal Blueprint

Once a user picks a topic, structured guidance helps them define what the signal should find. What to include. What to exclude. Critically, the page recommends some use cases as good candidates for AI signals and flags others that aren't. AI signals were weaker at certain things, empathy and emotion-based detection in particular, and the v1 workflow never told users that. They'd train for weeks and discover the limitation the hard way. This step moved the analytics team's pre-round-zero process into the product.

Stage 3: LLM-as-a-Judge Training

Manual training stayed as an option. The new path: a second LLM evaluates calls alongside the custom model. Where they agree, no human input needed. Where they disagree, the user becomes the tiebreaker. Still in the loop, but only at decision points that actually require judgment.

This was the design problem that made the project hard. LLM outputs are probabilistic, not deterministic. The SMEs evaluating them aren't engineers; they're marketing analysts and QA managers. Evaluation criteria are inherently subjective. The feedback loop between training and results is opaque, measured in hours, not milliseconds. Designing an interface that makes this legible to non-technical users without oversimplifying the underlying complexity is the core challenge of AI product design. There's no design system for it. The frameworks have to be invented for each context.
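The agree/disagree routing in this stage can be sketched in a few lines. This is a minimal illustration under my own assumptions, not Invoca's implementation: `triage_calls`, `Review`, and the three callables standing in for the custom model, the judge model, and the human reviewer are all hypothetical names, and each model is reduced to a true/false vote per call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Review:
    call_id: str
    label: bool   # final true/false label for the signal
    source: str   # "agreement" if both models matched, "human" if escalated


def triage_calls(
    call_ids: Iterable[str],
    custom_model: Callable[[str], bool],  # hypothetical: the customer's trained signal
    judge_model: Callable[[str], bool],   # hypothetical: the second LLM acting as judge
    ask_human: Callable[[str], bool],     # hypothetical: prompts the user for a decision
) -> list[Review]:
    """Accept labels where both models agree; escalate disagreements to a person."""
    reviews = []
    for call_id in call_ids:
        custom_vote = custom_model(call_id)
        judge_vote = judge_model(call_id)
        if custom_vote == judge_vote:
            # Both models agree, so no human input is needed.
            reviews.append(Review(call_id, custom_vote, "agreement"))
        else:
            # The models disagree, so the user acts as tiebreaker.
            reviews.append(Review(call_id, ask_human(call_id), "human"))
    return reviews


# Stubbed votes for illustration only.
custom_votes = {"call-1": True, "call-2": False, "call-3": True}
judge_votes = {"call-1": True, "call-2": True, "call-3": True}
human_votes = {"call-2": False}

reviews = triage_calls(
    custom_votes,
    lambda cid: custom_votes[cid],
    lambda cid: judge_votes[cid],
    lambda cid: human_votes[cid],
)
escalated = [r.call_id for r in reviews if r.source == "human"]
```

In this stubbed run only `call-2` reaches the human, which is the point of the design: the user stays in the loop, but only for the calls where the two models disagree.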

Stage 4: Automated Validation

Productize what the analytics team had been doing manually with spreadsheets: automated accuracy testing on live call data, with feedback mechanisms for different outcomes. The validation step needed to be honest. A product that only tells you the good news isn't building trust. The feedback acknowledges limitations directly, including recommending some use cases switch to keyword-based signals or other detection methods when AI isn't the right tool.
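To make the spreadsheet process concrete, here is a rough sketch of what an automated check could compute: sample reviewed live calls, compare the signal's predictions to the reviewed labels, and flag the signal when live accuracy drifts from the training estimate. Everything here is an assumption for illustration, including the function name `validate_signal`, the `ValidationReport` fields, the default sample size, and the 0.05 gap threshold; none of it is drawn from Invoca's product.

```python
import random
from dataclasses import dataclass


@dataclass
class ValidationReport:
    sample_size: int
    live_accuracy: float
    estimated_accuracy: float
    gap: float              # estimated minus live accuracy
    needs_attention: bool   # True when the gap exceeds the allowed threshold


def validate_signal(
    predictions: dict[str, bool],      # signal output per live call (hypothetical shape)
    reviewed_labels: dict[str, bool],  # reviewed ground truth per call (hypothetical shape)
    estimated_accuracy: float,         # the score reported after training
    sample_size: int = 50,             # assumed default, not from the source
    max_gap: float = 0.05,             # assumed threshold, not from the source
    seed: int = 0,
) -> ValidationReport:
    """Estimate live accuracy from a random sample and compare it to the training estimate."""
    # Only calls that have both a prediction and a reviewed label can be scored.
    call_ids = sorted(predictions.keys() & reviewed_labels.keys())
    if not call_ids:
        raise ValueError("no calls have both a prediction and a reviewed label")

    rng = random.Random(seed)
    sample = rng.sample(call_ids, min(sample_size, len(call_ids)))

    correct = sum(predictions[c] == reviewed_labels[c] for c in sample)
    live_accuracy = correct / len(sample)
    gap = estimated_accuracy - live_accuracy
    return ValidationReport(
        len(sample), live_accuracy, estimated_accuracy, gap, abs(gap) > max_gap
    )


# Illustrative data: the signal fires on all 10 calls, but reviewers confirm only 8.
predictions = {f"call-{i}": True for i in range(10)}
reviewed = {f"call-{i}": i < 8 for i in range(10)}

report = validate_signal(predictions, reviewed, estimated_accuracy=0.92, sample_size=10)
```

With this made-up data the live accuracy is 0.8 against an estimate of 0.92, so the report is flagged. A flag like that is where the honest feedback described above would appear, including a suggestion to try a different detection method.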

The four-stage workflow, walkable

The Figma Make prototype is the closest thing this concept has to a shipped artifact. It walks through Topic Explorer, Signal Blueprint, training, and validation as a clickable flow. It's the artifact I used to align stakeholders on the vision and earn buy-in for the next phase.

Launch interactive prototype ↗

What was aligned, what didn't ship

The concept validated cleanly. Research surfaced the "bookends problem" framing that became the product strategy lens. Stakeholders aligned on the four-stage approach. The interactive Figma Make prototype told the story across multiple stakeholder presentations and earned buy-in for the next phase.

I left Invoca before the concept moved into build. The work mattered even without a shipped product. Research exposed a structural problem (that the product's self-service promise was being subsidized by an invisible human layer) and the concept addressed it at the system level, not with patches. The bookends framing carried forward as a shared lens the analytics services team and PMs continued using when scoping signal work for new customers.


The hard kind of project to put in a portfolio

This is the hardest kind of project to put in a portfolio: the one that was right but didn't finish. If I were picking it up tomorrow, I'd push harder on two things. First, the validation step needed user research with external customers. Internal SMEs are real users but bring expertise that masks usability problems. The concept needs to work for the customer training their first signal without an analyst guiding them through it. Second, the LLM-as-a-Judge interface and validation feedback system were both still in active iteration. Those are the moments where design either builds trust or erodes it.

Details have been generalized and some visuals modified to respect confidentiality agreements.


Let's talk.

Open to Senior or Staff UX design roles in enterprise, AI, or consumer. Available now.