Invoca AI Signal Discovery

Impact

4

Internal SME interviews that shaped the concept

4

Workflow stages designed end-to-end

8hrs

Of white-glove service Invoca bundled with every new AI customer to compensate for workflow gaps

13

Rounds one customer trained a single signal without acceptable accuracy

Project

Company: Invoca (B2B SaaS, conversation intelligence)

Role: Senior UX Designer

Timeline: 2025 · ~4 months (concept phase)

Platform: Web App · Enterprise SaaS

Led: Research effort, problem framing, end-to-end concept design across 4 workflow stages, interactive prototype

Team: Solo design lead partnered with PM Zach Cohen and 4 analytics services SMEs as research participants

The Problem

The bookends problem

Invoca's AI Studio let customers train custom LLMs on their call data, building signals that could automatically detect outcomes specific to their business: appointments booked, sales converted, frustrations expressed. The feature shipped in early 2025. Accuracy was high. Customers who got it working loved it.

But getting it working was the problem.

The v1 workflow dropped users straight into manual training: review calls one at a time, mark them as true or false examples. Each round required evaluating 20 true and 20 false calls. It was meticulous, manual, and demanding. And it assumed users already knew exactly what they wanted to find. Many of them thought they did. Then they started reviewing calls and realized their understanding shifted with every edge case they hit. The workflow was simultaneously a training tool and a discovery tool, and it had only been designed to be the first.

My Role

Conceiving the research, naming the problem

What I owned

Conceived and led the research effort: designed protocol, conducted 4 SME interviews, synthesized findings into the problem framework
Identified the structural framing (the "bookends problem") that connected disconnected symptoms into a coherent product strategy
Designed the end-to-end concept across all 4 workflow stages: Topic Explorer, Signal Blueprint, LLM-as-a-Judge Training, Automated Validation
Built the interactive Figma Make prototype as the primary stakeholder communication artifact

What I partnered on

PM Zach Cohen on product strategy, phasing, and technical feasibility
Analytics services team (Sofia Contreras, Radhika Sharma, Brian Coulombe, Mitch Williams) as research participants and ongoing sounding boards

Research + Insight

Finding the human layer underneath

The bookends problem: discovery and validation both broken, training in the middle worked

I worked with PM Zach Cohen to scope a research effort. We interviewed four Invoca team members who spent their days guiding customers through AI signal creation. They lived inside the workflow's gaps every day, building signals on behalf of customers, interpreting ambiguous definitions, coaching people through a tool that was supposed to be self-service.

The pattern was immediate and unanimous. Invoca had quietly built an entire human infrastructure around the workflow's weaknesses. Every new AI Gold customer received 8 hours of dedicated time from analytics services. For larger enterprise accounts like T-Mobile, the arrangement was a yearly block of contracted hours, with a dedicated analyst handling all signal creation. The product worked, but only with a person standing next to it explaining what to do.

Sofia Contreras · Sr. Analytics Services Mgr

Pre-round-zero scaffolding

Mapped every call scenario for telco clients before training ever started. None of that lived in the product.

Mitch Williams · Sr. Business Value Lead

Insights first, signals second

Customers shouldn't have to build a signal to discover whether it's worth building. Flip the order.

Brian Coulombe · Analytics Services

Definition, not labeling

Wanted to describe the signal in his own words and have the system understand intent, not hunt for proof.

Radhika Sharma · Analytics Manager

Round-zero lock-in

"After round zero, you're sort of locked in to what you've fed it." Bad early scoping became unrecoverable.

The reframe: the workflow had two distinct failure points, at opposite ends.

The front door was missing. What made a good use case for an AI signal? What didn't? The workflow had no answer. One customer trained for 13 rounds and still couldn't get acceptable accuracy. When an analyst stepped in and scoped the signal properly, it reached 90 to 95% accuracy by round four. The difference wasn't the technology. It was knowing what to aim at.

The back door was duct-taped shut. After launching a signal, the product gave an "estimated accuracy" score but couldn't tell users how it actually performed in production. Real-world accuracy could swing 5 to 10% from the training prediction. Analytics team members ran accuracy tests by hand, pulling random call samples, listening to them, recording results in spreadsheets.

"Flipping that on its head: providing insights first that inform what signals to build. I think would be kind of cool."

Mitch Williams, Senior Business Value Lead, Invoca

"I wish I could just tell it my definition. I wish it could see my definition to just get it."

Brian Coulombe, Analytics Services, Invoca

The middle worked. Without guided entry and automated validation, the whole system depended on human scaffolding at both ends.

The Work

Stage 1: Topic Explorer

Instead of dropping users into call review, surface topics automatically from their call data. Cards showing themes the system has already identified, with references to specific call snippets as evidence. Two buckets: anticipated use cases that every customer in a vertical would likely need, and unknown unknowns, anomalies and patterns the customer wouldn't have thought to look for. Don't make me hunt. Show me what's there.

Stage 2: Signal Blueprint

Once a user picks a topic, structured guidance helps them define what the signal should find. What to include. What to exclude. Critically, the page recommends some use cases as good candidates for AI signals and flags others that aren't. AI signals were weaker at certain things, empathy and emotion-based detection in particular, and the v1 workflow never told users that. They'd train for weeks and discover the limitation the hard way. This step moved the analytics team's pre-round-zero process into the product.

Stage 3: LLM-as-a-Judge Training

Manual training stayed as an option. The new path: a second LLM evaluates calls alongside the custom model. Where they agree, no human input needed. Where they disagree, the user becomes the tiebreaker. Still in the loop, but only at decision points that actually require judgment.
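The agree/disagree routing described above can be sketched in a few lines. This is an illustration of the concept only, not Invoca's implementation: `CallVerdict`, `route_calls`, and the field names are hypothetical, and each model's output is assumed to reduce to a single true/false verdict per call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallVerdict:
    """One call scored by both models (all names here are hypothetical)."""
    call_id: str
    model_says: bool  # verdict from the customer's custom signal model
    judge_says: bool  # verdict from the second, judging LLM


def route_calls(verdicts):
    """Split calls into auto-labeled examples and a human review queue.

    Where the two models agree, the shared verdict becomes the label.
    Where they disagree, the call goes to the user as tiebreaker.
    """
    auto_labeled = {}
    needs_review = []
    for v in verdicts:
        if v.model_says == v.judge_says:
            auto_labeled[v.call_id] = v.model_says
        else:
            needs_review.append(v.call_id)
    return auto_labeled, needs_review


verdicts = [
    CallVerdict("call-1", True, True),
    CallVerdict("call-2", False, True),
    CallVerdict("call-3", False, False),
]
auto_labeled, needs_review = route_calls(verdicts)
```

In this sketch the user only sees `call-2`. The other two calls are labeled without human input, which is the reduction in review effort the concept was after.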

This was the design problem that made the project hard. LLM outputs are probabilistic, not deterministic. The SMEs evaluating them aren't engineers; they're marketing analysts and QA managers. Evaluation criteria are inherently subjective. The feedback loop between training and results is opaque, measured in hours, not milliseconds. Designing an interface that makes this legible to non-technical users without oversimplifying the underlying complexity is the core challenge of AI product design. There's no design system for it. The frameworks have to be invented for each context.

Stage 4: Automated Validation

Productize what the analytics team had been doing manually with spreadsheets: automated accuracy testing on live call data, with feedback mechanisms for different outcomes. The validation step needed to be honest. A product that only tells you the good news isn't building trust. The feedback acknowledges limitations directly, including recommending some use cases switch to keyword-based signals or other detection methods when AI isn't the right tool.
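A minimal sketch of the spreadsheet check this stage would productize: sample live calls, compare the signal's predictions against human-reviewed labels, and report the gap from the estimated accuracy. Everything here is an assumption for illustration, including the function names, the dictionary inputs, and the 5-point tolerance. It is not how Invoca's product computes accuracy.

```python
import random


def sample_calls(call_ids, k, seed=0):
    """Pull a reproducible random sample of call IDs for review."""
    rng = random.Random(seed)
    ids = list(call_ids)
    return rng.sample(ids, min(k, len(ids)))


def observed_accuracy(predicted, reviewed):
    """Share of reviewed calls where the signal matched the human label.

    Both arguments map call_id -> bool. Only calls present in both count.
    """
    shared = predicted.keys() & reviewed.keys()
    if not shared:
        raise ValueError("no reviewed calls overlap with predictions")
    correct = sum(predicted[c] == reviewed[c] for c in shared)
    return correct / len(shared)


def validation_report(estimated, observed, tolerance=0.05):
    """Compare production accuracy with the training-time estimate.

    The tolerance is an assumed threshold, not a product setting.
    """
    gap = observed - estimated
    return {
        "estimated": estimated,
        "observed": observed,
        "gap": gap,
        "within_tolerance": abs(gap) <= tolerance,
    }


predicted = {"a": True, "b": False, "c": True, "d": True, "e": False}
reviewed = {"a": True, "b": False, "c": False, "d": True, "e": False}
report = validation_report(0.92, observed_accuracy(predicted, reviewed))
```

Here the signal matches four of five reviewed calls, so the report flags the result as outside tolerance against a 0.92 estimate. Reporting the gap either way, rather than only a passing score, is what keeps the validation step honest.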

Interactive Prototype

The four-stage workflow, walkable

The Figma Make prototype is the closest thing this concept has to a shipped artifact. It walks through Topic Explorer, Signal Blueprint, training, and validation as a clickable flow. It's the artifact I used to align stakeholders on the vision and earn buy-in for the next phase.

Launch interactive prototype ↗

Outcomes

What was aligned, what didn't ship

The concept validated cleanly. Research surfaced the "bookends problem" framing that became the product strategy lens. Stakeholders aligned on the four-stage approach. The interactive Figma Make prototype told the story across multiple stakeholder presentations and earned buy-in for the next phase.

I left Invoca before the concept moved into build. The work mattered beyond what shipped. Research exposed a structural problem (that the product's self-service promise was being subsidized by an invisible human layer) and the concept addressed it at the system level, not with patches. The bookends framing carried forward as a shared lens the analytics services team and PMs continued using when scoping signal work for new customers.

Reflection

The hard kind of project to put in a portfolio

This is the hardest kind of project to put in a portfolio. The one that was right but didn't finish. If I were picking it up tomorrow, I'd push harder on two things. First, the validation step needed user research with external customers. Internal SMEs are real users but bring expertise that masks usability problems. The concept needs to work for the customer training their first signal without an analyst guiding them through it. Second, the LLM-as-a-Judge interface and validation feedback system were both still in active iteration. Those are the moments where design either builds trust or erodes it.

Teaching Machines to Listen
