Latency vs. AI: The Real Trade-Off in Live Streaming

Summary

"The streaming industry spent a decade asking 'how low can we go?' With AI captions, multilingual delivery, and real-time features now in the picture, a more important question has taken its place: what are we actually buying with those seconds?"

The most recent Mate Talk hosted by Qualabs brought together six media technology leaders to debate something the industry keeps getting wrong: the relationship between latency, architecture, and the features that viewers actually notice. The conversation covered streaming-first design, the real cost of low-latency obsession, and what HBO Max learned building for March Madness 2024.

Latency in live streaming is the delay between the moment a live event happens and the moment viewers see it on screen. Most modern platforms target 15–30 seconds; lower targets require significant architectural trade-offs.

Streaming-first architecture is a design approach that consolidates production, encoding, and distribution into a single pipeline built from the ground up, rather than adapting legacy broadcast workflows.

It's Not About Codecs, It's About Architecture

When Neil Roberts explained WBD's approach, he cut through the noise immediately.

"At WBD, we're not focused on low-latency codecs. What we are focused on is reducing the latency and improving the quality by making our live workflows streaming first." - Neil Roberts

Four years ago, when HBO Max committed to becoming a streaming-native platform, the team made a deliberate choice: don't inherit broadcast workflows. 

The reasoning: "If we design our workflows in a legacy manner, we are painting ourselves into a corner that's nearly impossible to get back out of."

The traditional broadcast chain moves content through multiple sequential handoffs—production truck, production control room, master control room, distribution partners. Each step adds latency, complexity, and a new failure point. WBD's approach was to collapse that chain into a single streaming-native pipeline, with captioning, ad breaks, audio descriptions, and metadata all handled inside the digital transcode workflow—no intermediate steps, no inherited legacy complexity.
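One way to see why each handoff matters is a rough model: delays in a serial chain add up, and if hop failures are assumed to be independent, availabilities multiply. The sketch below is illustrative only. The `Hop` type, the per-hop timings, and the availability figures are hypothetical and are not taken from the talk or from WBD's pipeline.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Hop:
    name: str
    latency_s: float     # delay this hop adds, in seconds (hypothetical)
    availability: float  # probability the hop is working (hypothetical)


def chain_latency(hops):
    """Sequential hops add their delays."""
    return sum(h.latency_s for h in hops)


def chain_availability(hops):
    """A serial chain works only if every hop works, so availabilities
    multiply (assuming independent failures)."""
    return prod(h.availability for h in hops)


# Made-up numbers, chosen only to show the shape of the arithmetic.
legacy_chain = [
    Hop("production truck", 2.0, 0.999),
    Hop("production control room", 3.0, 0.999),
    Hop("master control room", 3.0, 0.999),
    Hop("distribution partner", 5.0, 0.999),
]
streaming_first = [
    Hop("single streaming-native pipeline", 6.0, 0.999),
]

for label, hops in (("legacy", legacy_chain), ("streaming-first", streaming_first)):
    print(f"{label}: {chain_latency(hops):.1f} s, "
          f"availability {chain_availability(hops):.4f}")
```

The values themselves carry no weight here. What the model shows is that removing a hop removes both a term from the latency sum and a factor from the availability product.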

The March Madness 2024 result

When HBO Max launched this architecture for March Madness 2024, the outcome was concrete:

"We saw 17 seconds glass to glass. No low latency codecs. That included native HDR, native 1080p60, our own AI captioner."

17 seconds, with everything included. No specialized codecs. No vendor lock-in. The number was a byproduct of better design, not the target.

Why the Latency Race Is Mostly a Marketing Problem

Andy Beach put the current dynamic in sharp terms:

"We've moved into the hot rod era of latency, where it's like, sure, I could 100% get this down to microseconds if you want, it'll just cost you an extra $300,000 or $400,000—and it's like, well, now is the juice worth the squeeze? Probably not." 

The obsession persists because executives treat latency as a brand signal: lower number = better platform. Marketing teams follow. But what viewers actually experience is something different. When Neil monitors social media during and after HBO Max live events, the feedback is consistently about picture quality, clarity, and reliability. Latency complaints are rare. What surfaces regularly is praise.

David Hassoun captured the disconnect from the engineering side: "My fear is that the execs are holding on to those buzzwords...but I'm giving you 20 languages, zero rebuffering, real-time captions, sign language video. Do they actually understand the value of that?"

The gap: executives optimize for a metric that viewers don't notice, while underselling capabilities that viewers directly experience.

The Framework: What Are We Buying with Our Latency?

Sean McCarthy reframed the evaluation criteria entirely. The question isn't "how low?" It's: what are we buying with our latency?

The practical test:

  • Good trade-off: Add 10 seconds, get AI captions, lower rebuffer rates, and better video quality in return.
  • Bad trade-off: Add a minute because content needs to transcode three times through legacy regional infrastructure.

Every second of latency should be earning something. If it isn't, it's a symptom of architectural debt, not a feature. The panel's conclusion: design the workflow correctly from the start, and latency becomes a side effect of getting everything else right—not a target to optimize in isolation. Reduce the unnecessary hops, and latency, quality, and reliability tend to improve together.
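The practical test above can be sketched as a small latency-budget audit: list each contributor to delay, record what it buys, and total the seconds that buy nothing. This is a minimal sketch rather than a tool the panel described. The `BudgetItem` type and `audit` function are hypothetical names, and the two entries simply restate the good and bad trade-offs from the list (10 seconds and one minute).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetItem:
    stage: str
    seconds: float
    buys: tuple = ()  # viewer-facing benefits this delay pays for


def audit(items):
    """Split a latency budget into seconds that earn something
    and seconds that earn nothing."""
    earning = sum(i.seconds for i in items if i.buys)
    debt = sum(i.seconds for i in items if not i.buys)
    return earning, debt


# Entries mirror the two trade-offs described in the text.
budget = [
    BudgetItem(
        "captioning and quality processing",
        10.0,
        ("AI captions", "lower rebuffer rates", "better video quality"),
    ),
    BudgetItem("triple transcode through legacy regional infrastructure", 60.0),
]

earning, debt = audit(budget)
print(f"earning: {earning:.0f} s, architectural debt: {debt:.0f} s")
for item in budget:
    verdict = ", ".join(item.buys) if item.buys else "nothing (candidate for removal)"
    print(f"  {item.stage}: {item.seconds:.0f} s buys {verdict}")
```

Keeping the benefit list next to each delay makes the question "what are we buying?" explicit for every stage, so unexplained seconds stand out as architectural debt.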


A big thank you to Neil Roberts for joining as special guest, and to all the panelists who made this session possible: David Hassoun, Olga Kornienko, Bhavesh Upadhyaya, Sean McCarthy, and Andy Beach. Hosted by JP Saibene.

Mate Talk is a bi-monthly meetup hosted by Qualabs for the video and media tech community. Each session brings together engineers, product leaders, and content creators to debate what's actually happening in the industry—from emerging streaming standards like MoQ (Media over QUIC) to real-world streaming architecture challenges.

🔗 Full video replay available on our YouTube! 
