
Inside Media Tech: Data Architecture, AI, and Cost Strategies in Scaling Video

Summary

"In today’s video industry, there’s no shortage of data. But what emerged during the October edition of Mate Talk is that the real challenge isn’t collecting it, it’s deciding what matters, and when.
Moderated by JP Saibene from Qualabs, this session featured engineers and leaders from YouTube, SVTA, Warner Bros Discovery, Dolby and Paramount, unpacking the real problems of data architecture, AI implementation, ad standardization, and playback analytics at scale."

The Hidden Cost of “Real-Time Everything”

One of the strongest threads of the session focused on the assumption that everything needs to be real-time. Connie emphasized how media teams constantly face trade-offs between data speed, depth, and storage duration, especially under the weight of contractual obligations for content takedown or fraud detection.

Sean challenged the default mindset: “Do you really need it real-time? Or are you assuming you do?”

Sean argued that only a subset of use cases truly require live data, such as stream debugging or live ad monitoring, and that pushing everything into real-time adds massive cost and operational complexity.

Bhavesh built on this with a hard operational question: Where should we spend our money — compute for immediate action, or depth of data for later analysis?

The conversation turned into a pragmatic framework (see the sketch after the list):

  • If the use case is compliance, fraud, or live user impact → real-time may be justified.
  • For BI, reporting, or long-term trends → batch or standard logs are more efficient.
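
To make the trade-off concrete, here is a minimal sketch of that routing decision, assuming a simplified set of use-case labels. It is illustrative only, not how any panelist’s pipeline actually works.

```typescript
// Illustrative sketch of the panel's framework: route telemetry by use case
// instead of defaulting everything to real-time. The labels and function are
// assumptions for illustration, not any panelist's actual pipeline.

type UseCase =
  | "compliance"
  | "fraud"
  | "live-user-impact"
  | "bi"
  | "reporting"
  | "long-term-trends";

type Pipeline = "real-time" | "batch";

function choosePipeline(useCase: UseCase): Pipeline {
  switch (useCase) {
    // Compliance, fraud, and live user impact may justify the cost of a
    // streaming pipeline.
    case "compliance":
    case "fraud":
    case "live-user-impact":
      return "real-time";
    // BI, reporting, and long-term trends are usually served by batch jobs
    // over standard logs, at a fraction of the cost.
    default:
      return "batch";
  }
}

console.log(choosePipeline("fraud"));            // "real-time"
console.log(choosePipeline("long-term-trends")); // "batch"
```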

The Video Player as the Data Control Center

One of the most valuable insights came from Casey, who dove deep into the role of the video player in data ecosystems:

  • The player isn’t just a UX layer; it’s the primary point of truth for playback telemetry.
  • Every code change in the player affects downstream analytics. Even small timing adjustments in events like seek or play can throw off QoE signals and trigger false alarms in monitoring systems.

“If I change when a player event fires, I get calls like: ‘latency went up, what broke?’ The player is the start of everything.”
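
To see why, consider a hedged sketch (not Casey’s actual code; the event names are simply the common HTML5 media ones): a QoE metric like startup latency is just the difference between two player event timestamps, so changing when either event fires changes the metric downstream.

```typescript
// Hedged sketch: startup latency derived from player event timestamps.
// Event names follow the HTML5 media element; field names are assumptions.

interface PlayerEvent {
  type: "loadstart" | "play" | "playing" | "seeking" | "seeked" | "waiting";
  timestampMs: number; // wall-clock time at which the player emitted the event
  sessionId: string;
}

function startupLatencyMs(events: PlayerEvent[]): number | undefined {
  const loadStart = events.find((e) => e.type === "loadstart");
  const firstPlaying = events.find((e) => e.type === "playing");
  if (!loadStart || !firstPlaying) return undefined;
  // If an engine update moves when "playing" fires, this number moves with it,
  // and downstream dashboards report a "regression" that never happened.
  return firstPlaying.timestampMs - loadStart.timestampMs;
}
```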

Casey explained how his team runs side-by-side comparisons with different playback engines, using precise player metrics to validate which engine actually performs better, not just by feel but with hard evidence.

“Data is the only way I can justify engine decisions. It’s the only language everyone agrees on.”
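
A rough sketch of what such a side-by-side run can look like: the same metrics aggregated per engine, so the decision rests on numbers rather than feel. The engine labels and metric fields are assumptions for illustration.

```typescript
// Hedged sketch: aggregate identical player metrics per playback engine.

interface SessionMetrics {
  engine: "engineA" | "engineB"; // hypothetical engine labels
  startupLatencyMs: number;
  rebufferRatio: number; // time spent rebuffering / total watch time
}

function averageByEngine(sessions: SessionMetrics[]) {
  const totals = new Map<string, { latency: number; rebuffer: number; n: number }>();
  for (const s of sessions) {
    const t = totals.get(s.engine) ?? { latency: 0, rebuffer: 0, n: 0 };
    t.latency += s.startupLatencyMs;
    t.rebuffer += s.rebufferRatio;
    t.n += 1;
    totals.set(s.engine, t);
  }
  return Array.from(totals.entries()).map(([engine, t]) => ({
    engine,
    avgStartupLatencyMs: t.latency / t.n,
    avgRebufferRatio: t.rebuffer / t.n,
  }));
}
```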

This point reinforced a core truth: good video data starts at the player, not the CDN, not the backend.

AI at Scale: Expensive, Complex, and Not Always Worth It

The panel didn’t glorify AI. Instead, they examined its realities. Sean noted that ML infrastructure is often underestimated: compute costs, latency, versioning, and model explainability all come into play. Bhavesh proposed a better path forward: building an industry-specific model trained on streaming data (player logs, ad events, rebuffering patterns) rather than trying to repurpose generic LLMs.

The key insight: AI must be focused, contextual, and measurable. Otherwise, it becomes a distraction.

Alex reinforced this with a clear principle: “You have to know the question you’re answering with AI. Otherwise, it’s just a science project.”

Ad Monitoring, Creative IDs, and Fragmentation Headaches

The session also dove into ad analytics, one of the most fragmented parts of video workflows.

Sarge from Disney explained how real-time ad data is critical during large-scale events to catch errors that could degrade the stream and kill monetization. But this real-time requirement introduces its own risk:

"We requested CMCD in standard logs because real-time was too costly. For live QoE, we need to control what gets ingested and when, not everything has to be real-time."

In practice, the fragmentation shows up at every hop (a CMCD sketch follows the list):

  • Ad servers aren’t always in sync with playback systems.
  • CDNs may lag in delivering error feedback.
  • Creative identifiers often don’t survive the delivery chain.
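
For context on the CMCD request mentioned in the quote above: CMCD (CTA-5004) lets the player attach keys such as a session ID (sid), content ID (cid), bitrate (br), and buffer length (bl) to each segment request, so CDN logs and player analytics can later be joined. The sketch below is deliberately simplified and skips most of the spec’s encoding rules.

```typescript
// Simplified sketch of attaching CMCD data to a segment request as a query
// parameter. sid, cid, br, and bl are real CMCD keys; the serialization here
// is trimmed down and is not a complete implementation of the spec.

interface CmcdData {
  sid: string; // session id, ideally stable across CDN switches
  cid: string; // content id
  br: number;  // encoded bitrate of the requested object, in kbps
  bl: number;  // current buffer length, in ms
}

function withCmcd(segmentUrl: string, cmcd: CmcdData): string {
  const payload = [
    `bl=${cmcd.bl}`,
    `br=${cmcd.br}`,
    `cid=${JSON.stringify(cmcd.cid)}`, // string values are quoted in CMCD
    `sid=${JSON.stringify(cmcd.sid)}`,
  ].join(",");
  const url = new URL(segmentUrl);
  url.searchParams.set("CMCD", payload);
  return url.toString();
}

// The CDN log line now carries the same session and content IDs that the
// player-side analytics use, which is what makes later stitching possible.
console.log(
  withCmcd("https://cdn.example.com/seg_42.m4s", {
    sid: "6e2fb550-c457-45aa-b01b-cf0d607850bc",
    cid: "content-123",
    br: 3200,
    bl: 21300,
  })
);
```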

Casey described how even with player manifest support, systems like MUX or Conviva won’t see the right data unless the integration is tight.

The takeaway? Even for basic visibility into ad performance, you’re building systems that weren’t designed to talk to each other.

Data Management at Scale: Normalization, Sessionization, and Post-Mortems

The team also explored the behind-the-scenes plumbing of large-scale media telemetry:

  • Session ID tracking across CDN switches is still a huge challenge.
  • Devices like Roku create phantom sessions, breaking continuity.
  • Post-mortem analysis often requires stitching together player, CMCD, and CDN data, which are rarely aligned (a sessionization sketch follows).
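
A minimal sketch of that stitching step, assuming every source already carries a shared session ID, which, as the panel made clear, is exactly the hard part in practice:

```typescript
// Hedged sketch of post-mortem sessionization: group records from different
// sources by session ID, then sort each group by time to rebuild one timeline.
// Field names are assumptions; real logs are rarely this clean.

interface TelemetryRecord {
  source: "player" | "cmcd" | "cdn";
  sessionId: string;
  timestampMs: number;
  payload: Record<string, unknown>;
}

function sessionize(records: TelemetryRecord[]): Map<string, TelemetryRecord[]> {
  const sessions = new Map<string, TelemetryRecord[]>();
  for (const record of records) {
    const timeline = sessions.get(record.sessionId) ?? [];
    timeline.push(record);
    sessions.set(record.sessionId, timeline);
  }
  // One ordered timeline per session makes it possible to see, for example,
  // a CDN switch followed by a rebuffer reported by the player.
  for (const timeline of sessions.values()) {
    timeline.sort((a, b) => a.timestampMs - b.timestampMs);
  }
  return sessions;
}
```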

Bhavesh brought up that even CMCD, which started with just 13 keys, now has 44, and yet platform interoperability remains weak.

David emphasized how mergers and acquisitions create chaos for data: new brands, new platforms, legacy schemas — all of which need to be reconciled.

Contracts, Security & Geo-Restrictions

A critical part of the discussion centered on legal and contractual pressures that shape data requirements:

  • For some live events, teams are contractually required to take down streams within minutes if piracy or fraud is detected.
  • Connie highlighted that these triggers require coordination across CDNs, origin servers, and playback systems, and that geo-restrictions are now part of every serious rights deal.

The challenge isn’t just technical; it’s organizational: who owns the response? Who sees the data first? Who acts on it?

🧉 Final Reflection: Choosing Clarity Over Complexity

What made this edition of Mate Talk stand out wasn’t just the breadth of topics; it was the raw, practical honesty from the people who are in the trenches, building and maintaining the systems we all depend on.

This wasn’t a session of hypotheticals or high-level strategy. It was about the real trade-offs faced every day:

  • Between acting on data in real time and optimizing for scale.
  • Between investing in visibility and controlling operational costs.
  • Between pushing innovation and navigating legacy contracts that weren’t designed for today’s workflows.

It reminded us that modern media infrastructure isn’t just about tools; it’s about making smarter, more intentional choices in the face of constraints.

“We don’t need more dashboards. We need more decisions.”  — JP Saibene

That single line captured the essence of the session, and likely, the road ahead for everyone in this industry.
