
Text-to-Speech and Text-to-Sound in 2026: A Working Brand Guide

Synthetic voice has crossed the line from obvious to convincing. Here's where it actually saves time, where it breaks brand trust, and where the disclosure norms have settled.

What's in this article

  1. Where the Tech Actually Stands in 2026
  2. Where Synthesis Saves Time
  3. Where Synthesis Breaks Trust
  4. The Disclosure Norm in 2026
  5. Voice Cloning: Where the Line Is
  6. Beyond Voice: Text-to-Sound
  7. The Working Brand Policy

Where the Tech Actually Stands in 2026

By mid-2026, the best text-to-speech systems (ElevenLabs, OpenAI TTS, Google's Chirp 3, Microsoft's Azure Neural TTS) produce speech that's indistinguishable from recorded human speech in most contexts when run on clean text. For everyday narration, the quality conversation is largely over. The remaining conversations are about appropriateness, disclosure, and where the voice is being deployed.

What's still hard: emotional dynamic range, regional accents outside the major commercial languages, and any application where the listener has heard the source speaker before (the synthetic version sounds slightly off in ways listeners can identify).

Where Synthesis Saves Time

The defensible deployments:

Where Synthesis Breaks Trust

The deployments that consistently produce backlash:

The Disclosure Norm in 2026

The disclosure expectations have settled:

The legal floor (EU AI Act, US state laws) is moving faster than industry self-regulation. Comply with the strictest applicable rule, not the most permissive.

Voice Cloning: Where the Line Is

Voice cloning deserves its own section because the rules are different from those for generic synthesis. Cloning a voice without explicit consent is now:

The standard for ethical voice cloning is the same as for any photographic likeness: explicit, written, scope-limited consent, with clear disclosure to the eventual audience that synthesis is being used.
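
As a rough illustration, the consent standard above (explicit, written, scope-limited, with audience disclosure) could be tracked as a simple record and checked before any cloned voice ships. This is a minimal sketch, not a legal tool: the `VoiceConsent` type, the `may_use_clone` function, and the use labels such as "product_tutorial" are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class VoiceConsent:
    """Hypothetical record of one speaker's consent to voice cloning."""
    speaker: str
    signed_in_writing: bool        # explicit, written consent is on file
    allowed_uses: FrozenSet[str]   # the scope the speaker agreed to


def may_use_clone(consent: VoiceConsent, use: str, disclosed_to_audience: bool) -> bool:
    """Return True only if every part of the consent standard is met."""
    if not consent.signed_in_writing:
        return False               # no written consent, no clone
    if use not in consent.allowed_uses:
        return False               # outside the agreed scope
    return disclosed_to_audience   # the audience must be told synthesis is used


consent = VoiceConsent(
    speaker="Example Speaker",
    signed_in_writing=True,
    allowed_uses=frozenset({"product_tutorial"}),
)
print(may_use_clone(consent, "product_tutorial", disclosed_to_audience=True))
print(may_use_clone(consent, "radio_ad", disclosed_to_audience=True))
```

The point of the scope set is that consent for one use does not silently extend to another: a new use means a new signature.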

Beyond Voice: Text-to-Sound

Text-to-sound (generating sound effects, ambient audio, and music from prompts) has matured more slowly than voice but is now production-viable for specific uses:

What it's not ready for: complex music for finished work, signature sound design that needs to match a brand identity, and anything where audio quality matters at studio reference levels.

The Working Brand Policy

A defensible internal policy on synthetic voice and sound:

  1. Human-recorded voiceover for hero brand content: founder messages, brand films, customer testimonials.
  2. Synthetic voice acceptable for: internal content, multilingual versions of recorded source material, accessibility tracks, rapid prototyping.
  3. Disclosure required: any customer-facing synthetic voice, any cloned voice, any audio where the listener might reasonably believe a real person spoke.
  4. Prohibited: cloning without written consent, synthesizing competitors' voices, deepfaking founder or executive voices.
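
One way a team might make the four rules above checkable is to encode them as a small review function. The sketch below is an assumption-laden illustration rather than a finished tool: the content-type labels, the `AudioRequest` fields, and the `NEEDS_REVIEW` fallback for cases the policy does not mention are all invented for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PROHIBITED = "prohibited"
    HUMAN_REQUIRED = "human recording required"
    ALLOWED_WITH_DISCLOSURE = "synthetic allowed, disclosure required"
    ALLOWED = "synthetic allowed"
    NEEDS_REVIEW = "not covered by policy, needs review"


# Illustrative labels for rule 1 (hero content) and rule 2 (synthetic acceptable).
HERO_CONTENT = {"founder", "brand_film", "customer_testimonial"}
SYNTHETIC_OK = {"internal", "multilingual_version", "accessibility_track", "prototype"}


@dataclass
class AudioRequest:
    content_type: str
    customer_facing: bool = False
    cloned_voice: bool = False
    written_consent: bool = False
    competitor_voice: bool = False
    executive_deepfake: bool = False
    could_pass_as_real_person: bool = False


def evaluate(req: AudioRequest) -> Verdict:
    # Rule 4: prohibitions override everything else.
    if req.competitor_voice or req.executive_deepfake:
        return Verdict.PROHIBITED
    if req.cloned_voice and not req.written_consent:
        return Verdict.PROHIBITED
    # Rule 1: hero brand content is recorded by a person.
    if req.content_type in HERO_CONTENT:
        return Verdict.HUMAN_REQUIRED
    # Assumption: anything the policy does not list goes to a human reviewer.
    if req.content_type not in SYNTHETIC_OK:
        return Verdict.NEEDS_REVIEW
    # Rule 3: disclosure triggers.
    if req.customer_facing or req.cloned_voice or req.could_pass_as_real_person:
        return Verdict.ALLOWED_WITH_DISCLOSURE
    # Rule 2: synthetic is acceptable as-is.
    return Verdict.ALLOWED


print(evaluate(AudioRequest("multilingual_version", customer_facing=True)).value)
```

Checking the prohibitions first is a deliberate choice: no content category or consent flag further down can unlock something rule 4 forbids.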

Policies like this hold up under audit and avoid the worst-case reputational scenarios. They also free the team to use synthesis where it genuinely saves time without manufacturing trust risk.


Start Motion Media is a commercial production company for emerging brands — crowdfunding films, DTC product videos, and brand campaigns shipped from San Francisco, New York, Austin, Denver, and San Diego.
