Approx. read time: 4 mins🕒
At VoiceBox, we’ve spent over a decade helping businesses use voice and accessibility technology to enhance video content and reach wider audiences.
So in June 2023, when Meta announced its own Voicebox, we were curious!
To be clear, Voicebox is a research model from Meta AI. It can edit speech, remove noise, perform cross‑lingual style transfer and generate realistic synthetic speech from short audio samples. It is completely separate from us, VoiceBox, the media accessibility and localisation provider.
While Meta’s Voicebox isn’t a product businesses can use, it offers valuable insight into how modern AI voice systems work, and where their limitations lie.
In this blog, we’ll explore Meta’s VoiceBox and how it’s different to VoiceBox…
Meta’s Voicebox: a breakthrough in AI-generated speech
Voicebox is a versatile, state-of-the-art generative AI model for speech.
Through in-context learning, it can perform a wide range of tasks, such as editing, sampling and stylising speech, even ones it hasn’t been specifically trained for.
However, Meta has not released Voicebox to the public. It:
- Cannot be licensed
- Cannot be used by businesses
- Cannot be accessed by creators
- Has no commercial roadmap
This is because the risks around voice cloning and impersonation are considered too high, so Meta has chosen to publish only research papers and demos.
Voicebox’s versatility enables it to perform multiple tasks, including:
- Speech editing and noise reduction: Voicebox can recreate portions of speech interrupted by noise, or replace misspoken words, without the entire speech needing to be re-recorded.
- Cross-lingual style transfer: Given a sample of someone’s speech in one language, Voicebox can read text aloud in that person’s voice in any of its six supported languages (English, French, German, Spanish, Polish or Portuguese). This could one day help people communicate naturally across languages.
- Diverse speech sampling: Having learned from diverse data, Voicebox can generate speech that is more representative of how people talk in the real world, across all six of these languages.
Meta’s Voicebox is a significant advance in generative AI research, and we’re curious to see how researchers build on this work in the audio space.
Don’t get us confused
One of the main reasons we wrote this blog is to make sure readers don’t confuse the two.
While we share the same name, the capital ‘B’ in VoiceBox should always be a giveaway! The bigger difference is that Meta’s Voicebox is a research model, while we provide media accessibility and localisation services, including voiceovers, subtitles, live captions and more.
AI voiceovers: pros and cons
Since we provide AI voiceovers at VoiceBox, and Meta’s Voicebox is a research blueprint that explains how modern AI voice generation works and where its limits come from, it’s worth weighing up the pros and cons of AI voiceovers.
Pros:
- Budget-friendly
- Fast to produce
- Cheaper to scale
- Only getting better
Cons:
- Can come across as less trustworthy or professional than human voiceovers
- Can struggle with emotion and connecting with audiences
- May require disclosure or additional safeguards under regulations such as the EU AI Act, particularly for public‑facing content, and that disclosure could reduce audience trust
That said, we believe both human and AI voiceovers have a place, and we will always be honest about where each is best suited.
For external, high-impact brand or marketing content, human voiceover is currently far superior.
AI voiceovers are best suited to internal content, such as training. They also suit bulk content and scaled-up video production, such as social media explainer videos.
Why choose VoiceBox?
The answer lies in the perfect blend of experience, personalisation, accessibility and professionalism on offer.
We’ve been around since 2014, delivering thousands of media projects. This wealth of experience means we understand the nuances and details that make all the difference in delivering an authentic message.
We will always tell you honestly which service suits you best, and whether a human, AI or hybrid workflow best fits your project.
We understand each company marches to its own beat, so our services are fully customisable to your specific brand voice and messaging.
Remember, choosing between AI and human voiceover isn’t about deciding which is ‘better’, but which is ‘right’ for you: the one that best understands and captures your brand’s voice.
Contact us and tune into the sound of success today!
FAQs
Are AI voiceovers compliant?
AI voiceovers can be compliant, but they may require additional safeguards depending on how and where they are used. Public‑facing content may need transparency measures, such as disclosure, or consent under regulations such as the EU AI Act.
Working with an experienced provider helps ensure voiceover workflows meet accessibility standards and regulatory expectations without compromising quality or trust.
How do you decide between human and AI voiceover?
At VoiceBox, we take an honest, consultative approach, helping businesses evaluate each project individually. Rather than pushing a single solution, we assess your content goals, audience, accessibility needs, timelines and regulatory considerations to recommend the most suitable option, whether that’s human voiceover, AI voiceover, or a blended approach designed specifically for your use case.
How does voiceover quality affect accessibility?
Clear pronunciation, natural pacing, emotional clarity and consistency all affect comprehension and accessibility, especially for audiences with hearing or visual impairments or cognitive accessibility needs. Human voiceovers typically offer stronger accessibility outcomes, though AI voices can be appropriate when implemented carefully and reviewed by accessibility specialists.
