Mati Staniszewski

I'm the ElevenLabs CEO - what do you want to do with voice AI but can't? (AMA)

Hi Everyone!

Solving AI audio end-to-end means tackling both generation and understanding - from text-to-speech to speech-to-text and everything in between. At ElevenLabs, we’re working on breakthroughs in AI audio that bridge research and real-world use.

Ask me anything about what we’re building, the challenges of scaling AI speech models, and where this space is headed. Also keen to hear what you’ve built with ElevenLabs! 



Replies

steve beyatte

I imagine ElevenLabs has a ton of control and security measures around voice cloning, but with open source models popping up every day, do you think we'll see a proliferation of audio deep fakes? Are we already there? Is there a way to fight this or is this just something the future will hold regardless of what we do?

Yu Pan

The platform supports multiple languages for both ASR and TTS, and the results are good overall. However, there are some minor quality issues in use: ASR sometimes produces baffling transcriptions, and voice cloning can occasionally carry a slight Japanese or East Asian accent. I hope these improve in the future.

Hasan Abusheikh
Voice models perform extremely well in English, and ElevenLabs models are doing better than most there, largely because of the quality and availability of English datasets. But how will you address languages where high-quality datasets may not exist or be publicly available, like Arabic, Hindi, Urdu, etc., which will impact model quality? How do you personally see the scalability of these models in those areas?

Current AI voices sound too perfect. How can we get more human-like audio, with unusual stops and mistakes?
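One text-level workaround available today, pending model-side support for disfluencies, is to inject filler words into the script before it reaches TTS. A minimal sketch (the `humanize` helper and `FILLERS` list are hypothetical illustrations, not part of any ElevenLabs API):

```python
import random

# Hypothetical text-level workaround: sprinkle filler words into the
# script before synthesis so the generated speech sounds less polished.
FILLERS = ["um,", "uh,", "you know,"]

def humanize(text: str, rate: float = 0.3, seed: int = 0) -> str:
    """Prefix a fraction of sentences with a filler word.

    `rate` is the probability a sentence gets a filler; a seeded RNG
    keeps the output reproducible across runs.
    """
    rng = random.Random(seed)
    sentences = text.split(". ")
    out = []
    for s in sentences:
        if s and rng.random() < rate:
            s = rng.choice(FILLERS) + " " + s[0].lower() + s[1:]
        out.append(s)
    return ". ".join(out)

print(humanize("Hello there. This is a test. Goodbye now.", rate=1.0, seed=1))
```

Pauses could be added the same way, e.g. by inserting ellipses or SSML-style break markup where the target engine supports it.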

Divyanth Jayaraj

I want to be able to make tutorial videos for my SaaS products in my own voice. Descript does a good job transcribing my videos, but I also want it to replace my voice with a better version of itself, with no "umms" and "aahs" or Freudian slips.
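The transcript half of that wish can be sketched as a regex filter that strips common fillers before the cleaned text is re-synthesized with a cloned voice (the pattern and `clean_transcript` helper below are illustrative assumptions, not any product's actual implementation):

```python
import re

# Common spoken fillers; word boundaries keep real words like "drum" safe.
FILLER_RE = re.compile(r"\b(u+m+|u+h+|a+h+|e+r+)\b,?\s*", re.IGNORECASE)

def clean_transcript(text: str) -> str:
    """Strip fillers from a transcript, then collapse leftover spaces."""
    cleaned = FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()

print(clean_transcript("So, um, this feature, uhh, exports your data."))
# -> So, this feature, exports your data.
```

Feeding the cleaned transcript back through a voice clone would then produce the "better version" of the narration, minus the disfluencies.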

Brad Harris

@matistski - Voice AI has so much potential, but the big limitation right now is real-time, emotionally adaptive responses. AI voices sound great in controlled settings, but the second you need them to react dynamically in a conversation, they fall apart. Any plans to tackle that?

Brooke Hopkins

Current voice AI systems rely on cascaded models (STT → LLM → TTS) or end-to-end speech-to-speech models, each with trade-offs in latency, control, and expressiveness. What do you see as the biggest technical bottleneck in moving towards a truly real-time, low-latency, emotionally adaptive speech model? Is it model architecture, dataset limitations, compute constraints, or something else entirely? What will the control plane for these end-to-end models look like?
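The latency trade-off in the cascaded design can be made concrete by timing each stage separately. A toy sketch with stub stages (the lambdas stand in for real STT/LLM/TTS models; none of this is an actual ElevenLabs interface) shows why per-stage latencies add up unless every stage streams:

```python
import time
from typing import Callable

def cascade(audio: bytes,
            stt: Callable[[bytes], str],
            llm: Callable[[str], str],
            tts: Callable[[str], bytes]) -> tuple[bytes, dict]:
    """Run the cascaded STT -> LLM -> TTS pipeline and record
    per-stage latency. Total latency is the sum of the stages,
    which is why each stage must stream its output into the next
    to approach real-time behavior."""
    timings = {}
    t0 = time.perf_counter(); text = stt(audio); timings["stt"] = time.perf_counter() - t0
    t0 = time.perf_counter(); reply = llm(text); timings["llm"] = time.perf_counter() - t0
    t0 = time.perf_counter(); out = tts(reply); timings["tts"] = time.perf_counter() - t0
    timings["total"] = sum(timings.values())
    return out, timings

# Toy stages standing in for real models.
audio_out, lat = cascade(
    b"\x00\x01",
    stt=lambda a: "hello",
    llm=lambda t: t.upper(),
    tts=lambda t: t.encode(),
)
```

An end-to-end speech-to-speech model collapses the three stages into one, trading the cascade's per-stage control points (editable transcript, steerable text reply) for lower latency, which is exactly the control-plane question raised above.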

Won Park

Hi @matistski, I'm interested in an AI that can dub songs! Using voice cloning technology in music could open up a lot of business and have a big impact on the music industry - Kpop or Jpop, for instance!

Elissa Craig

no questions/insights - just here to say we love ElevenLabs!!

Beau Sterling

Hey Mati! I'm building an app that requires an elderly voice, but I notice there aren't many elderly voice options on your platform. How would you go about making a voice sound elderly? The flow I'm going to test: create a voice clone, create a custom voice via prompt (elderly), then use the voice changer to transform the clone with that custom elderly voice. It seems like there should be a simpler way... what do you think?