I'm the ElevenLabs CEO - what do you want to do with voice AI but can't? (AMA)
Hi Everyone!
Solving AI audio end-to-end means tackling both generation and understanding - from text-to-speech to speech-to-text and everything in between. At ElevenLabs, we’re working on breakthroughs in AI audio that bridge research and real-world use.
Ask me anything about what we’re building, the challenges of scaling AI speech models, and where this space is headed. Also keen to hear what you’ve built with ElevenLabs!



Replies
Product Hunt
I imagine ElevenLabs has a ton of control and security measures around voice cloning, but with open source models popping up every day, do you think we'll see a proliferation of audio deep fakes? Are we already there? Is there a way to fight this or is this just something the future will hold regardless of what we do?
Both the ASR and the TTS support multiple languages, and the quality is good. That said, there are some minor issues in practice: the ASR sometimes recognizes baffling text, and cloned voices can occasionally carry a slight Japanese or East Asian accent. Hope to see improvements in the future.
TestAI
Current AI voices sound too perfect. How can we get more human-like audio, with irregular pauses and mistakes?
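One workaround until models do this natively is to roughen the script before synthesis. A minimal sketch: the `<break time="..." />` pause tag follows ElevenLabs' documented syntax, while the filler-injection heuristic and its probabilities are made up for illustration.

```python
import random

FILLERS = ["um,", "uh,", "you know,"]

def humanize(script: str, p_filler: float = 0.08, p_pause: float = 0.15) -> str:
    """Crudely roughen a script: sprinkle fillers and short pauses
    so the synthesized read sounds less polished."""
    out = []
    for word in script.split():
        out.append(word)
        if word.endswith((",", ";")) and random.random() < p_pause:
            out.append('<break time="0.4s" />')  # pause tag per ElevenLabs docs
        elif random.random() < p_filler:
            out.append(random.choice(FILLERS))  # hypothetical disfluency heuristic
    return " ".join(out)

print(humanize("Well, I think the results were, frankly, better than we expected."))
```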
Mudrex
I want to be able to make tutorial videos for my SaaS products with my own voice. Descript does a good job transcribing my videos, but I also want it to replace my voice with a better version of my own voice, with no "umms" and "aahs" or Freudian slips.
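The "umm"-removal half of this is already scriptable with any ASR that returns word-level timestamps. A minimal sketch, assuming a simplified timestamp format and placeholder filenames; pydub does the splicing.

```python
from pydub import AudioSegment  # pip install pydub (requires ffmpeg)

FILLERS = {"um", "umm", "uh", "aah", "erm"}

# Hypothetical ASR output: word-level timestamps in seconds.
words = [
    {"word": "So",        "start": 0.00, "end": 0.22},
    {"word": "umm",       "start": 0.22, "end": 0.71},
    {"word": "click",     "start": 0.71, "end": 1.05},
    {"word": "the",       "start": 1.05, "end": 1.18},
    {"word": "dashboard", "start": 1.18, "end": 1.80},
]

audio = AudioSegment.from_file("tutorial.wav")
clean = AudioSegment.empty()
for w in words:
    if w["word"].strip(".,!?").lower() in FILLERS:
        continue  # drop the filler span entirely
    clean += audio[int(w["start"] * 1000): int(w["end"] * 1000)]  # pydub slices in ms

clean.export("tutorial_clean.wav", format="wav")
```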
@matistski - Voice AI has so much potential, but the big limitation right now is real-time, emotionally adaptive responses. AI voices sound great in controlled settings, but the second you need them to react dynamically in a conversation, they fall apart. Any plans to tackle that?
Coval
Current voice AI systems rely on cascaded models (STT → LLM → TTS) or end-to-end speech-to-speech models, each with trade-offs in latency, control, and expressiveness. What do you see as the biggest technical bottleneck in moving towards a truly real-time, low-latency, emotionally adaptive speech model? Is it model architecture, dataset limitations, compute constraints, or something else entirely? And what will the control plane for these end-to-end models look like?
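For readers who haven't seen the cascade laid out, here is a toy sketch of the STT → LLM → TTS chain the question describes; all three stages are placeholder stubs, and the point is only that their latencies add serially.

```python
import time

# Placeholder stubs: in a real system each is a model or API call.
def stt(audio: bytes) -> str:
    time.sleep(0.30)  # stand-in for transcription latency
    return "what's the weather like"

def llm(text: str) -> str:
    time.sleep(0.50)  # stand-in for generation latency
    return "Looks sunny all afternoon."

def tts(text: str) -> bytes:
    time.sleep(0.25)  # stand-in for synthesis latency
    return b"<audio>"

t0 = time.perf_counter()
reply = tts(llm(stt(b"<mic input>")))
print(f"turn latency: {time.perf_counter() - t0:.2f}s")  # ~1.05s: stages add serially
```

Streaming mitigates this (chunked STT, token-streamed LLM, and incremental TTS can overlap the stages), but prosody and emotion are still flattened into text at the LLM boundary, which is exactly the control-plane question above.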
DeepBrain AI
Hi @matistski, I'm interested in an AI that can dub songs! For instance, using voice cloning technology in music could open up a lot of business! This could even have a big impact on the music industry! K-pop or J-pop!
Headliner
no questions/insights - just here to say we love ElevenLabs!!
Hey Mati! I'm building an app that requires an elderly voice, but I notice there aren't many elderly voice options on your platform. How would you go about making a voice sound elderly? The flow I'm going to test is: create a voice clone, create a custom voice via prompt (elderly), then use the voice changer to convert the voice clone's output with the custom elderly voice. It seems like there should be a simpler way... what do you think?
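For what it's worth, that exact flow can be scripted against the public REST API. A minimal sketch, assuming the documented /v1/voices/add (instant clone), /v1/text-to-speech/{voice_id}, and /v1/speech-to-speech/{voice_id} (voice changer) endpoints; the elderly voice ID is a placeholder for one created with Voice Design, and the filenames and API key are placeholders too.

```python
import requests

API = "https://api.elevenlabs.io/v1"
HEADERS = {"xi-api-key": "YOUR_API_KEY"}  # placeholder key

# 1) Instant voice clone from a sample recording.
with open("my_voice_sample.mp3", "rb") as f:
    r = requests.post(f"{API}/voices/add", headers=HEADERS,
                      data={"name": "my-clone"}, files={"files": f})
clone_id = r.json()["voice_id"]

# 2) Speak the script in the cloned voice.
r = requests.post(f"{API}/text-to-speech/{clone_id}", headers=HEADERS,
                  json={"text": "Welcome back, dear. Let me show you around."})
with open("clone_output.mp3", "wb") as f:
    f.write(r.content)

# 3) Voice changer: re-render that audio in the prompt-designed elderly voice.
elderly_id = "ELDERLY_VOICE_ID"  # placeholder: created via Voice Design
with open("clone_output.mp3", "rb") as f:
    r = requests.post(f"{API}/speech-to-speech/{elderly_id}",
                      headers=HEADERS, files={"audio": f})
with open("elderly_output.mp3", "wb") as f:
    f.write(r.content)
```

If this works, step 2 may be the removable one: prompting Voice Design for an elderly voice and running TTS with it directly would be the simpler path, at the cost of losing your own delivery.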