I have just released an AI-narrated version of my short story, The Godfather of Soul, which I created using ElevenLabs. I thought some writers might find it useful to hear about my experience of using it. (NB: In this post, I haven’t discussed the wider moral and ethical concerns some people have about using artificial intelligence rather than human narrators. I might do that in a later post.)
I love ElevenLabs! It is the best, most realistic AI voice I’ve heard. It’s still not perfect (see below), but it’s pretty bloody good. It is also far more cost-effective than hiring a good voice actor. Yes, it would take a fair bit of work to use it for a full-length novel, but I would consider it.
If you’re wondering how good it is, one person I sent it to said she didn’t realise it was an AI voice at first. And when I pasted in the initial chapter of Caroline Tangent, which contains several French words, it pronounced them perfectly. To cut to the chase, click on the ‘Listen to Sample’ button to hear for yourself.
How It Works
Create an account, select a payment plan, select a voice (more on that below), paste in your text and click generate. It’s that simple. When the audio has been generated, you download it as an MP3. (All the audio you generate is also saved in your account, so you can download it again later.)
The lowest payment plan is only $5 a month, so you can experiment with that before committing to a higher plan. The plans offer largely the same features (the higher ones add a few extras); the key difference is how much text you can convert to audio each month. The more you pay, the more you can generate.
You can also listen to all the core voices (see below) before you sign up for anything.
The core voices are currently all American accents, male and female, and have especially good cadence even on their default settings. You can also adjust a voice’s settings to make it more or less expressive and to change its clarity. Beyond the core set, you can Add Voices, which has two sub-options:
i) Voice Design – which has ElevenLabs’ own British, African, Australian and Indian voices – male/female, and young/middle-aged/old.
ii) Voice Cloning – where you can upload a sample of one minute or more of your own or someone else’s voice, and ElevenLabs creates a voice from it. (Obviously, you need their permission if you’re using someone else’s.) I tried it, and it is pretty impressive: it keeps the speaker’s voice very well, with good, expressive delivery.
In the end, I used one of ElevenLabs’ own British voices and played with a lot of the settings to get one I liked and thought appropriate for my book. That did take some time, but once you have a voice you like, you can save it and use it whenever you want. I found it noticeably harder to make this voice read slowly than the core American voices.
You can actually add multiple voices, which could be useful if you want to have different narrators, genres, styles etc.
The Main Challenges (For Me)
The core issue for a full-length novel is that you can only generate 5,000 characters (including spaces) ‘at a time’, i.e. in one MP3 file. My short story had about 30,000 characters, so I only needed to generate six MP3 files, but my 100,000-word novel has over 500,000 characters, which would mean generating over one hundred separate MP3s. I could do it, but it would take time, and you’d need to be careful not to miss any text, paste sections in the wrong order or misnumber your output files!
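If you do go down the hundred-MP3 route, the splitting and numbering can at least be automated rather than done by hand. Below is a minimal Python sketch; it is not an ElevenLabs feature. It assumes your manuscript is a plain-text file, takes the 5,000-character limit (spaces included) at face value, and prefers to break at the end of a paragraph, then at the end of a sentence, then at a space, so no word is cut in half. The function names, the `chunks` folder and the part001.txt naming scheme are my own hypothetical choices.

```python
from pathlib import Path

LIMIT = 5_000  # characters per generation, spaces included (as described above)


def split_text(text: str, limit: int = LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters, breaking at a
    paragraph end, then a sentence end, then a space where possible."""
    chunks = []
    start = 0
    while len(text) - start > limit:
        end = start + limit
        floor = start + limit // 2  # ignore breaks that would leave a tiny chunk
        cut = end  # fallback: hard cut if no break is found in the window
        for sep in ("\n", ". ", " "):  # paragraph, then sentence, then word
            pos = text.rfind(sep, floor, end)  # separator must fit inside the window
            if pos != -1:
                cut = pos + len(sep)  # keep the separator at the end of this chunk
                break
        chunks.append(text[start:cut])
        start = cut
    chunks.append(text[start:])
    # Trim whitespace at the chunk edges and drop any empty pieces.
    return [c.strip() for c in chunks if c.strip()]


def write_chunks(chunks: list[str], folder: str = "chunks") -> None:
    """Save chunks as part001.txt, part002.txt, ... so they sort in order."""
    out = Path(folder)
    out.mkdir(exist_ok=True)
    for i, chunk in enumerate(chunks, start=1):
        (out / f"part{i:03d}.txt").write_text(chunk, encoding="utf-8")
```

For example, `write_chunks(split_text(Path("novel.txt").read_text(encoding="utf-8")))` (where novel.txt is a hypothetical export of your manuscript) would write part001.txt, part002.txt and so on, ready to paste in one at a time. Because each chunk ends at a natural break rather than exactly at 5,000 characters, expect a few more files than the raw arithmetic suggests (500,000 ÷ 5,000 = 100).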
Not only is this time-consuming and risky, but you then have to combine all the MP3s into one file (or multiple files if you’re creating chapters). There are various software packages and websites you can use to do this, but I found the simplest was to use the Windows command line. Simply open Command Prompt, change directory (cd) to wherever you’ve saved the MP3 files, and use the following format to combine them: copy /b file1.mp3 + file2.mp3 + file3.mp3 newfile.mp3 (obviously replacing file1 etc. with the names of your files). Keep your file names short to make this easier to type.
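The copy /b trick simply glues the files together byte for byte. For anyone not on Windows, or with too many files to type out, here is a rough Python equivalent. It is a sketch under my own assumptions: the function names and file names are illustrative, and it relies on zero-padded numbering (part001.mp3, part002.mp3, …) so that a plain alphabetical sort puts the parts in the right order. The second function just builds the matching copy /b command line, if you’d rather paste that into Command Prompt.

```python
import shutil


def combine_mp3s(parts: list, output) -> None:
    """Append each part to `output` byte for byte, in the order given
    (the same thing `copy /b a.mp3 + b.mp3 out.mp3` does)."""
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def copy_command(parts: list[str], output: str) -> str:
    """Build the equivalent Windows `copy /b` command line."""
    # Quote any name containing a space so Command Prompt reads it as one file.
    quoted = [f'"{p}"' if " " in p else p for p in parts + [output]]
    return "copy /b " + " + ".join(quoted[:-1]) + " " + quoted[-1]
```

For example, with the parts saved in a hypothetical `audio` folder, `combine_mp3s(sorted(Path("audio").glob("part*.mp3")), "story.mp3")` (with `from pathlib import Path`) would stitch them together in order. As with copy /b, nothing is re-encoded, so it is fast and lossless, but some players may show an odd total duration for a file joined this way; if that bothers you, a dedicated audio editor is the safer option.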
The other current issue for me is that the voice still sometimes gets the cadence or emphasis of certain words wrong. You can’t currently add pauses, use special characters to mark emphasis and so on. I reckon that 95% of the time it doesn’t matter, but every now and then it does jar a little.