Denis Potschien July 11th, 2014

HTML5 Clinic: Make Your Browser Talk via The Speech Synthesis API

Navigation devices have been doing it forever. Now browsers are learning to do it, too. I'm talking about speech synthesis. HTML5's new Speech Synthesis API allows your website to speak or, more precisely, to read its text out loud in a human-sounding voice. There are different voices for different languages, and often several voices per language. All it takes is the press of a button and the browser will read your website's content to any visitor who asks for it. A nice service for anyone, and an indispensable one for the visually impaired.

Lots of Voices and Languages

The number of available voices and languages depends on the browser used. Chrome supports nine languages, among them British and American English, German, French, Spanish and Italian. British English offers a male and a female voice; all other languages only offer female voices. It is possible to use a French voice to read a German text. The text will then be read in German with a French accent. Funny, yet not very sensible.

Simple Playback

To have a text read aloud, we first need to create a new "utterance". Using one of the properties "voice" or "lang", we tell the browser which language to read in. Afterwards we start the playback.
var worte = new SpeechSynthesisUtterance("Hello, nice to see you.");
worte.lang = "en-GB";
window.speechSynthesis.speak(worte);
The example uses British English. As there are several regional variants of English, we need to set "lang" to "en-GB" for British English or to "en-US" for American English. Calling the method "speak()" then reads our example text aloud in a British English voice.
var worte = new SpeechSynthesisUtterance("Hello, nice to see you.");
var stimmen = window.speechSynthesis.getVoices();
worte.voice = stimmen[6];
window.speechSynthesis.speak(worte);
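In our second example we do not choose a language but rather a voice directly. To do that, we retrieve all available voices via "getVoices()". Then, using the property "voice", we assign one of them to the utterance. In the voice list used here, number 6 is the German voice, so this example reads the text with a German voice; the index of a given voice can differ between browsers and platforms. Picking the index of a British English voice instead gives the same result as the first example.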
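Because the position of a voice in the list is not guaranteed, it can be safer to look a voice up by its language tag than by a fixed index. The following is a minimal sketch of that idea, not part of the API itself: "findVoiceByLang" is a hypothetical helper name, and "en-GB" is used as the registered language tag for British English. The underscore handling is a precaution, on the assumption that some platforms may report tags such as "en_GB".

```javascript
// Hypothetical helper: return the first voice whose language tag matches, or null.
function findVoiceByLang(voices, lang) {
  var wanted = String(lang).toLowerCase().replace("_", "-");
  for (var i = 0; i < voices.length; i++) {
    // Normalize the reported tag so "en_GB" and "en-GB" compare equal.
    var tag = String(voices[i].lang).toLowerCase().replace("_", "-");
    if (tag === wanted) {
      return voices[i];
    }
  }
  return null;
}

// Browser-only usage, guarded so the helper can also be loaded elsewhere.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  var gruss = new SpeechSynthesisUtterance("Hello, nice to see you.");
  var stimme = findVoiceByLang(window.speechSynthesis.getVoices(), "en-GB");
  if (stimme) {
    gruss.voice = stimme;
  } else {
    gruss.lang = "en-GB"; // no match yet: let the browser pick a voice
  }
  window.speechSynthesis.speak(gruss);
}
```

If the voice list is still empty when this runs, the sketch simply falls back to the "lang" property, which is the same approach as in the first example.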

Request Available Voices

As soon as we use "getVoices()" to access the browser's available voices, we need to work with event listeners, because the browser does not load the voices together with the document. That means it is not possible to call "getVoices()" directly on page load and expect a complete list. Instead, use the event listener "DOMContentLoaded" to run a function that first checks, via an "if" query, whether the calling browser supports the Speech Synthesis API at all. Afterwards, start the playback using the "click" event.
window.addEventListener("DOMContentLoaded", function() {
  // Only continue if the browser supports the Speech Synthesis API.
  if ("speechSynthesis" in window) {
    document.getElementById("playback").addEventListener("click", function() {
      // By the time the user clicks, the voice list has usually been loaded.
      var stimmen = window.speechSynthesis.getVoices();
      for (var i = 0; i < stimmen.length; i++) {
        console.log("Voice " + i + " " + stimmen[i].name);
      }
    }, false);
  }
}, false);
This example launches a function that writes all voices, with their index numbers and names, to the console as soon as the element with the ID "playback" is clicked.
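Waiting for a click works, but the API also fires a "voiceschanged" event on "speechSynthesis" once the voice list has been filled. The sketch below shows one way to use it; "whenVoicesReady" is a hypothetical helper name, and the sketch assumes the browser actually fires the event and supports "addEventListener" on "speechSynthesis", which may not hold everywhere.

```javascript
// Hypothetical helper: call `callback` with the voice list once it is non-empty.
// `synth` is expected to behave like window.speechSynthesis.
function whenVoicesReady(synth, callback) {
  function onChange() {
    // Listen only once, then hand over the freshly loaded list.
    synth.removeEventListener("voiceschanged", onChange, false);
    callback(synth.getVoices());
  }

  var stimmen = synth.getVoices();
  if (stimmen.length > 0) {
    callback(stimmen); // the list is already available
    return;
  }
  synth.addEventListener("voiceschanged", onChange, false);
}

// Browser-only usage, guarded so the helper can also be loaded elsewhere.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  whenVoicesReady(window.speechSynthesis, function (stimmen) {
    for (var i = 0; i < stimmen.length; i++) {
      console.log("Voice " + i + " " + stimmen[i].name + " (" + stimmen[i].lang + ")");
    }
  });
}
```

In a browser that never fires the event, the callback is never called, so the click-based approach remains a reasonable fallback.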

Controlling Voice Pitch and Speed

Besides the set of dedicated voices for specific languages, you'll find one generic voice (number 10) which is designed to work for all languages. As you can easily imagine, this voice needs a little configuration to work well with your desired language. Fortunately, you can adjust the pitch of the voice as well as the speaking rate.
var worte = new SpeechSynthesisUtterance("Let us talk faster and in a higher voice.");
var stimmen = window.speechSynthesis.getVoices();
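worte.voice = stimmen[10]; // generic voice; the index may differ in your browser
worte.pitch = 2;  // highest allowed pitch (range 0 to 2)
worte.rate = 10;  // fastest allowed rate (range 0.1 to 10)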
window.speechSynthesis.speak(worte);
Using "pitch" we control the pitch of the voice. Allowed values range from 0 to 2, where 1 is the default pitch of a normal voice. Values below 1 make the voice deeper, whereas values above 1 make it higher. Use "rate" to control the speaking rate. Allowed values range from 0.1 to 10, where 1 represents the normal speaking rate. Values below 1 lead to slower rates, values above 1 to faster ones. The downside of the generic voice is that it sounds much more synthetic than its dedicated counterparts, which sound far more natural.
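Since both properties only accept values inside the ranges just described, it can help to clamp user-supplied settings before assigning them. This is a small sketch under that assumption; "clamp" and "applyVoiceSettings" are hypothetical helper names, and the limits are the ones stated in the text (0 to 2 for "pitch", 0.1 to 10 for "rate").

```javascript
// Hypothetical helper: force a number into the range [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: copy pitch and rate onto an utterance, kept within range.
function applyVoiceSettings(utterance, settings) {
  utterance.pitch = clamp(settings.pitch, 0, 2);   // 1 is the default pitch
  utterance.rate = clamp(settings.rate, 0.1, 10);  // 1 is the normal rate
  return utterance;
}

// Browser-only usage, guarded so the helpers can also be loaded elsewhere.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  var ansage = new SpeechSynthesisUtterance("Let us talk faster and in a higher voice.");
  applyVoiceSettings(ansage, { pitch: 4, rate: 3 }); // pitch is reduced to 2
  window.speechSynthesis.speak(ansage);
}
```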

Controlling Speech in General

Using the method "speak()" we start the playback, while "pause()" - well - pauses it. Using "resume()" we - well - resume from where we paused.
document.getElementById("pause").addEventListener("click", function() {
  window.speechSynthesis.pause();
}, false);
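Pausing and resuming can also share a single button. The sketch below is one possible way to do that, using the "paused" and "speaking" properties of "speechSynthesis"; "togglePause" and the element ID "toggle" are assumptions made for this example and do not come from the API.

```javascript
// Hypothetical helper: pause a running playback or resume a paused one.
// Returns a short status string describing what was done.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return "resumed";
  }
  if (synth.speaking) {
    synth.pause();
    return "paused";
  }
  return "idle"; // nothing is being read, so there is nothing to toggle
}

// Browser-only usage; assumes an element with the (hypothetical) ID "toggle".
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  var knopf = document.getElementById("toggle");
  if (knopf) {
    knopf.addEventListener("click", function () {
      togglePause(window.speechSynthesis);
    }, false);
  }
}
```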
Use "volume" to control the volume of the playback. Values need to range from 0 to 1. Several events allow you to bind functions to different playback states, e.g. to the beginning or the end of a playback.
worte.addEventListener("start", function() {
  document.title = "Now listen …";
}, false);

worte.addEventListener("end", function() {
  document.title = "… that was it.";
}, false);
If you start more than one playback simultaneously, or start one while another is still running, the individual playbacks will run in a queue, one after the other. The property "pending" tells you whether there are utterances waiting in the queue of "speechSynthesis". The value will be either "true" or "false". Using "speaking" you can check whether a playback is currently running.
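The two properties can be combined into a simple status check. The following sketch is only an illustration; "describeQueue" is a hypothetical helper name, and it reads nothing but the "speaking" and "pending" properties described above.

```javascript
// Hypothetical helper: summarize the playback state of a speechSynthesis-like object.
function describeQueue(synth) {
  if (synth.speaking && synth.pending) {
    return "speaking, more queued";
  }
  if (synth.speaking) {
    return "speaking";
  }
  if (synth.pending) {
    return "queued";
  }
  return "idle";
}

// Browser-only usage, guarded so the helper can also be loaded elsewhere.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  window.speechSynthesis.speak(new SpeechSynthesisUtterance("First sentence."));
  window.speechSynthesis.speak(new SpeechSynthesisUtterance("Second sentence."));
  console.log(describeQueue(window.speechSynthesis));
}
```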

Browser Support

You guessed it already: browser support is not widespread. Only Chrome from version 25 up, Safari from 6.1 and Mobile Safari from iOS 7 support the new API. (dpe)
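Given the limited support, it makes sense to check for the API before offering a read-aloud button at all. Here is a minimal sketch of such a check; "supportsSpeechSynthesis" is a hypothetical helper name, and the ID "playback" is reused from the earlier example.

```javascript
// Hypothetical helper: true if the given global object exposes the API.
function supportsSpeechSynthesis(globalObject) {
  return globalObject !== null &&
    typeof globalObject === "object" &&
    "speechSynthesis" in globalObject &&
    typeof globalObject.SpeechSynthesisUtterance === "function";
}

// Browser-only usage: hide the button when the API is missing.
if (typeof window !== "undefined" && typeof document !== "undefined") {
  var button = document.getElementById("playback");
  if (button && !supportsSpeechSynthesis(window)) {
    button.style.display = "none";
  }
}
```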

Denis Potschien

Denis has worked as a freelance web designer since 2005.

3 comments

  1. Wow, great article! This has been needed for quite some time. Besides the obvious benefits for the visually impaired this has some far reaching consequences for game design and other areas of web development. I am looking forward to trying some of these out.

  2. Excellent piece, thank you very much.

    A question: is there a way to specify the pronunciation of a word? For example how to pronounce “Venezia” in English or “Potomac” in German?

  3. Could you please tell me why x-webkit is not properly supported in HTML5? Also, can you give a complete example of an alternative, if one exists? I want to accept voice input from the user in HTML5.
