This article covers the basics of using the powerful Android.Speech namespace. Since its inception, Android has been able to recognize speech and output it as text, and that is a relatively simple process. Text-to-speech, however, is more involved: not only does the speech engine have to be considered, but also the languages available and installed in the text-to-speech (TTS) system.
Overview
Having a system that “understands” human speech and pronounces what is typed—speech to text and text to speech—is an ever-growing area in mobile development as the demand for natural communication with our devices increases. There are many cases where a function that converts text to speech or vice versa is a very useful tool to integrate into your Android application.
For example, with cell phone use restricted while driving, users want a hands-free way to operate their devices. The plethora of Android form factors (such as Android Wear) and the ever-widening range of people who can use Android devices (such as tablets and note pads) has led to a greater focus on creating great TTS applications.
Google exposes a variety of APIs to the developer in the Android.Speech namespace to cover most cases where a device is made "speech-enabled" (for example, software designed for the blind). The namespace includes the ability to translate text to speech via Android.Speech.Tts, control over the engine used to perform the translation, as well as a number of RecognizerIntents, which can be used to convert speech to text.
While the facilities for interpreting speech are there, there are limitations imposed by the hardware used. It is unlikely that a device will successfully interpret everything spoken to it in every available language.
Requirements
There are no specific requirements for this guide other than that your device has a microphone and speaker.
At the core of an Android device interpreting speech is the use of an Intent with a corresponding OnActivityResult. It is important to realize, however, that the speech is not understood but converted to text. The difference is important.
The difference between understanding and interpreting
A simple definition of understanding is being able to use tone and context to determine the true meaning of what is being said. Interpreting is just taking the words and putting them out in a different form.
Consider the following simple example used in everyday conversations:
Hi, how are you?
With no inflection (emphasis on certain words or parts of words), it's a simple question. However, if a slow tempo is applied to the line, the person listening will realize that the questioner is not too happy and may need cheering up or that the questioner is not feeling well. If the emphasis is on "are", the questioner is usually more interested in the answer.
Without some fairly powerful audio processing to take advantage of the inflection, and some level of artificial intelligence (AI) to understand the context, the software cannot even begin to understand what was said; the best a basic phone can do is convert the speech to text.
Setting up
Before using the speech system, it is always wise to check that the device has a microphone. There would be little point in running your app on a Kindle or Google Note pad without a microphone installed.
The following code example demonstrates how to query whether a microphone is available and, if not, create an alert. If no microphone is available, you would instead abort the activity or disable the ability to record speech.
// Query the PackageManager to check whether the device actually reports
// a microphone feature.
bool hasMicrophone = PackageManager.HasSystemFeature(
    Android.Content.PM.PackageManager.FeatureMicrophone);
if (!hasMicrophone)
{
    var alert = new AlertDialog.Builder(recButton.Context);
    alert.SetTitle("You don't seem to have a microphone to record");
    alert.SetPositiveButton("OK", (sender, e) =>
    {
        return;
    });
    alert.Show();
}
Creating the Intent
The Intent for the speech system uses a specific type of Intent called RecognizerIntent. This Intent controls a large number of parameters, including how long to wait in silence before the recording is considered over, any additional languages to recognize and output, and any text to include on the Intent's modal dialog as a means of instruction. In this snippet, VOICE is a readonly int used for recognition in OnActivityResult.
var voiceIntent = new Intent(RecognizerIntent.ActionRecognizeSpeech);
voiceIntent.PutExtra(RecognizerIntent.ExtraLanguageModel, RecognizerIntent.LanguageModelFreeForm);
voiceIntent.PutExtra(RecognizerIntent.ExtraPrompt, Application.Context.GetString(Resource.String.messageSpeakNow));
voiceIntent.PutExtra(RecognizerIntent.ExtraSpeechInputCompleteSilenceLengthMillis, 1500);
voiceIntent.PutExtra(RecognizerIntent.ExtraSpeechInputPossiblyCompleteSilenceLengthMillis, 1500);
voiceIntent.PutExtra(RecognizerIntent.ExtraSpeechInputMinimumLengthMillis, 15000);
voiceIntent.PutExtra(RecognizerIntent.ExtraMaxResults, 1);
voiceIntent.PutExtra(RecognizerIntent.ExtraLanguage, Java.Util.Locale.Default);
StartActivityForResult(voiceIntent, VOICE);
Converting the speech
The text interpreted from the speech is delivered within the Intent, which is returned when the activity has completed and is accessed via GetStringArrayListExtra(RecognizerIntent.ExtraResults). This returns an IList<string>, whose entries can be used and displayed, depending on the number of results requested in the calling Intent (as specified in RecognizerIntent.ExtraMaxResults). As with any list, though, it is worth checking that it is not empty before using it.
When listening for the return value of a StartActivityForResult, the OnActivityResult method must be overridden.
In the example below, textBox is a TextBox used to output what has been dictated. It could equally be used to pass the text to some form of interpreter, from which the application could compare the text and branch to another part of the app.
protected override void OnActivityResult(int requestCode, Result resultVal, Intent data)
{
    if (requestCode == VOICE)
    {
        if (resultVal == Result.Ok)
        {
            var matches = data.GetStringArrayListExtra(RecognizerIntent.ExtraResults);
            if (matches.Count != 0)
            {
                string textInput = textBox.Text + matches[0];
                textBox.Text = textInput;
                switch (matches[0].Substring(0, 5).ToLower())
                {
                    case "north":
                        MovePlayer(0);
                        break;
                    case "south":
                        MovePlayer(1);
                        break;
                }
            }
            else
            {
                textBox.Text = "No speech was recognized";
            }
        }
        base.OnActivityResult(requestCode, resultVal, data);
    }
}
Text to speech
Text-to-speech is not simply the reverse of speech-to-text, and relies on two key components: a text-to-speech engine installed on the device, and a language installed.
Android devices generally come with Google's TTS service and at least one language installed by default. This is established when the device is first set up, based on where the device is located at the time (e.g. a phone set up in Germany will install the German language, while one in America will have American English).
Step 1 - Instantiate TextToSpeech
TextToSpeech can take up to three parameters; the first two are required, while the third is optional (AppContext, IOnInitListener, engine). The listener is used to bind to the service and test for errors, and the engine can be any of the available Android text-to-speech engines. At a minimum, the device will have Google's own engine.
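As a minimal sketch (the activity name here is illustrative; the textToSpeech field matches the snippets used later in this article), the instantiation might look like this:
public class SpeechActivity : Activity, TextToSpeech.IOnInitListener
{
    TextToSpeech textToSpeech;
    Java.Util.Locale lang; // the language chosen by the user, used in Step 5

    protected override void OnCreate(Bundle savedInstanceState)
    {
        base.OnCreate(savedInstanceState);
        // Context and listener are required; the optional engine parameter
        // is omitted, so the device's default TTS engine is used.
        textToSpeech = new TextToSpeech(this, this);
    }

    void TextToSpeech.IOnInitListener.OnInit(OperationResult status)
    {
        // See Step 5 below for a full OnInit implementation.
    }
}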
Step 2 - Finding the available languages
The Java.Util.Locale class contains a helpful method called GetAvailableLocales(). This list of languages supported by the speech engine can then be tested against the installed languages.
Creating the list of "understood" languages is a simple matter. There will always be a default language (the language the user set when they first set up the device), so in this example the List<string> has "Default" as its first entry, and the rest of the list is filled depending on the result of textToSpeech.IsLanguageAvailable(locale).
var langAvailable = new List<string>{ "Default" };
var localesAvailable = Java.Util.Locale.GetAvailableLocales().ToList();
foreach (var locale in localesAvailable)
{
    var res = textToSpeech.IsLanguageAvailable(locale);
    switch (res)
    {
        case LanguageAvailableResult.Available:
            langAvailable.Add(locale.DisplayLanguage);
            break;
        case LanguageAvailableResult.CountryAvailable:
            langAvailable.Add(locale.DisplayLanguage);
            break;
        case LanguageAvailableResult.CountryVarAvailable:
            langAvailable.Add(locale.DisplayLanguage);
            break;
    }
}
langAvailable = langAvailable.OrderBy(t => t).Distinct().ToList();
This code calls TextToSpeech.IsLanguageAvailable to test whether the language pack for a given locale is already present on the device. That method returns a LanguageAvailableResult, which indicates whether the language for the passed locale is available. If the LanguageAvailableResult indicates that the language is NotSupported, then there is no language pack available for that language (not even for download). If the LanguageAvailableResult is set to MissingData, then it is possible to download a new language pack, as explained below in Step 4.
Step 3 - Adjusting Speed and Pitch
Android allows the user to alter the sound of the speech by changing the SpeechRate and Pitch (the rate of speed and the tone of the speech). These go from 0 to 1, with "normal" speech being 1 for both.
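As a brief illustrative sketch (the values here are arbitrary examples; textToSpeech is the instance created in Step 1), the rate and pitch are set with SetSpeechRate and SetPitch:
// 1.0f is "normal" for both rate and pitch; the values below are examples only.
textToSpeech.SetSpeechRate(0.9f); // slightly slower than normal speech
textToSpeech.SetPitch(0.8f);      // slightly deeper tone than normal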
Step 4 - Testing and Loading New Languages
Downloading a new language is performed through the use of an Intent. The result of this Intent causes the OnActivityResult method to be invoked. Unlike the speech-to-text example (which passed the RecognizerIntent as a PutExtra parameter to the Intent), the testing and installing Intents are Action-based:
- TextToSpeech.Engine.ActionCheckTtsData - Starts an activity from the platform's TextToSpeech engine to verify the proper installation and availability of language resources on the device.
- TextToSpeech.Engine.ActionInstallTtsData - Starts an activity that prompts the user to download the required languages.
The following code example demonstrates how to use these actions to test language resources and download a new language:
var checkTTSIntent = new Intent();
checkTTSIntent.SetAction(TextToSpeech.Engine.ActionCheckTtsData);
StartActivityForResult(checkTTSIntent, NeedLang);

protected override void OnActivityResult(int req, Result res, Intent data)
{
    if (req == NeedLang)
    {
        var installTTS = new Intent();
        installTTS.SetAction(TextToSpeech.Engine.ActionInstallTtsData);
        StartActivity(installTTS);
    }
}
TextToSpeech.Engine.ActionCheckTtsData tests for the availability of language resources. OnActivityResult is invoked when this test completes. If language resources need to be downloaded, OnActivityResult fires off the TextToSpeech.Engine.ActionInstallTtsData action to start an activity that allows the user to download the necessary languages. Note that this OnActivityResult implementation does not check the Result code because, in this simplified example, it has already been determined that the language pack needs to be downloaded.
The TextToSpeech.Engine.ActionInstallTtsData action causes the Google TTS voice data activity to be presented to the user for choosing the language to download.
For example, the user can select French and click the download icon to download the French voice data.
This data is installed automatically after the download is complete.
Step 5 - The IOnInitListener
For an activity to convert the text to speech, the interface method OnInit has to be implemented (this is the second parameter specified when instantiating the TextToSpeech class). This initializes the listener and tests the result.
The listener should test for both OperationResult.Success and OperationResult.Error at a minimum. The following example shows just that:
void TextToSpeech.IOnInitListener.OnInit(OperationResult status)
{
    // if the engine reports an error, fall back to the default language
    if (status == OperationResult.Error)
        textToSpeech.SetLanguage(Java.Util.Locale.Default);
    // if the listener initialized successfully, set the chosen language
    if (status == OperationResult.Success)
        textToSpeech.SetLanguage(lang);
}
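With the listener initialized and a language set, the engine is ready to speak. As a hedged sketch (the text and utterance ID here are placeholders; this Speak overload is available from API level 21):
// QueueMode.Flush interrupts any utterance already in progress; the last
// parameter is an arbitrary identifier for tracking this utterance.
textToSpeech.Speak("The quick brown fox jumped over the lazy dog", QueueMode.Flush, null, "tts1");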
Summary
In this guide, we've looked at the basics of converting text to speech and speech to text, and at possible methods of incorporating them into your own apps. While these examples don't cover every case, you should now have a basic understanding of how speech is interpreted, how to install new languages, and how to increase the inclusivity of your apps.
- Xamarin.Forms DependencyService
- Text to Speech (sample)
- Speech to Text (sample)
- Android.Speech namespace
- Android.Speech.Tts namespace