Speech Recognition with the Speech Service API - Xamarin

Download the example

Azure Speech Service is a cloud-based API that provides the following capabilities:

  • Speech-to-text transcribes audio files or streams to text.
  • Text-to-speech converts input text into human-like synthesized speech.
  • Language translation enables real-time, multilingual speech-to-text and speech-to-speech translation.
  • Voice assistants can create human-like conversational interfaces for applications.

This article explains how to implement speech-to-text in the Xamarin.Forms sample application using the Azure Speech service. The following screenshots show the sample application on iOS and Android:

Create an Azure Speech Service resource

Azure Speech Service is part of Azure Cognitive Services, which provides cloud-based APIs for tasks such as image recognition, speech recognition and translation, and Bing search. For more information, see What are Azure Cognitive Services?.

The sample project requires an Azure Cognitive Services resource to be created in your Azure portal. A Cognitive Services resource can be created for a single service, such as the Speech service, or as a multi-service resource. The steps to create a Speech service resource are as follows:

  1. Sign in to your Azure portal.
  2. Create a multi-service or single-service resource.
  3. Obtain the API key and region information for your resource.
  4. Update the sample Constants.cs file.

For step-by-step instructions on how to create a resource, see Create a Cognitive Services resource.
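
Alternatively, the resource can be created from the command line with the Azure CLI. The following is a minimal sketch, assuming the CLI is installed and you are signed in; the resource and resource-group names are placeholders:

# Create a single-service Speech resource in the free tier (names are placeholders)
az cognitiveservices account create --name my-speech-resource --resource-group my-resource-group --kind SpeechServices --sku F0 --location westus --yes

# Retrieve the API keys for the new resource
az cognitiveservices account keys list --name my-speech-resource --resource-group my-resource-group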

note

If you don't have an Azure subscription, create a free account before you start. Once you have an account, a single-service resource can be created in the free tier to try out the service.

Configure your app with the Azure Speech service

After creating a Cognitive Services resource, the Constants.cs file can be updated with the region and API key from your Azure resource:

public static class Constants
{
    public static string CognitiveServicesApiKey = "YOUR_KEY_GOES_HERE";
    public static string CognitiveServicesRegion = "westus";
}

Install the NuGet Speech Service package

The sample application uses the Microsoft.CognitiveServices.Speech NuGet package to connect to the Azure Speech Service. Install this NuGet package in the shared project and in each platform project.
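
For example, for projects that use PackageReference, the package can be added from the command line with the .NET CLI (run in each project directory); otherwise, use the NuGet Package Manager in Visual Studio:

dotnet add package Microsoft.CognitiveServices.Speech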

Create an IMicrophoneService interface

Each platform requires permission to access the microphone. The sample project provides an IMicrophoneService interface in the shared project, and uses the Xamarin.Forms DependencyService to obtain platform implementations of the interface.

public interface IMicrophoneService
{
    Task<bool> GetPermissionAsync();
    void OnRequestPermissionResult(bool isGranted);
}

Create the page layout

The sample project defines a basic page layout in the MainPage.xaml file. The key layout elements are a Button that starts the transcription process, a Label to contain the transcribed text, and an ActivityIndicator to indicate when transcription is in progress:

<ContentPage ...>
    <StackLayout>
        <Frame ...>
            <ScrollView x:Name="scroll" ...>
                <Label x:Name="transcribedText" ... />
            </ScrollView>
        </Frame>
        <ActivityIndicator x:Name="transcribingIndicator" IsRunning="False" />
        <Button x:Name="transcribeButton" ... Clicked="TranscribeClicked"/>
    </StackLayout>
</ContentPage>

Implement the Speech service

The MainPage.xaml.cs code-behind file contains all the logic to send audio and receive transcribed text from the Azure Speech Service.

The MainPage constructor obtains an instance of the IMicrophoneService interface from the DependencyService:

public partial class MainPage : ContentPage
{
    SpeechRecognizer recognizer;
    IMicrophoneService micService;
    bool isTranscribing = false;

    public MainPage()
    {
        InitializeComponent();
        micService = DependencyService.Resolve<IMicrophoneService>();
    }

    // ...
}

The TranscribeClicked method is called when the transcribeButton instance is tapped:

async void TranscribeClicked(object sender, EventArgs e)
{
    bool isMicEnabled = await micService.GetPermissionAsync();

    // EARLY OUT: make sure the microphone is accessible
    if (!isMicEnabled)
    {
        UpdateTranscription("Please allow microphone access!");
        return;
    }

    // Initialize speech recognition
    if (recognizer == null)
    {
        var config = SpeechConfig.FromSubscription(Constants.CognitiveServicesApiKey, Constants.CognitiveServicesRegion);
        recognizer = new SpeechRecognizer(config);
        recognizer.Recognized += (obj, args) =>
        {
            UpdateTranscription(args.Result.Text);
        };
    }

    // If already transcribing, stop speech recognition
    if (isTranscribing)
    {
        try
        {
            await recognizer.StopContinuousRecognitionAsync();
        }
        catch (Exception ex)
        {
            UpdateTranscription(ex.Message);
        }
        isTranscribing = false;
    }
    // If not transcribing, start speech recognition
    else
    {
        Device.BeginInvokeOnMainThread(() =>
        {
            InsertDateTimeRecord();
        });
        try
        {
            await recognizer.StartContinuousRecognitionAsync();
        }
        catch (Exception ex)
        {
            UpdateTranscription(ex.Message);
        }
        isTranscribing = true;
    }
    UpdateDisplayState();
}

The TranscribeClicked method does the following (a simpler one-shot alternative is sketched after the list):

  1. Checks whether the application has access to the microphone, and exits early if it does not.
  2. Creates an instance of the SpeechRecognizer class if one does not already exist.
  3. Stops continuous transcription if it is in progress.
  4. Inserts a timestamp and starts continuous transcription if it is not in progress.
  5. Notifies the application to update its appearance based on the new application state.
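
Continuous recognition streams results for as long as the user speaks. If an app only needs to capture a single utterance, the Speech SDK also provides RecognizeOnceAsync. The following is a minimal sketch, not part of the sample; the method name RecognizeOnceExampleAsync is hypothetical, and the Constants values and UpdateTranscription helper from the sample are assumed:

async Task RecognizeOnceExampleAsync()
{
    var config = SpeechConfig.FromSubscription(Constants.CognitiveServicesApiKey, Constants.CognitiveServicesRegion);
    using (var recognizer = new SpeechRecognizer(config))
    {
        // Returns after the first utterance has been recognized
        SpeechRecognitionResult result = await recognizer.RecognizeOnceAsync();
        if (result.Reason == ResultReason.RecognizedSpeech)
        {
            UpdateTranscription(result.Text);
        }
        else
        {
            UpdateTranscription($"Recognition failed: {result.Reason}");
        }
    }
}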

The remaining MainPage class methods are helpers for displaying the application state:

void UpdateTranscription(string newText)
{
    Device.BeginInvokeOnMainThread(() =>
    {
        if (!string.IsNullOrWhiteSpace(newText))
        {
            transcribedText.Text += $"{newText}\n";
        }
    });
}

void InsertDateTimeRecord()
{
    var msg = $"=================\n{DateTime.Now.ToString()}\n=================";
    UpdateTranscription(msg);
}

void UpdateDisplayState()
{
    Device.BeginInvokeOnMainThread(() =>
    {
        if (isTranscribing)
        {
            transcribeButton.Text = "Stop";
            transcribeButton.BackgroundColor = Color.Red;
            transcribingIndicator.IsRunning = true;
        }
        else
        {
            transcribeButton.Text = "Transcribe";
            transcribeButton.BackgroundColor = Color.Green;
            transcribingIndicator.IsRunning = false;
        }
    });
}

The UpdateTranscription method writes the provided newText to the Label element named transcribedText. It forces this update to happen on the UI thread, so it can be called from any context without throwing exceptions. The InsertDateTimeRecord method writes the current date and time to the transcribedText instance to mark the beginning of a new transcription. Finally, the UpdateDisplayState method updates the Button and ActivityIndicator elements to reflect whether or not transcription is in progress.

Create platform microphone services

The application must have access to the microphone to capture voice data. The IMicrophoneService interface must be implemented and registered with the DependencyService on each platform for the application to work.

Android

The sample project defines an IMicrophoneService implementation for Android called AndroidMicrophoneService:

[assembly: Dependency(typeof(AndroidMicrophoneService))]
namespace CognitiveSpeechService.Droid.Services
{
    public class AndroidMicrophoneService : IMicrophoneService
    {
        public const int RecordAudioPermissionCode = 1;
        private TaskCompletionSource<bool> tcsPermissions;
        string[] permissions = new string[] { Manifest.Permission.RecordAudio };

        public Task<bool> GetPermissionAsync()
        {
            tcsPermissions = new TaskCompletionSource<bool>();

            if ((int)Build.VERSION.SdkInt < 23)
            {
                tcsPermissions.TrySetResult(true);
            }
            else
            {
                var currentActivity = MainActivity.Instance;
                if (ActivityCompat.CheckSelfPermission(currentActivity, Manifest.Permission.RecordAudio) != (int)Permission.Granted)
                {
                    RequestMicPermissions();
                }
                else
                {
                    tcsPermissions.TrySetResult(true);
                }
            }

            return tcsPermissions.Task;
        }

        public void OnRequestPermissionResult(bool isGranted)
        {
            tcsPermissions.TrySetResult(isGranted);
        }

        void RequestMicPermissions()
        {
            if (ActivityCompat.ShouldShowRequestPermissionRationale(MainActivity.Instance, Manifest.Permission.RecordAudio))
            {
                Snackbar.Make(MainActivity.Instance.FindViewById(Android.Resource.Id.Content),
                        "Microphone permissions are required for speech transcription!",
                        Snackbar.LengthIndefinite)
                    .SetAction("Ok", v =>
                    {
                        ((Activity)MainActivity.Instance).RequestPermissions(permissions, RecordAudioPermissionCode);
                    })
                    .Show();
            }
            else
            {
                ActivityCompat.RequestPermissions((Activity)MainActivity.Instance, permissions, RecordAudioPermissionCode);
            }
        }
    }
}

The AndroidMicrophoneService has the following features:

  1. The Dependency attribute registers the class with the DependencyService.
  2. The GetPermissionAsync method checks whether permissions are required based on the Android SDK version, and calls RequestMicPermissions if permission has not already been granted.
  3. The RequestMicPermissions method uses the Snackbar class to request permissions from the user if a rationale is required, otherwise it directly requests audio recording permissions.
  4. The OnRequestPermissionResult method is called with a bool result once the user has responded to the permission request.

The MainActivity class is customized to update the AndroidMicrophoneService instance when permission requests are completed:

public class MainActivity : global::Xamarin.Forms.Platform.Android.FormsAppCompatActivity
{
    IMicrophoneService micService;
    internal static MainActivity Instance { get; private set; }

    protected override void OnCreate(Bundle savedInstanceState)
    {
        Instance = this;
        // ...
        micService = DependencyService.Resolve<IMicrophoneService>();
    }

    public override void OnRequestPermissionsResult(int requestCode, string[] permissions, [GeneratedEnum] Android.Content.PM.Permission[] grantResults)
    {
        // ...
        switch (requestCode)
        {
            case AndroidMicrophoneService.RecordAudioPermissionCode:
                if (grantResults[0] == Permission.Granted)
                {
                    micService.OnRequestPermissionResult(true);
                }
                else
                {
                    micService.OnRequestPermissionResult(false);
                }
                break;
        }
    }
}

The MainActivity class defines a static reference named Instance, which is required by the AndroidMicrophoneService object when requesting permissions. It overrides the OnRequestPermissionsResult method to update the AndroidMicrophoneService object when the permission request is approved or denied by the user.

Finally, the Android application must include the permission to record audio in the AndroidManifest.xml file:

<manifest ...>
    ...
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
</manifest>

iOS

The sample project defines an IMicrophoneService implementation for iOS called iOSMicrophoneService:

[assembly: Dependency(typeof(iOSMicrophoneService))]
namespace CognitiveSpeechService.iOS.Services
{
    public class iOSMicrophoneService : IMicrophoneService
    {
        TaskCompletionSource<bool> tcsPermissions;

        public Task<bool> GetPermissionAsync()
        {
            tcsPermissions = new TaskCompletionSource<bool>();
            RequestMicPermission();
            return tcsPermissions.Task;
        }

        public void OnRequestPermissionResult(bool isGranted)
        {
            tcsPermissions.TrySetResult(isGranted);
        }

        void RequestMicPermission()
        {
            var session = AVAudioSession.SharedInstance();
            session.RequestRecordPermission((granted) =>
            {
                tcsPermissions.TrySetResult(granted);
            });
        }
    }
}

The iOSMicrophoneService has the following features:

  1. The Dependency attribute registers the class with the DependencyService.
  2. The GetPermissionAsync method calls RequestMicPermission to request permissions from the device user.
  3. The RequestMicPermission method uses the shared AVAudioSession instance to request recording permissions.
  4. The OnRequestPermissionResult method updates the TaskCompletionSource instance with the provided bool value.

Finally, the iOS app's Info.plist must include a message that tells the user why the app is requesting access to the microphone. Edit the Info.plist file to include the following tags within the <dict> element:

<plist>
    <dict>
        ...
        <key>NSMicrophoneUsageDescription</key>
        <string>Speech transcription requires microphone access</string>
    </dict>
</plist>

UWP

The sample project defines an IMicrophoneService implementation for UWP called UWPMicrophoneService:

[assembly: Dependency(typeof(UWPMicrophoneService))]
namespace CognitiveSpeechService.UWP.Services
{
    public class UWPMicrophoneService : IMicrophoneService
    {
        public async Task<bool> GetPermissionAsync()
        {
            bool isMicAvailable = true;
            try
            {
                var mediaCapture = new MediaCapture();
                var settings = new MediaCaptureInitializationSettings();
                settings.StreamingCaptureMode = StreamingCaptureMode.Audio;
                await mediaCapture.InitializeAsync(settings);
            }
            catch (Exception ex)
            {
                isMicAvailable = false;
            }

            if (!isMicAvailable)
            {
                await Windows.System.Launcher.LaunchUriAsync(new Uri("ms-settings:privacy-microphone"));
            }

            return isMicAvailable;
        }

        public void OnRequestPermissionResult(bool isGranted)
        {
            // intentionally does nothing
        }
    }
}

The UWPMicrophoneService has the following features:

  1. The Dependency attribute registers the class with the DependencyService.
  2. The GetPermissionAsync method attempts to initialize a MediaCapture instance. If that fails, it launches a user request to enable the microphone.
  3. The OnRequestPermissionResult method exists to satisfy the interface, but is not required for the UWP implementation.

Finally, the UWP Package.appxmanifest must specify that the application uses the microphone. Double-click the Package.appxmanifest file and select the Microphone option on the Capabilities tab in Visual Studio 2019:
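
Equivalently, the capability can be declared directly in the manifest XML. A minimal sketch of the relevant element, assuming the rest of the manifest is already in place:

<Package ...>
    ...
    <Capabilities>
        ...
        <DeviceCapability Name="microphone" />
    </Capabilities>
</Package>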

Test the application

Run the app and click the Transcribe button. The app should request access to the microphone and start the transcription process. The ActivityIndicator animates, showing that transcription is active. As you speak, the app streams audio to the Azure Speech Service resource, which responds with transcribed text. The transcribed text appears in the Label element as it is received.

note

Android emulators fail to load and initialize the Speech Service libraries. Testing on a physical device is recommended for the Android platform.

Related links

  • Azure Speech Service sample
  • Azure Speech Service overview
  • Create a Cognitive Services resource
  • Quickstart: Recognize speech from a microphone
