Download the sample
Azure Speech Service is a cloud-based API that provides the following capabilities:
- Speech-to-text transcribes audio files or streams to text.
- Text-to-speech converts input text into human-like synthesized speech.
- Speech translation enables real-time, multilingual speech-to-text and speech-to-speech translation.
- Voice assistants can create human-like conversational interfaces for applications.
This article explains how to implement speech-to-text in the Xamarin.Forms sample application using the Azure Speech service. The following screenshots show the sample application on iOS and Android:
Create an Azure Speech Service resource
Azure Speech Service is part of Azure Cognitive Services, which provides cloud-based APIs for tasks such as image recognition, speech recognition and translation, and Bing search. For more information, see What are Azure Cognitive Services?.
The sample project requires an Azure Cognitive Services resource to be created in your Azure portal. A Cognitive Services resource can be created for a single service, such as the Speech service, or as a multi-service resource. The steps to create a Speech service resource are as follows:
- Sign in to your Azure portal.
- Create a multi-service or single-service resource.
- Get the API key and region information for your resource.
- Update the sample's Constants.cs file.
For step-by-step instructions on creating a resource, see Create a Cognitive Services resource.
note
If you don't have an Azure subscription, create a free account before you begin. Once you have an account, a single-service resource can be created at the free tier to try out the service.
Configure your app to use the Azure Speech service
After creating a Cognitive Services resource, the Constants.cs file can be updated with the region and API key from your Azure resource:
public static class Constants
{
    public static string CognitiveServicesApiKey = "YOUR_KEY_GOES_HERE";
    public static string CognitiveServicesRegion = "westus";
}
Install the NuGet Speech Service package
The sample application uses the Microsoft.CognitiveServices.Speech NuGet package to connect to the Azure Speech service. Install this NuGet package in the shared project and in each platform project.
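If you prefer to edit the project files directly, the package can also be added as a PackageReference in each .csproj file. This is a minimal sketch; the version shown is a placeholder, so substitute the latest stable release:

<ItemGroup>
    <!-- Placeholder version; replace with the latest stable release -->
    <PackageReference Include="Microsoft.CognitiveServices.Speech" Version="1.*" />
</ItemGroup>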
Create an IMicrophoneService interface
Each platform requires permission to access the microphone. The sample project provides an IMicrophoneService interface in the shared project, and uses the Xamarin.Forms DependencyService to obtain platform implementations of the interface.
public interface IMicrophoneService
{
    Task<bool> GetPermissionAsync();
    void OnRequestPermissionResult(bool isGranted);
}
Create the page layout
The sample project defines a basic page layout in the MainPage.xaml file. The key layout elements are a Button that starts the transcription process, a Label to contain the transcribed text, and an ActivityIndicator to show when transcription is in progress:
<ContentPage ...>
    <StackLayout>
        <Frame ...>
            <ScrollView x:Name="scroll" ...>
                <Label x:Name="transcribedText" ... />
            </ScrollView>
        </Frame>
        <ActivityIndicator x:Name="transcribingIndicator" IsRunning="False" />
        <Button x:Name="transcribeButton" ... Clicked="TranscribeClicked"/>
    </StackLayout>
</ContentPage>
Implement the Azure Speech service
The MainPage.xaml.cs code-behind file contains all the logic to send audio and receive transcribed text from the Azure Speech service.
The MainPage constructor obtains an instance of the IMicrophoneService interface from the DependencyService:
public partial class MainPage : ContentPage
{
    SpeechRecognizer recognizer;
    IMicrophoneService micService;
    bool isTranscribing = false;

    public MainPage()
    {
        InitializeComponent();
        micService = DependencyService.Resolve<IMicrophoneService>();
    }

    // ...
}
The TranscribeClicked method is called when the transcribeButton instance is tapped:
async void TranscribeClicked(object sender, EventArgs e)
{
    bool isMicEnabled = await micService.GetPermissionAsync();

    // EARLY OUT: make sure the microphone is accessible
    if (!isMicEnabled)
    {
        UpdateTranscription("Please allow microphone access!");
        return;
    }

    // initialize speech recognizer
    if (recognizer == null)
    {
        var config = SpeechConfig.FromSubscription(Constants.CognitiveServicesApiKey, Constants.CognitiveServicesRegion);
        recognizer = new SpeechRecognizer(config);
        recognizer.Recognized += (obj, args) =>
        {
            UpdateTranscription(args.Result.Text);
        };
    }

    // if already transcribing, stop speech recognizer
    if (isTranscribing)
    {
        try
        {
            await recognizer.StopContinuousRecognitionAsync();
        }
        catch (Exception ex)
        {
            UpdateTranscription(ex.Message);
        }
        isTranscribing = false;
    }

    // if not transcribing, start speech recognizer
    else
    {
        Device.BeginInvokeOnMainThread(() =>
        {
            InsertDateTimeRecord();
        });
        try
        {
            await recognizer.StartContinuousRecognitionAsync();
        }
        catch (Exception ex)
        {
            UpdateTranscription(ex.Message);
        }
        isTranscribing = true;
    }
    UpdateDisplayState();
}
The TranscribeClicked method does the following:

- Checks whether the application has access to the microphone, and exits early if it does not.
- Creates an instance of the SpeechRecognizer class if one doesn't already exist.
- Stops continuous transcription if it's in progress.
- Inserts a timestamp and starts continuous transcription if it isn't in progress.
- Notifies the application to update its appearance based on the new application state.
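The sample only subscribes to the Recognized event, which fires once per finalized result. The SpeechRecognizer class also raises a Recognizing event with partial hypotheses while the user is still speaking. The following sketch shows how interim feedback could be surfaced, assuming it's added alongside the Recognized subscription above:

// Optional: surface interim results while the user is still speaking.
// args.Result.Text holds the partial transcription produced so far.
// Debug.WriteLine requires using System.Diagnostics;
recognizer.Recognizing += (obj, args) =>
{
    Debug.WriteLine($"Partial result: {args.Result.Text}");
};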
The remaining methods of the MainPage class are helpers for displaying application state:
void UpdateTranscription(string newText)
{
    Device.BeginInvokeOnMainThread(() =>
    {
        if (!string.IsNullOrWhiteSpace(newText))
        {
            transcribedText.Text += $"{newText}\n";
        }
    });
}

void InsertDateTimeRecord()
{
    var msg = $"=================\n{DateTime.Now.ToString()}\n=================";
    UpdateTranscription(msg);
}

void UpdateDisplayState()
{
    Device.BeginInvokeOnMainThread(() =>
    {
        if (isTranscribing)
        {
            transcribeButton.Text = "Stop";
            transcribeButton.BackgroundColor = Color.Red;
            transcribingIndicator.IsRunning = true;
        }
        else
        {
            transcribeButton.Text = "Transcribe";
            transcribeButton.BackgroundColor = Color.Green;
            transcribingIndicator.IsRunning = false;
        }
    });
}
The UpdateTranscription method writes the provided newText string to the Label element named transcribedText. It forces this update to happen on the UI thread, so it can be called from any context without throwing exceptions. The InsertDateTimeRecord method writes the current date and time to the transcribedText instance to mark the beginning of a new transcription. Finally, the UpdateDisplayState method updates the Button and ActivityIndicator elements to reflect whether or not transcription is in progress.
Create platform microphone services
The application must have access to the microphone to capture voice data. The IMicrophoneService interface must be implemented and registered with the DependencyService on each platform for the application to work.
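Each platform implementation below registers itself using the assembly-level Dependency attribute. As an aside (not how the sample does it), Xamarin.Forms also supports registering an implementation imperatively during platform startup; a minimal sketch:

// Alternative to the [assembly: Dependency(...)] attribute: register the
// implementation explicitly during platform startup, before the first call
// to DependencyService.Resolve<IMicrophoneService>().
DependencyService.Register<IMicrophoneService, AndroidMicrophoneService>();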
Android
The sample project defines an IMicrophoneService implementation for Android called AndroidMicrophoneService:
[assembly: Dependency(typeof(AndroidMicrophoneService))]
namespace CognitiveSpeechService.Droid.Services
{
    public class AndroidMicrophoneService : IMicrophoneService
    {
        public const int RecordAudioPermissionCode = 1;
        private TaskCompletionSource<bool> tcsPermissions;
        string[] permissions = new string[] { Manifest.Permission.RecordAudio };

        public Task<bool> GetPermissionAsync()
        {
            tcsPermissions = new TaskCompletionSource<bool>();

            if ((int)Build.VERSION.SdkInt < 23)
            {
                tcsPermissions.TrySetResult(true);
            }
            else
            {
                var currentActivity = MainActivity.Instance;
                if (ActivityCompat.CheckSelfPermission(currentActivity, Manifest.Permission.RecordAudio) != (int)Permission.Granted)
                {
                    RequestMicPermissions();
                }
                else
                {
                    tcsPermissions.TrySetResult(true);
                }
            }

            return tcsPermissions.Task;
        }

        public void OnRequestPermissionResult(bool isGranted)
        {
            tcsPermissions.TrySetResult(isGranted);
        }

        void RequestMicPermissions()
        {
            if (ActivityCompat.ShouldShowRequestPermissionRationale(MainActivity.Instance, Manifest.Permission.RecordAudio))
            {
                Snackbar.Make(MainActivity.Instance.FindViewById(Android.Resource.Id.Content),
                        "Microphone permissions are required for speech transcription!",
                        Snackbar.LengthIndefinite)
                    .SetAction("Ok", v =>
                    {
                        ((Activity)MainActivity.Instance).RequestPermissions(permissions, RecordAudioPermissionCode);
                    })
                    .Show();
            }
            else
            {
                ActivityCompat.RequestPermissions((Activity)MainActivity.Instance, permissions, RecordAudioPermissionCode);
            }
        }
    }
}
The AndroidMicrophoneService has the following features:

- The Dependency attribute registers the class with the DependencyService.
- The GetPermissionAsync method checks whether permissions are required based on the Android SDK version, and calls RequestMicPermissions if permission has not already been granted.
- The RequestMicPermissions method uses the Snackbar class to request permissions from the user if a rationale is required, otherwise it directly requests audio recording permissions.
- The OnRequestPermissionResult method is called with a bool result once the user has responded to the permission request.
The MainActivity class is customized to update the AndroidMicrophoneService instance when permission requests are completed:
public class MainActivity : global::Xamarin.Forms.Platform.Android.FormsAppCompatActivity
{
    IMicrophoneService micService;
    internal static MainActivity Instance { get; private set; }

    protected override void OnCreate(Bundle savedInstanceState)
    {
        Instance = this;
        // ...
        micService = DependencyService.Resolve<IMicrophoneService>();
    }

    public override void OnRequestPermissionsResult(int requestCode, string[] permissions, [GeneratedEnum] Android.Content.PM.Permission[] grantResults)
    {
        // ...
        switch (requestCode)
        {
            case AndroidMicrophoneService.RecordAudioPermissionCode:
                if (grantResults[0] == Permission.Granted)
                {
                    micService.OnRequestPermissionResult(true);
                }
                else
                {
                    micService.OnRequestPermissionResult(false);
                }
                break;
        }
    }
}
The MainActivity class defines a static reference called Instance, which is required by the AndroidMicrophoneService object when requesting permissions. It overrides the OnRequestPermissionsResult method to update the AndroidMicrophoneService object when the permission request is approved or denied by the user.
Finally, the Android application must include the permission to record audio in the AndroidManifest.xml file:
<manifest ...>
    ...
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
</manifest>
iOS
The sample project defines an IMicrophoneService implementation for iOS called iOSMicrophoneService:
[assembly: Dependency(typeof(iOSMicrophoneService))]
namespace CognitiveSpeechService.iOS.Services
{
    public class iOSMicrophoneService : IMicrophoneService
    {
        TaskCompletionSource<bool> tcsPermissions;

        public Task<bool> GetPermissionAsync()
        {
            tcsPermissions = new TaskCompletionSource<bool>();
            RequestMicPermission();
            return tcsPermissions.Task;
        }

        public void OnRequestPermissionResult(bool isGranted)
        {
            tcsPermissions.TrySetResult(isGranted);
        }

        void RequestMicPermission()
        {
            var session = AVAudioSession.SharedInstance();
            session.RequestRecordPermission((granted) =>
            {
                tcsPermissions.TrySetResult(granted);
            });
        }
    }
}
The iOSMicrophoneService has the following features:

- The Dependency attribute registers the class with the DependencyService.
- The GetPermissionAsync method calls RequestMicPermissions to request permissions from the device user.
- The RequestMicPermissions method uses the shared AVAudioSession instance to request recording permissions.
- The OnRequestPermissionResult method updates the TaskCompletionSource instance with the provided bool value.
Finally, the iOS app's Info.plist must include a message that tells the user why the app is requesting access to the microphone. Edit the Info.plist file to include the following tags within the <dict> element:
<plist>
    <dict>
        ...
        <key>NSMicrophoneUsageDescription</key>
        <string>Speech transcription requires microphone access</string>
    </dict>
</plist>
UWP
The sample project defines an IMicrophoneService implementation for UWP called UWPMicrophoneService:
[assembly: Dependency(typeof(UWPMicrophoneService))]
namespace CognitiveSpeechService.UWP.Services
{
    public class UWPMicrophoneService : IMicrophoneService
    {
        public async Task<bool> GetPermissionAsync()
        {
            bool isMicAvailable = true;
            try
            {
                var mediaCapture = new MediaCapture();
                var settings = new MediaCaptureInitializationSettings();
                settings.StreamingCaptureMode = StreamingCaptureMode.Audio;
                await mediaCapture.InitializeAsync(settings);
            }
            catch (Exception ex)
            {
                isMicAvailable = false;
            }

            if (!isMicAvailable)
            {
                await Windows.System.Launcher.LaunchUriAsync(new Uri("ms-settings:privacy-microphone"));
            }

            return isMicAvailable;
        }

        public void OnRequestPermissionResult(bool isGranted)
        {
            // intentionally does nothing
        }
    }
}
The UWPMicrophoneService has the following features:

- The Dependency attribute registers the class with the DependencyService.
- The GetPermissionAsync method attempts to initialize a MediaCapture instance. If that fails, it launches a user request to enable the microphone.
- The OnRequestPermissionResult method exists to satisfy the interface, but is not required for the UWP implementation.
Finally, the UWP Package.appxmanifest must specify that the application uses the microphone. Double-click the Package.appxmanifest file and select the Microphone option on the Capabilities tab in Visual Studio 2019:
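Equivalently, the capability can be declared by editing the manifest XML directly; a minimal sketch of the relevant element:

<Package ...>
    ...
    <Capabilities>
        <!-- Grants the app access to the microphone -->
        <DeviceCapability Name="microphone" />
    </Capabilities>
</Package>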
Test the application
Run the app and click the Transcribe button. The app should request microphone access and begin the transcription process. The ActivityIndicator animates, showing that transcription is active. As you speak, the app streams audio to the Azure Speech Service resource, which responds with transcribed text. The transcribed text appears in the Label element as it's received.
note
Android emulators fail to load and initialize the Speech Service libraries. Testing on a physical device is recommended for the Android platform.
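When testing, subscription, region, or network problems surface through the recognizer's Canceled event rather than as exceptions. A minimal sketch of logging these errors, assuming it's added alongside the Recognized subscription in MainPage:

recognizer.Canceled += (obj, args) =>
{
    // CancellationReason.Error indicates a failure rather than a normal end of
    // stream; ErrorDetails typically describes the cause, such as an invalid
    // key or region.
    if (args.Reason == CancellationReason.Error)
    {
        UpdateTranscription($"Recognition canceled: {args.ErrorDetails}");
    }
};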
Related links
- Azure Speech Service sample
- Azure Speech service overview
- Create a Cognitive Services resource
- Quick Start: Detect speech from a microphone