Intro
Ever wanted to use speech to text in a canvas app, but found yourself stuck with the default component?
Follow this guide to discover how to make it happen using Power Automate. Multiple approaches are possible; I prefer this one because it keeps you in control of your data.
Canvas App
In Power Apps, when you record audio with a microphone control, you can’t use the raw file directly for uploads or API calls. First, the recording is stored in a variable. Then it’s converted to a JSON string with Base64 encoding using JSON(..., JSONFormat.IncludeBinaryData). However, this includes a metadata prefix (data:audio/wav;base64,) that most external systems don’t need. So, the final step removes that prefix, leaving you with a clean Base64 string ready for upload or transmission. This method is ideal for sending audio to APIs, Azure Blob Storage, or SharePoint.
On the OnStop property of the Microphone control, add the following code:
// Store the audio recording from the microphone control into a temporary variable
Set(varTempRecording, mic_Recorder.Audio);
// Convert the audio recording to a JSON string, including binary data
Set(varTempJSON, JSON(mic_Recorder.Audio, JSONFormat.IncludeBinaryData));
// Strip the surrounding quotes and the data-URI prefix (e.g. "data:audio/wav;base64,") that JSON() adds, isolating the pure Base64 audio data
Set(varStringBase64, Mid(varTempJSON, 25, Len(varTempJSON) - 25));
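To make the stripping step concrete, here is a small Python sketch of the same transformation: JSON() wraps the recording in a quoted data URI, and the flow only needs the Base64 payload inside it. (The byte string below is a fake stand-in for real audio; the exact character counts in the Power Fx Mid call depend on the recording's MIME type.)

```python
import base64

# Fake audio payload standing in for the microphone recording
audio_bytes = b"RIFF....WAVEfmt "

# What JSON(audio, JSONFormat.IncludeBinaryData) returns: a quoted data URI
var_temp_json = '"data:audio/wav;base64,' + base64.b64encode(audio_bytes).decode("ascii") + '"'

# Equivalent of the Mid(...) call: drop the quoted prefix and the trailing
# quote, keeping only the Base64 payload
prefix = '"data:audio/wav;base64,'
var_string_base64 = var_temp_json[len(prefix):-1]

# Round-trip check: decoding the cleaned string recovers the original bytes
assert base64.b64decode(var_string_base64) == audio_bytes
```

This cleaned string is exactly what the flow's base64ToBinary() call expects later on.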
Power Automate flow
Step-by-Step Overview of the Voice-to-Text Flow
Trigger from Power Apps
The flow is triggered by Power Apps, which sends the Base64-encoded audio (the varStringBase64 variable) as input.
Store Audio in SharePoint
The audio file is first saved in a SharePoint document library with a dynamic .mp3 filename.
File Content converts the varStringBase64 input back to binary: @{base64ToBinary(triggerBody()['text'])}
Get File Content from SharePoint
The binary content of the saved audio file is retrieved so it can be uploaded elsewhere.
Upload Audio to Azure Blob Storage
The binary file is uploaded to a designated folder in Azure Blob Storage. This is required because Azure Speech Services uses blob URLs to access audio.
Generate SAS URL
A secure SAS URL is created for the uploaded audio file. This allows Azure Cognitive Services to access the file without exposing full blob storage access.
Create Transcription Request (Azure)
locale: e.g., nl-NL
ContentUrls Item: the Web (SAS) URL from the previous step
destinationContainerUrl: a blob SAS URL for the container (create one in Azure: Container > Settings > Shared access tokens)
Compose Path
A path is created using the returned transcription ID. This will be used to fetch the transcription result once it's ready.
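As a sketch, the request body the Create transcription (V3.1) action sends, and the way the transcription ID is pulled from the returned self URL, look roughly like this in Python (the SAS URLs and the self URL are hypothetical placeholders, not real values):

```python
import json

# Hypothetical placeholder SAS URLs; in the flow these come from the SAS steps
audio_sas_url = "https://<account>.blob.core.windows.net/audio/recording.mp3?<sas>"
container_sas_url = "https://<account>.blob.core.windows.net/results?<sas>"

# Approximate shape of the batch transcription request body
request_body = {
    "displayName": "canvas-app-recording",
    "locale": "nl-NL",
    "contentUrls": [audio_sas_url],
    "properties": {"destinationContainerUrl": container_sas_url},
}
print(json.dumps(request_body, indent=2))

# The response's "self" URL ends in the transcription ID; this mirrors
# last(split(body('Create_transcription_(V3.1)')?['self'], '/')) in the flow
self_url = "https://westeurope.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions/3f1c1234-0000-0000-0000-000000000000"
transcription_id = self_url.split("/")[-1]

# Path composed by the flow to fetch the result blob later
blob_path = f"/batchtranscription/{transcription_id}/contenturl_0.json"
print(blob_path)
```

The split-on-slash trick works because the transcription ID is always the last path segment of the self URL.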
/batchtranscription/@{last(split(body('Create_transcription_(V3.1)')?['self'], '/'))}/contenturl_0.json
Delay
A short wait (e.g., 3 seconds) is added to allow Azure to finish the transcription process. (A more robust approach is to poll the transcription status until it completes, rather than relying on a fixed delay.)
Retrieve Transcription Blob
The flow fetches the result file from Azure Blob Storage using the composed path.
Decode Base64 Output
The result content is still Base64-encoded, so it’s converted into a readable string.
@{base64ToString(outputs('Get_blob_content_(V2)')?['body']?['$content'])}
Parse JSON
The string is parsed into structured JSON, allowing access to specific fields like recognizedPhrases and lexical.
Return Transcribed Text to Power Apps
The final transcribed text is sent back to Power Apps as a response, ready to be displayed, stored, or used in other logic.
@first(body('Parse_JSON')?['combinedRecognizedPhrases'])?['lexical']
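To see what that last expression is navigating, here is a Python sketch against a trimmed sample of the transcription result (the JSON shape follows the Azure batch transcription output; real results contain more fields, and the text values here are invented for illustration):

```python
import json

# Trimmed sample of a transcription result file (contenturl_0.json)
result_json = """
{
  "combinedRecognizedPhrases": [
    {
      "channel": 0,
      "lexical": "hello world this is a test",
      "display": "Hello world, this is a test."
    }
  ]
}
"""

result = json.loads(result_json)

# Equivalent of first(body('Parse_JSON')?['combinedRecognizedPhrases'])?['lexical']
transcribed_text = result["combinedRecognizedPhrases"][0]["lexical"]
print(transcribed_text)  # hello world this is a test
```

The lexical field holds the raw recognized words; if you prefer punctuation and capitalization, return the display field instead.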
Demo
Once clicked, the app begins recording audio. Recording stops when the speaker finishes and presses the stop button. Behind the scenes, the recorded audio is sent to the Power Automate flow, which uploads it to SharePoint and Azure Blob Storage, triggers Azure Speech-to-Text, and retrieves the transcription. After processing, the transcribed text is returned and displayed in the app, all starting from a single tap on the microphone button.