Sunday, November 9, 2025

Intro

Ever wanted to use speech to text in a canvas app, but felt stuck with the default component?

Follow this guide and discover how to make it happen using Power Automate. Multiple approaches are possible; I prefer this one because it keeps you in control of your data.

Canvas App

In Power Apps, when you record audio with a microphone control, you can’t use the raw file directly for uploads or API calls. First, the recording is stored in a variable. Then it’s converted to a JSON string with Base64 encoding using JSON(..., JSONFormat.IncludeBinaryData). However, this includes a metadata prefix (data:audio/wav;base64,) that most external systems don’t need. So, the final step removes that prefix, leaving you with a clean Base64 string ready for upload or transmission. This method is ideal for sending audio to APIs, Azure Blob Storage, or SharePoint.


Add the following code to the OnStop property of the Microphone control:


// Store the audio recording from the microphone control into a temporary variable
Set(varTempRecording, mic_Recorder.Audio);

// Convert the audio recording to a JSON string, including binary data
Set(varTempJSON, JSON(mic_Recorder.Audio, JSONFormat.IncludeBinaryData));

// Strip the opening quote plus the data-URI prefix (in the browser this is
// typically the 23-character "data:audio/webm;base64,", so 24 characters in
// total) and the trailing quote, isolating the pure Base64 audio data.
// Adjust the offsets if your recordings use a different MIME type.
Set(varStringBase64, Mid(varTempJSON, 25, Len(varTempJSON) - 25));
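The offset arithmetic above is easy to get wrong, so here is a quick sanity check of the same string manipulation in Python. The Base64 payload is a short placeholder, not real audio:

```python
# Simulated output of JSON(mic_Recorder.Audio, JSONFormat.IncludeBinaryData):
# a quoted data URI. The Base64 payload here is just a placeholder.
var_temp_json = '"data:audio/webm;base64,UklGRiQAAABXQVZF"'

# Power Fx Mid() is 1-based: Mid(s, 25, Len(s) - 25) corresponds to the
# 0-based Python slice s[24 : len(s) - 1]. It drops the opening quote (1 char)
# plus the 23-character "data:audio/webm;base64," prefix, and the closing quote.
var_string_base64 = var_temp_json[24:len(var_temp_json) - 1]

print(var_string_base64)  # → UklGRiQAAABXQVZF
```

If the prefix surprises you at runtime, inspect varTempJSON in a label first: the MIME type (and therefore the character count) varies per platform.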

Power Automate flow

Step-by-Step Overview of the Voice-to-Text Flow

  1. Trigger from Power Apps
    The flow is triggered by Power Apps, which sends a Base64-encoded audio file as input. The variable varStringBase64 will be used as input.

  2. Store Audio in SharePoint
    The audio file is first saved in a SharePoint document library with a dynamic .mp3 filename.
    File Content converts the incoming varStringBase64 value back to binary: @{base64ToBinary(triggerBody()['text'])}

  3. Get File Content from SharePoint
    The binary content of the saved audio file is retrieved so it can be uploaded elsewhere.

  4. Upload Audio to Azure Blob Storage
    The binary file is uploaded to a designated folder in Azure Blob Storage. This is required because Azure Speech Services uses blob URLs to access audio.

  5. Generate SAS URL
    A secure SAS URL is created for the uploaded audio file. This allows Azure Cognitive Services to access the file without exposing full blob storage access.

  6. Create Transcription Request (Azure)
    The locale: e.g., nl-NL
    ContentUrls Item: Web (SAS) Url from the previous step 
    destinationContainerUrl: blob SAS URL from the container (create one in Azure. Container > Settings > Shared access tokens)

  7. Compose Path
    A path is created using the returned transcription ID. This will be used to fetch the transcription result once it’s ready.
    /batchtranscription/@{last(split(body('Create_transcription_(V3.1)')?['self'], '/'))}/contenturl_0.json

  8. Delay
    A short wait (e.g., 3 seconds) gives Azure time to finish the transcription. (Preferably, poll the transcription status instead of relying on a fixed delay, so you can be sure the result is actually ready.)

  9. Retrieve Transcription Blob
    The flow fetches the result file from Azure Blob Storage using the composed path.

  10. Decode Base64 Output
    The result content is still Base64-encoded, so it’s converted into a readable string.
    @{base64ToString(outputs('Get_blob_content_(V2)')?['body']?['$content'])}

  11. Parse JSON
    The string is parsed into structured JSON, allowing access to specific fields like recognizedPhrases and lexical.

  12. Return Transcribed Text to Power Apps
    The final transcribed text is sent back to Power Apps as a response, ready to be displayed, stored, or used in other logic.
    @first(body('Parse_JSON')?['combinedRecognizedPhrases'])?['lexical']
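The tail of the flow (steps 7 through 12) is mostly string and JSON plumbing, so here is a rough Python sketch of the same logic. The function and parameter names are my own, and the result shape assumes the Azure Speech batch transcription v3.1 format with a combinedRecognizedPhrases array; treat it as an illustration of the steps, not a drop-in implementation:

```python
import json
import time
from typing import Callable


def transcription_result_path(self_url: str) -> str:
    # Equivalent of the Compose step: take the transcription ID from the
    # end of the `self` URL and build the blob path of the result file.
    transcription_id = self_url.rstrip("/").split("/")[-1]
    return f"/batchtranscription/{transcription_id}/contenturl_0.json"


def wait_for_transcription(get_status: Callable[[], str],
                           timeout_s: float = 120,
                           interval_s: float = 3) -> None:
    # Safer replacement for the fixed Delay: poll the transcription's
    # status field. get_status should GET the `self` URL and return the
    # `status` value ("NotStarted", "Running", "Succeeded" or "Failed").
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status == "Succeeded":
            return
        if status == "Failed":
            raise RuntimeError("transcription failed")
        time.sleep(interval_s)
    raise TimeoutError("transcription did not finish in time")


def extract_lexical(result_json: str) -> str:
    # Equivalent of Parse JSON + the final expression:
    # first(combinedRecognizedPhrases)?['lexical']
    result = json.loads(result_json)
    phrases = result.get("combinedRecognizedPhrases") or []
    return phrases[0]["lexical"] if phrases else ""
```

A usage sketch: fetch the status URL until wait_for_transcription returns, download the blob at transcription_result_path(...), base64-decode it if your connector wraps it (as Get blob content does), and hand the string to extract_lexical.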


Demo

Once the microphone button is tapped, the app begins recording audio. The recording stops when the speaker finishes and presses the stop button. Behind the scenes, the recorded audio is sent to the Power Automate flow, which uploads it to SharePoint and Azure Blob Storage, triggers Azure Speech to Text, and retrieves the transcription. After processing, the transcribed text is returned and displayed in the app, all starting from a single tap on the microphone button.
