Audio File Transcription, for Super-Efficient Recording

Manoj Kumar
2 min read · Jul 13, 2021


Introduction

Converting audio into text has a wide range of applications: generating video subtitles, taking meeting minutes, and writing interview transcripts. HUAWEI ML Kit’s audio file transcription service makes doing so easier than ever before, converting audio files into accurate text, with correct punctuation as well!

Actual Effects

Build and run an app with audio file transcription integrated. Then, select a local audio file and convert it into text.

Development Preparations

For details about configuring the Huawei Maven repository and integrating the audio file transcription SDK, please refer to the Development Guide of ML Kit on HUAWEI Developers.

Declaring Permissions in the AndroidManifest.xml File

Open the AndroidManifest.xml file in the main folder. Add the network connection, network status access, and storage read permissions before the <application> element.
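A sketch of what these declarations might look like (the exact permission set should follow the ML Kit Development Guide):

```xml
<!-- Network connection -->
<uses-permission android:name="android.permission.INTERNET" />
<!-- Network status access -->
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<!-- Storage read permission -->
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
```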

Please note that these permissions also need to be requested dynamically at runtime. Otherwise, a Permission Denied error will be reported.
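For example, the storage read permission is a dangerous permission on API level 23 and later, so it must be requested at runtime. A minimal sketch inside an Activity, assuming AndroidX and an app-defined request code:

```java
// REQUEST_CODE_STORAGE is any integer your app uses to identify this request.
private static final int REQUEST_CODE_STORAGE = 1;

private void requestStoragePermission() {
    if (ContextCompat.checkSelfPermission(this,
            Manifest.permission.READ_EXTERNAL_STORAGE)
            != PackageManager.PERMISSION_GRANTED) {
        // Not granted yet: ask the user; the result arrives in
        // onRequestPermissionsResult.
        ActivityCompat.requestPermissions(this,
                new String[]{Manifest.permission.READ_EXTERNAL_STORAGE},
                REQUEST_CODE_STORAGE);
    }
}
```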

Development Procedure

Creating and Initializing an Audio File Transcription Engine

Use MLRemoteAftSetting to configure the engine. The service currently supports Mandarin Chinese and English; the corresponding values of mLanguage are zh and en.

enablePunctuation indicates whether to automatically punctuate the converted text. The default value is false. If this parameter is set to true, the converted text is automatically punctuated; otherwise, it is not.

enableWordTimeOffset indicates whether to generate the text transcription result of each audio segment with the corresponding offset. The default value is false. This parameter takes effect only when the audio duration is 1 minute or shorter.

If this parameter is set to true, the offset information is returned along with the text transcription result. This applies to the transcription of short audio files with a duration of 1 minute or shorter.

If this parameter is set to false, only the text transcription result of the audio file will be returned.

enableSentenceTimeOffset indicates whether to output the offset of each sentence in the audio file. The default value is false.

If this parameter is set to true, the offset information is returned along with the text transcription result.

If this parameter is set to false, only the text transcription result of the audio file will be returned.
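Putting the options above together, creating and initializing the engine might look like the following sketch. The API key string and fileUri (the Uri of the audio file the user selected) are placeholders; method names follow the ML Kit audio file transcription API as described in the Development Guide:

```java
// Set the API key obtained from AppGallery Connect (required before using the service).
MLApplication.getInstance().setApiKey("your-api-key");

// Configure the engine: English, with automatic punctuation and sentence offsets.
MLRemoteAftSetting setting = new MLRemoteAftSetting.Factory()
        .setLanguageCode("en")          // or "zh" for Mandarin Chinese
        .enablePunctuation(true)        // punctuate the converted text
        .enableWordTimeOffset(false)    // word offsets apply to audio of 1 minute or shorter
        .enableSentenceTimeOffset(true) // return the offset of each sentence
        .create();

// Create and initialize the transcription engine, then attach the result listener.
MLRemoteAftEngine engine = MLRemoteAftEngine.getInstance();
engine.init(getApplicationContext());
engine.setAftListener(mAsrListener);

// Start transcribing the selected local audio file; the returned task ID
// is used later to poll for the result.
String taskId = engine.longRecognize(fileUri, setting);
```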

Creating a Listener Callback to Process the Transcription Result

private MLRemoteAftListener mAsrListener = new MLRemoteAftListener() { /* override the callback methods here */ };

After the listener is initialized, call startTask in the listener’s onInitComplete callback to start the transcription.
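A fuller sketch of the listener, assuming an engine field created as shown earlier; the callback signatures follow the MLRemoteAftListener interface in the Development Guide:

```java
private final MLRemoteAftListener mAsrListener = new MLRemoteAftListener() {
    @Override
    public void onInitComplete(String taskId, Object ext) {
        // Engine initialization is complete; start the transcription task.
        engine.startTask(taskId);
    }

    @Override
    public void onResult(String taskId, MLRemoteAftResult result, Object ext) {
        if (result != null && result.isComplete()) {
            String text = result.getText(); // the final transcription text
            // Display or save the text here.
        }
    }

    @Override
    public void onError(String taskId, int errorCode, String message) {
        Log.e("AFT", "Transcription error " + errorCode + ": " + message);
    }

    @Override
    public void onUploadProgress(String taskId, double progress, Object ext) {
        // Progress of uploading the audio file to the cloud.
    }

    @Override
    public void onEvent(String taskId, int eventId, Object ext) {
        // Other engine events, e.g. the upload has finished.
    }
};
```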

Processing the Transcription Result in Polling Mode

After the transcription task is created, call getLongAftResult to obtain the transcription result, polling every 10 seconds until the result is complete.
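The 10-second polling loop can be sketched with a Handler as below. mTaskId is assumed to hold the task ID returned when the task was created; the result itself is delivered to the listener’s onResult callback:

```java
private static final long POLL_INTERVAL_MS = 10_000; // poll every 10 seconds

private final Handler mHandler = new Handler(Looper.getMainLooper());
private final Runnable mPollTask = new Runnable() {
    @Override
    public void run() {
        // Query the current transcription result; the outcome arrives in the
        // listener's onResult callback.
        MLRemoteAftEngine.getInstance().getLongAftResult(mTaskId);
        // Re-schedule; stop calling postDelayed once onResult reports
        // result.isComplete().
        mHandler.postDelayed(this, POLL_INTERVAL_MS);
    }
};

private void startPolling() {
    mHandler.postDelayed(mPollTask, POLL_INTERVAL_MS);
}
```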

References:

To learn more, please visit:

>> HUAWEI Developers official website

>> Development Guide

>> Reddit to join developer discussions

>> GitHub or Gitee to download the demo and sample code

>> Stack Overflow to solve integration problems

>> Original Source

Follow our official account for the latest HMS Core-related news and updates.
