Implementing Real-Time Transcription in an Easy Way

Manoj Kumar
Jul 22, 2021

Background

Real-time onscreen subtitles are a must-have feature in a typical video app. However, developing this feature can be costly for small and medium-sized developers, and even when it is implemented, speech recognition is often inaccurate. Fortunately, there's a better way: HUAWEI ML Kit, which is remarkably easy to integrate and makes real-time transcription an absolute breeze!

Introduction to ML Kit

ML Kit lets your app draw on Huawei's long-standing machine learning expertise to apply artificial intelligence (AI) in a wide range of scenarios. It provides a broad array of easy-to-use machine learning capabilities that serve as building blocks for cutting-edge AI apps, including those related to:

  • Text (including text recognition, document recognition, and ID card recognition)
  • Language/Voice (such as real-time/on-device translation, automatic speech recognition, and real-time transcription)
  • Image (such as image classification, object detection and tracking, and landmark recognition)
  • Face/Body (such as face detection, skeleton detection, liveness detection, and face verification)
  • Natural language processing (text embedding)
  • Custom model (including the on-device inference framework and model development tool)

Real-time transcription is the ML Kit capability needed to implement the subtitle function described above. Now let's move on to how to integrate this service.

Integrating Real-Time Transcription

Steps

1. Registering as a Huawei developer on HUAWEI Developers

2. Creating an app

Create an app in AppGallery Connect. For details, see Getting Started with Android.


3. Enabling ML Kit

4. Integrating the HMS Core SDK

Add the AppGallery Connect configuration file by completing the steps below:

Download the agconnect-services.json file and copy it to the app directory of your Android Studio project.

Call setApiKey during app initialization.

To learn more, go to Adding the AppGallery Connect Configuration File.

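5. Configuring the Maven repository address

In the project-level build.gradle file, add the Huawei Maven repository address (https://developer.huawei.com/repo/) to the repositories blocks under both buildscript and allprojects.

Then import the real-time transcription SDK by adding the following build dependency to the dependencies block of the app-level build.gradle file: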

implementation 'com.huawei.hms:ml-computer-voice-realtimetranscription:2.2.0.300'

Add the AppGallery Connect plugin configuration.

Method 1: In the app-level build.gradle file, add the following line below the existing apply plugin declaration in the file header:

apply plugin: 'com.huawei.agconnect'

Method 2: Add id 'com.huawei.agconnect' to the plugins block of the app-level build.gradle file.

Please refer to Integrating the Real-Time Transcription SDK to learn more.

Setting the cloud authentication information

When using ML Kit's on-cloud services, you can set either an access token (recommended) or an API key, as follows:

Access token

You can use the following API to initialize the access token when the app is started. The access token does not need to be set again once initialized.

MLApplication.getInstance().setAccessToken("your access token");

API key

You can use the following API to initialize the API key when the app is started. The API key does not need to be set again once initialized.

MLApplication.getInstance().setApiKey("your ApiKey");

For details, see Notes on Using Cloud Authentication Information.
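Both options follow the same rule: set one credential once, at startup, with the access token preferred. The Java sketch below captures that rule so it can be tested without the SDK. It is not ML Kit code: CloudAuthTarget is a hypothetical interface standing in for MLApplication.getInstance(), and CloudAuthInitializer and CredentialKind are illustrative names.

```java
import java.util.Objects;

// Hypothetical stand-in for MLApplication.getInstance(); only the two
// setters shown in the article are modeled here.
interface CloudAuthTarget {
    void setAccessToken(String accessToken);
    void setApiKey(String apiKey);
}

// Which credential was applied (illustrative name, not an SDK type).
enum CredentialKind { ACCESS_TOKEN, API_KEY }

// Applies a cloud credential once at startup, preferring the access token.
final class CloudAuthInitializer {
    private CredentialKind applied; // null until a credential has been set

    synchronized CredentialKind initialize(CloudAuthTarget target, String accessToken, String apiKey) {
        Objects.requireNonNull(target, "target");
        if (applied != null) {
            return applied; // already initialized: no need to set it again
        }
        if (accessToken != null && !accessToken.isEmpty()) {
            target.setAccessToken(accessToken); // recommended option
            applied = CredentialKind.ACCESS_TOKEN;
        } else if (apiKey != null && !apiKey.isEmpty()) {
            target.setApiKey(apiKey); // fallback option
            applied = CredentialKind.API_KEY;
        } else {
            throw new IllegalArgumentException("Provide an access token or an API key");
        }
        return applied;
    }
}
```

In a real app, the target would wrap MLApplication.getInstance(), and initialize would be called once from your startup code (for example, Application.onCreate).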

Code Development

Create and configure a speech recognizer.

Create a speech recognition result listener callback.

Recognition results are delivered through the listener callbacks, including onRecognizingResults. Design the UI around these results; for example, display the text transcribed from the input speech.
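A common way to display transcribed text is to keep finished sentences on screen and overwrite a single in-progress line with each partial result. The Java sketch below models this with a hypothetical SubtitleBuffer class. It assumes each callback has already been unpacked into the recognized text plus a flag marking whether the sentence is final, and that each partial result carries the whole sentence recognized so far; check the Real-Time Transcription documentation for how onRecognizingResults actually packages these values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical subtitle model: recent final sentences plus one
// in-progress line that each partial result overwrites.
final class SubtitleBuffer {
    private final List<String> committed = new ArrayList<>();
    private final int maxLines;
    private String pending = "";

    SubtitleBuffer(int maxLines) {
        if (maxLines < 1) throw new IllegalArgumentException("maxLines must be >= 1");
        this.maxLines = maxLines;
    }

    // Call from the recognition callback with the unpacked text and final flag.
    void onResult(String text, boolean isFinal) {
        String t = text == null ? "" : text.trim();
        if (isFinal) {
            if (!t.isEmpty()) committed.add(t);
            // Keep only the newest final lines so the caption area does not overflow.
            while (committed.size() > maxLines) committed.remove(0);
            pending = "";
        } else {
            pending = t; // a partial result replaces the previous partial, not appends
        }
    }

    // Text to show: up to maxLines final lines, then the live partial line.
    String render() {
        List<String> lines = new ArrayList<>(committed);
        if (!pending.isEmpty()) lines.add(pending);
        return String.join("\n", lines);
    }
}
```

From onRecognizingResults you would call onResult with the unpacked values and show render() in a TextView, posting the update to the UI thread if the callback arrives on a background thread.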

Bind the speech recognizer.

mSpeechRecognizer.setRealTimeTranscriptionListener(new SpeechRecognitionListener());

Call startRecognizing to start speech recognition.

mSpeechRecognizer.startRecognizing(config);

Release resources after recognition is complete.
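It is safest to run these steps in order (bind the listener, start recognition, release when done) and to release exactly once. The Java sketch below enforces that order around a hypothetical Recognizer interface. Its setRealTimeTranscriptionListener and startRecognizing methods mirror the calls shown above; release() is an assumed placeholder for the SDK's own release call, and TranscriptionSession is an illustrative name.

```java
import java.util.Objects;

// Hypothetical stand-in for mSpeechRecognizer (L = listener type, C = config type).
// release() is an assumed placeholder, not a confirmed SDK method name.
interface Recognizer<L, C> {
    void setRealTimeTranscriptionListener(L listener);
    void startRecognizing(C config);
    void release();
}

// Enforces bind -> start -> release, releasing at most once.
final class TranscriptionSession<L, C> implements AutoCloseable {
    private final Recognizer<L, C> recognizer;
    private boolean listenerBound = false;
    private boolean released = false;

    TranscriptionSession(Recognizer<L, C> recognizer) {
        this.recognizer = Objects.requireNonNull(recognizer, "recognizer");
    }

    void bind(L listener) {
        ensureOpen();
        recognizer.setRealTimeTranscriptionListener(Objects.requireNonNull(listener, "listener"));
        listenerBound = true;
    }

    void start(C config) {
        ensureOpen();
        if (!listenerBound) {
            throw new IllegalStateException("Bind a listener before starting recognition");
        }
        recognizer.startRecognizing(config);
    }

    @Override
    public void close() {
        if (!released) { // a second close() is a harmless no-op
            released = true;
            recognizer.release();
        }
    }

    private void ensureOpen() {
        if (released) {
            throw new IllegalStateException("Session has already been released");
        }
    }
}
```

Because results keep arriving through callbacks after startRecognizing returns, an Android app would typically call close() from a lifecycle method such as onDestroy rather than from a try-with-resources block.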

(Optional) Obtain the list of supported languages.
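Once you have the list of supported languages, you still need to choose one. The Java sketch below, a hypothetical LanguagePicker helper, tries the user's preferred codes first, then any supported code with the same base language, and otherwise falls back to a default. The code format used here (for example, en-US) is an assumption; match it to the values the SDK actually returns.

```java
import java.util.List;

// Hypothetical helper: chooses a recognition language from the supported list.
final class LanguagePicker {
    private LanguagePicker() {}

    static String pick(List<String> supported, List<String> preferred, String fallback) {
        // Pass 1: exact code match, ignoring case ("en-us" matches "en-US").
        for (String want : preferred) {
            for (String have : supported) {
                if (have.equalsIgnoreCase(want)) return have;
            }
        }
        // Pass 2: same base language ("fr-CA" falls back to "fr-FR").
        for (String want : preferred) {
            String base = baseOf(want);
            for (String have : supported) {
                if (baseOf(have).equalsIgnoreCase(base)) return have;
            }
        }
        return fallback;
    }

    // "en-US" -> "en"; codes without a region are returned unchanged.
    private static String baseOf(String code) {
        int dash = code.indexOf('-');
        return dash < 0 ? code : code.substring(0, dash);
    }
}
```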

That completes the integration, so let's test it on a simple screen.

Tap START RECORDING. The text recognized from the input speech will be displayed in the lower portion of the screen.

We’ve now built a simple audio transcription function.

Eager to build a fancier UI with stunning animations and other effects? By all means, give it a shot!

For reference:

Real-Time Transcription

Sample Code for ML Kit

To learn more, please visit:

>> HUAWEI Developers official website

>> Development Guide

>> Reddit to join developer discussions

>> GitHub or Gitee to download the demo and sample code

>> Stack Overflow to solve integration problems

>> Original Source

Follow our official account for the latest HMS Core-related news and updates.
