How a Programmer Used 300 Lines of Code to Help His Grandma Shop Online with Voice Input

“John, why is the writing pad missing again?”

John, a programmer at Huawei, has a grandma who loves novelty, and lately she’s been obsessed with online shopping. Familiarizing herself with the major shopping apps and their functions proved to be a piece of cake, and she expected her online shopping experience to be effortless. Unfortunately, searching for products turned out to be the stumbling block.

John’s grandma tended to use handwriting input. When using it, she would often make mistakes, like switching to another input method she found unfamiliar, or tapping on undesired characters or signs.

It’s not just shopping apps: most mobile apps feature interface designs oriented toward younger users, so it’s no wonder that elderly users often struggle to figure out how to use them.

John patiently helped his grandma search for products with handwriting input several times. But then, he decided to use his skills as a veteran coder to give his grandma the best possible online shopping experience. More specifically, instead of helping her adjust to the available input method, he was determined to create an input method that would conform to her usage habits.

Since his grandma tended to err during manual input, John developed an input method that converts speech into text. Grandma was enthusiastic about the new method, because it is remarkably easy to use. All she has to do is to tap on the recording button and say the product’s name. The input method then recognizes what she has said, and converts her speech into text.

Actual Effects

Real-time speech recognition and speech to text are ideal for a broad range of apps, including:

  1. Game apps (online): Real-time speech recognition comes to users’ aid when they team up with others. It frees up users’ hands for controlling the action, sparing them from having to type to communicate with their partners. It can also free users from any potential embarrassment related to voice chatting during gaming.
  2. Work apps: Speech to text can play a vital role during long conferences, where typing up meeting minutes is tedious and inefficient, and key details are easily missed. Using speech to text is much more efficient: during a conference, users can use this service to convert audio content into text; after the conference, they can simply polish the text so that it reads more coherently.
  3. Learning apps: Speech to text can offer users an enhanced learning experience. Without the service, users often have to pause audio materials to take notes, resulting in a fragmented learning process. With speech to text, users can concentrate on listening intently to the material while it is being played, and rely on the service to convert the audio content into text. They can then review the text after finishing the entire course, to ensure that they’ve mastered the content.

How to Implement

Two services in HUAWEI ML Kit, automatic speech recognition (ASR) and audio file transcription, make it easy to implement the above functions.

ASR can recognize speech of up to 60 seconds and convert the input speech into text in real time, with a recognition accuracy of over 95%. It currently supports Mandarin Chinese (including Chinese-English bilingual speech), English, French, German, Spanish, Italian, and Arabic.

• Real-time result output
• Available options: with and without speech pickup UI
• Endpoint detection: Start and end points can be accurately located.
• Silence detection: No voice packet is sent for silent portions.
• Intelligent conversion to digital formats: For example, the year 2021 is recognized from voice input.
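
To make the silence detection idea more concrete, here is a minimal sketch of an energy-based gate in plain Java. It is not how ML Kit implements the feature (the service does not expose that). The `SilenceGate` class, the frame layout (16-bit PCM samples), and the threshold value are all assumptions chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is NOT ML Kit's implementation.
final class SilenceGate {
    private SilenceGate() {}

    // Root-mean-square amplitude of one frame of 16-bit PCM samples.
    static double rms(short[] frame) {
        if (frame.length == 0) {
            return 0.0;
        }
        double sumOfSquares = 0.0;
        for (short sample : frame) {
            sumOfSquares += (double) sample * sample;
        }
        return Math.sqrt(sumOfSquares / frame.length);
    }

    // Keeps only the frames whose RMS reaches the threshold.
    // Quieter frames are dropped, which mirrors "no packet is sent for silence".
    static List<short[]> voicedFrames(List<short[]> frames, double threshold) {
        List<short[]> voiced = new ArrayList<>();
        for (short[] frame : frames) {
            if (rms(frame) >= threshold) {
                voiced.add(frame);
            }
        }
        return voiced;
    }
}
```

With a threshold of 500, for instance, a frame of zeros is dropped while a frame that swings between 1000 and -1000 is kept. A production detector would typically smooth the decision over several frames instead of judging each one in isolation.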

Audio file transcription can convert an audio file of up to five hours into text with punctuation, and automatically segment the text for greater clarity. In addition, this service can generate text with timestamps, facilitating further function development. In this version, both Chinese and English are supported.
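
As one example of the further development that timestamps make possible, the sketch below turns timestamped segments into caption-style lines. The `Segment` and `CaptionFormatter` classes and their fields are hypothetical stand-ins, not the SDK's result types, so check the ML Kit reference for the real ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for one timestamped piece of transcribed text.
final class Segment {
    final long startMs;
    final long endMs;
    final String text;

    Segment(long startMs, long endMs, String text) {
        this.startMs = startMs;
        this.endMs = endMs;
        this.text = text;
    }
}

final class CaptionFormatter {
    private CaptionFormatter() {}

    // Formats a millisecond offset as HH:MM:SS.mmm.
    static String clock(long ms) {
        long hours = ms / 3_600_000;
        long minutes = (ms / 60_000) % 60;
        long seconds = (ms / 1_000) % 60;
        long millis = ms % 1_000;
        return String.format(Locale.ROOT, "%02d:%02d:%02d.%03d",
                hours, minutes, seconds, millis);
    }

    // Produces one "[start - end] text" line per segment, in input order.
    static List<String> toCaptions(List<Segment> segments) {
        List<String> lines = new ArrayList<>();
        for (Segment s : segments) {
            lines.add("[" + clock(s.startMs) + " - " + clock(s.endMs) + "] " + s.text);
        }
        return lines;
    }
}
```

The same per-segment offsets could just as well drive a "jump to this sentence" feature in a learning app or meeting recorder.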

Development Procedures

1. Preparations

(1) Configure the Huawei Maven repository address, and put the agconnect-services.json file under the app directory.

Open the build.gradle file in the root directory of your Android Studio project.

Add the AppGallery Connect plugin and the Maven repository.

Go to allprojects > repositories and configure the Maven repository address for the HMS Core SDK.

Go to buildscript > repositories and configure the Maven repository address for the HMS Core SDK.

If the agconnect-services.json file has been added to the app, go to buildscript > dependencies and add the AppGallery Connect plugin configuration.

(2) Add the build dependencies for the HMS Core SDK.

(3) Configure the signing certificate in the build.gradle file under the app directory.

(4) Add permissions in the AndroidManifest.xml file.

2. Integrating the ASR Service

(1) Dynamically apply for the permissions.
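
On Android, a runtime permission is normally checked with `ContextCompat.checkSelfPermission` and requested with `ActivityCompat.requestPermissions`. The sketch below isolates just the decision logic in plain Java so that it runs without a device. The `PermissionPlanner` class is hypothetical, and the assumption that microphone access (`android.permission.RECORD_AUDIO`) is the permission to request at runtime is mine, since this article does not list the permissions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical helper: decides which permissions still need to be requested.
final class PermissionPlanner {
    private PermissionPlanner() {}

    // Assumption: speech pickup needs microphone access at runtime.
    static final List<String> REQUIRED =
            Arrays.asList("android.permission.RECORD_AUDIO");

    // Returns the required permissions that are not yet granted,
    // preserving the order in which they were declared.
    static List<String> missing(List<String> required, Set<String> granted) {
        List<String> toRequest = new ArrayList<>();
        for (String permission : required) {
            if (!granted.contains(permission)) {
                toRequest.add(permission);
            }
        }
        return toRequest;
    }
}
```

In an Activity, a non-empty result would be passed on to the permission request call, and recognition would start only once the user has granted everything.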

(2) Create an Intent to set parameters.

(3) Override the onActivityResult method to process the result returned by ASR.
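
The result handling can be sketched roughly as follows, again in plain Java so that it runs outside Android. Every name here (`AsrResultHandler`, the request and result codes, the `text` key) is a hypothetical stand-in; the SDK defines its own constants, and a `Map` takes the place of the extras carried by the returned Intent.

```java
import java.util.Map;

// Hypothetical sketch of the branching done inside onActivityResult.
final class AsrResultHandler {
    private AsrResultHandler() {}

    // Placeholder values: the real SDK supplies its own codes and keys.
    static final int REQUEST_CODE_ASR = 100;
    static final int RESULT_SUCCESS = 0;
    static final int RESULT_FAILURE = 1;
    static final String KEY_TEXT = "text";

    // Returns the recognized text, or an empty string when the result
    // belongs to another request, reports a failure, or carries no text.
    static String recognizedText(int requestCode, int resultCode,
                                 Map<String, String> extras) {
        if (requestCode != REQUEST_CODE_ASR || extras == null) {
            return "";
        }
        if (resultCode != RESULT_SUCCESS) {
            return "";
        }
        String text = extras.get(KEY_TEXT);
        return text == null ? "" : text;
    }
}
```

In the shopping scenario, a non-empty return value would be written into the search box, while an empty one could prompt Grandma to tap the recording button and try again.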

3. Integrating the Audio File Transcription Service

(1) Dynamically apply for the permissions.

(2) Create and initialize an audio transcription engine, and create an audio file transcription configurator.

(3) Create a listener callback to process the audio file transcription result.

• Transcription of short audio files with a duration of 1 minute or shorter:

• Transcription of audio files with a duration longer than 1 minute:
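
Because the two cases are handled differently, an app needs to decide which path an audio file takes. A minimal sketch of that decision, using the 1-minute boundary and the five-hour maximum stated in this article, might look like this. The `TranscriptionMode` and `TranscriptionPlanner` names are hypothetical and not part of the SDK.

```java
import java.time.Duration;

// Hypothetical labels for the two transcription paths described above.
enum TranscriptionMode { SHORT, LONG }

final class TranscriptionPlanner {
    private TranscriptionPlanner() {}

    // Limits taken from the article: 1 minute or shorter counts as short,
    // and the service accepts files of up to five hours.
    static final Duration SHORT_LIMIT = Duration.ofMinutes(1);
    static final Duration MAX_LENGTH = Duration.ofHours(5);

    // Picks the path for a file of the given length, rejecting
    // lengths the service cannot handle.
    static TranscriptionMode modeFor(Duration audioLength) {
        if (audioLength.isNegative() || audioLength.isZero()) {
            throw new IllegalArgumentException("audio length must be positive");
        }
        if (audioLength.compareTo(MAX_LENGTH) > 0) {
            throw new IllegalArgumentException("audio is longer than five hours");
        }
        return audioLength.compareTo(SHORT_LIMIT) <= 0
                ? TranscriptionMode.SHORT
                : TranscriptionMode.LONG;
    }
}
```

A file of exactly 60 seconds therefore takes the short path, and one of 61 seconds takes the long path.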

(4) Obtain an audio file and upload it to the audio transcription engine.

For more details, you can go to:

• Reddit to join our developer discussion
• GitHub to download demos and sample code
• Stack Overflow to solve any integration problems
