How a Programmer Developed a Text Reader App for His 80-Year-Old Grandpa
“John, have you seen my glasses?”
Our old friend John, a programmer at Huawei, has a grandpa who, despite his old age, is an avid reader. “Leaning back, struggling to make out what was written on the newspaper through his glasses, but unable to take his eyes off the text: this was how my grandpa used to read,” John explained.
Reading this way was harmful to his grandpa’s vision, and it occurred to John that the ears could take over the role of “reading” from the eyes. He soon developed a text-reading app that followed this logic, recognizing and then reading out text from a picture. Thanks to this app, John’s grandpa can now “read” from the comfort of his rocking chair, without having to strain his eyes.
How to Implement
- The user takes a picture of a text passage. The app then automatically identifies the location of the text within the picture and corrects the skew, so that the text appears as if it had been photographed head-on.
- The app recognizes and extracts the text from the picture.
- The app converts the recognized text into audio output by leveraging text-to-speech technology.
These functions are easy to implement with three services in HUAWEI ML Kit: document skew correction, text recognition, and text to speech (TTS).
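The three stages above can be sketched as a simple pipeline. This is only an illustration of the data flow in plain Java: `ReaderPipeline` and `Photo` are hypothetical names, and each stage is a stub passed in as a function rather than a call to the actual ML Kit services.

```java
import java.util.function.Consumer;
import java.util.function.Function;

// Illustrative only: Photo and ReaderPipeline are hypothetical names,
// not HUAWEI ML Kit classes. Each stage is passed in as a plain function.
final class ReaderPipeline {
    private final Function<Photo, Photo> skewCorrector;   // stage 1
    private final Function<Photo, String> textRecognizer; // stage 2
    private final Consumer<String> speaker;               // stage 3

    ReaderPipeline(Function<Photo, Photo> skewCorrector,
                   Function<Photo, String> textRecognizer,
                   Consumer<String> speaker) {
        this.skewCorrector = skewCorrector;
        this.textRecognizer = textRecognizer;
        this.speaker = speaker;
    }

    // Runs the three stages in order and returns the recognized text.
    String read(Photo shot) {
        Photo flattened = skewCorrector.apply(shot);
        String text = textRecognizer.apply(flattened).trim();
        if (!text.isEmpty()) {
            speaker.accept(text); // nothing to read aloud if no text was found
        }
        return text;
    }

    public static void main(String[] args) {
        ReaderPipeline pipeline = new ReaderPipeline(
                photo -> new Photo(photo.content, true),          // stub: pretend to deskew
                photo -> photo.deskewed ? photo.content : "",     // stub: only deskewed photos yield text
                text -> System.out.println("Speaking: " + text)); // stub: print instead of playing audio
        pipeline.read(new Photo("Breaking news from the morning paper", false));
    }
}

// Placeholder for a captured image; a real app would hold bitmap data instead.
final class Photo {
    final String content;
    final boolean deskewed;

    Photo(String content, boolean deskewed) {
        this.content = content;
        this.deskewed = deskewed;
    }
}
```

Keeping the stages behind plain function types means each one can be swapped for a real service call later without changing the order in which they run.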
Preparations
- Configure the Huawei Maven repository address.
- Add the build dependencies for the HMS Core SDK.
Tap PREVIOUS or NEXT to turn to the previous or next page. Tap speak to start reading; tap it again to pause reading.
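The behavior of these three buttons can be modeled with a small state class. The following is a minimal sketch in plain Java, not code from the app: `ReaderState` is a hypothetical name, and stopping the reading when the page changes is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the reader screen's state; not part of any SDK.
final class ReaderState {
    private final List<String> pages;
    private int index = 0;
    private boolean speaking = false;

    ReaderState(List<String> pages) {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("at least one page is required");
        }
        this.pages = new ArrayList<>(pages); // defensive copy
    }

    // NEXT: move forward, stopping at the last page.
    void next() {
        turnTo(Math.min(index + 1, pages.size() - 1));
    }

    // PREVIOUS: move back, stopping at the first page.
    void previous() {
        turnTo(Math.max(index - 1, 0));
    }

    // speak: the same button starts and pauses reading. Returns the new state.
    boolean toggleSpeak() {
        speaking = !speaking;
        return speaking;
    }

    private void turnTo(int newIndex) {
        if (newIndex != index) {
            index = newIndex;
            speaking = false; // assumption: turning the page stops the current reading
        }
    }

    String currentPage() {
        return pages.get(index);
    }

    boolean isSpeaking() {
        return speaking;
    }

    public static void main(String[] args) {
        ReaderState state = new ReaderState(Arrays.asList("Page one text", "Page two text"));
        state.toggleSpeak(); // start reading page one
        state.next();        // turn to page two
        System.out.println(state.currentPage() + " | speaking: " + state.isSpeaking());
    }
}
```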
Development Process
1. Create a TTS engine by using the custom configuration class MLTtsConfig. Here, on-device TTS is used as an example.
2. Create a TTS callback function for processing the TTS result.
3. Extract text from a PDF file.
4. Perform TTS in on-device mode.
5. Release resources when the current UI is destroyed.
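The order of these five steps can be sketched in plain Java. The classes below (`SpeechConfig`, `SpeechEngine`, `SpeechListener`) are hypothetical stand-ins written for this illustration. They are not the ML Kit TTS API, and the text extraction in step 3 is stubbed with a fixed string.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-ins that mirror the five steps; none of these are ML Kit types.
final class TtsLifecycleSketch {
    public static void main(String[] args) {
        SpeechConfig config = new SpeechConfig("en-US", 1.0f, true); // step 1: configure on-device TTS
        SpeechEngine engine = new SpeechEngine(config);
        engine.setListener(new SpeechListener() {                    // step 2: register the callback
            @Override public void onEvent(String taskId, String event) {
                System.out.println(taskId + ": " + event);
            }
            @Override public void onError(String taskId, String message) {
                System.err.println(taskId + " failed: " + message);
            }
        });
        String text = "Text extracted from the document.";           // step 3: extraction is stubbed
        engine.speak(text);                                           // step 4: queue the text
        engine.playQueued();
        engine.shutdown();                                            // step 5: release resources
    }
}

interface SpeechListener {
    void onEvent(String taskId, String event);
    void onError(String taskId, String message);
}

final class SpeechConfig {
    final String language;
    final float speed;
    final boolean onDevice;

    SpeechConfig(String language, float speed, boolean onDevice) {
        this.language = language;
        this.speed = speed;
        this.onDevice = onDevice;
    }
}

final class SpeechEngine {
    private final SpeechConfig config;
    private final Queue<String[]> pending = new ArrayDeque<>(); // each entry is {taskId, text}
    private SpeechListener listener;
    private boolean released = false;
    private int nextId = 1;

    SpeechEngine(SpeechConfig config) {
        this.config = config;
    }

    SpeechConfig config() {
        return config;
    }

    void setListener(SpeechListener listener) {
        this.listener = listener;
    }

    // Queues text for playback and returns an ID the listener can match against.
    String speak(String text) {
        if (released) {
            throw new IllegalStateException("engine already shut down");
        }
        String taskId = "task-" + nextId++;
        if (text == null || text.trim().isEmpty()) {
            if (listener != null) {
                listener.onError(taskId, "empty text");
            }
            return taskId;
        }
        pending.add(new String[] {taskId, text});
        return taskId;
    }

    // Stands in for audio playback: reports start and completion for each queued task.
    void playQueued() {
        while (!pending.isEmpty()) {
            String[] task = pending.poll();
            if (listener != null) {
                listener.onEvent(task[0], "start");
                listener.onEvent(task[0], "complete");
            }
        }
    }

    // Drops any unplayed text and rejects further speak() calls.
    void shutdown() {
        pending.clear();
        released = true;
    }
}
```

In an Android app, the shutdown call would typically sit in the screen's destroy callback, so that the engine does not outlive the UI that created it.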
Other Applicable Scenarios
TTS can be used across a broad range of scenarios. For example, you could integrate it into an education app to read bedtime stories to children, or into a navigation app to read directions aloud.
For more details, you can go to:
- Reddit to join our developer discussion
- GitHub to download demos and sample code
- Stack Overflow to solve any integration problems