Build Whisper on K1 for speech recognition

Whisper is a high-performance automatic speech recognition (ASR) model developed by OpenAI. The whisper.cpp project, a C/C++ port of the model, runs on many platforms, including Mac OS, iOS, Android, Java, Linux, FreeBSD, WebAssembly, Windows, and Raspberry Pi. This article shows how to build whisper.cpp and use it for speech recognition.

Preparation

Before you begin, make sure your device or development board has an operating system installed and has network connectivity. The steps below also rely on git, make, and a C/C++ compiler being available.

Step 1: Download the Whisper project

First, clone the whisper.cpp GitHub repository locally:

git clone https://github.com/ggerganov/whisper.cpp.git

Step 2: Download and select the appropriate model

Change into the whisper.cpp project directory and download the model you need. For example, to download the base English-only model:

sh ./models/download-ggml-model.sh base.en

This downloads the Whisper model, already converted to ggml format, and saves it as models/ggml-base.en.bin.
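
As a quick sanity check after the download, the sketch below verifies that the model file exists and is not empty. It assumes the naming scheme models/ggml-<name>.bin used by the download script; the helper names model_path and check_model are made up for this illustration and are not part of whisper.cpp.

```shell
# Hypothetical helpers (not part of whisper.cpp).

# Print the path the download script uses for a given model name,
# e.g. base.en -> models/ggml-base.en.bin
model_path() {
    printf 'models/ggml-%s.bin\n' "$1"
}

# Succeed only if the model file exists and is non-empty.
check_model() {
    path=$(model_path "$1")
    if [ -s "$path" ]; then
        echo "found $path"
    else
        echo "missing $path: run sh ./models/download-ggml-model.sh $1" >&2
        return 1
    fi
}
```

Run check_model base.en from the project directory; a non-zero exit status means the download should be repeated.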

Step 3: Build and run the example

In the project directory, compile the main example:

make
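
On a multi-core board the build usually finishes sooner when make runs several jobs in parallel. The sketch below is one way to do that, assuming GNU make and the nproc utility from coreutils; build_whisper is a hypothetical helper name, not something the project provides.

```shell
# Hypothetical helper: build in the given directory using all CPU cores.
build_whisper() {
    dir=${1:-.}
    # nproc reports the number of available cores; fall back to one job
    # if the utility is not installed.
    jobs=$(nproc 2>/dev/null || echo 1)
    make -C "$dir" -j"$jobs"
}
```

Calling build_whisper with no argument builds in the current directory, which matches the plain make command above apart from the parallelism.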

Then run speech recognition on the bundled sample audio file:

./main -f samples/jfk.wav
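
The main example reads 16-bit WAV audio, so your own recordings generally need converting before they can be transcribed. The sketch below uses ffmpeg for the conversion, which is an assumption: ffmpeg is a separate tool and must be installed. The function names and the WHISPER_BIN and WHISPER_MODEL variables are hypothetical and exist only in this illustration.

```shell
# Hypothetical settings; adjust them to your checkout.
WHISPER_BIN=${WHISPER_BIN:-./main}
WHISPER_MODEL=${WHISPER_MODEL:-models/ggml-base.en.bin}

# Convert any audio file ffmpeg can read to 16 kHz, mono, 16-bit PCM WAV.
to_wav16k() {
    src=$1
    dst=$2
    ffmpeg -y -i "$src" -ar 16000 -ac 1 -c:a pcm_s16le "$dst"
}

# Convert the input, then transcribe the converted copy.
transcribe() {
    src=$1
    wav="${src%.*}.16k.wav"
    to_wav16k "$src" "$wav" && "$WHISPER_BIN" -m "$WHISPER_MODEL" -f "$wav"
}
```

For example, transcribe talk.mp3 would first write talk.16k.wav and then pass that file to main.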

Step 4: Run a quick demo

If you want to get started quickly, you can just run:

make base.en

This command builds the project, downloads the base.en model if it is not already present, and transcribes the sample audio in the samples directory.

Step 5: Use advanced features

whisper.cpp supports a variety of advanced features, including but not limited to:

GPU acceleration (NVIDIA, Apple Silicon)
Speech segmentation
Real-time audio transcription
Multi-language support
Output format customization (text, SRT subtitles, etc.)

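You can use most of these features by changing the parameters of the launch command; run ./main --help for the full list. GPU acceleration is the exception: support for it has to be enabled when the project is built, so check the project README for the build options that match your hardware. For example, to write the transcript as an SRT subtitle file: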

./main -f samples/jfk.wav --output-srt
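
With --output-srt, main writes the subtitles next to the input, using the input file name plus .srt (samples/jfk.wav.srt for the command above). The sketch below applies the same command to every WAV file in a directory. The srt_all function and the WHISPER_BIN and WHISPER_MODEL variables are hypothetical names chosen for this illustration, and the "wrote" message reports the expected output path rather than checking that the file exists.

```shell
# Hypothetical settings; adjust them to your checkout.
WHISPER_BIN=${WHISPER_BIN:-./main}
WHISPER_MODEL=${WHISPER_MODEL:-models/ggml-base.en.bin}

# Produce an .srt file for every WAV file in the given directory.
srt_all() {
    dir=$1
    for wav in "$dir"/*.wav; do
        # Skip the unexpanded pattern when the directory has no WAV files.
        [ -e "$wav" ] || continue
        "$WHISPER_BIN" -m "$WHISPER_MODEL" -f "$wav" --output-srt || return 1
        echo "wrote $wav.srt"
    done
}
```

Running srt_all samples from the project directory would process each sample in turn and stop at the first file that fails.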

Step 6: Integrate into other applications

whisper.cpp’s lightweight implementation and C-style API make it easy to integrate into other applications. The main.cpp and stream.cpp examples in the repository show how to call the library from your own code.

Conclusion

Congratulations, you can now build and run Whisper with whisper.cpp. The same steps serve both offline use on mobile or embedded devices and batch processing of large amounts of audio on servers. Explore the remaining features and integrate Whisper into your own project for efficient speech recognition.