The tensor operators are heavily optimized for Apple silicon CPUs. Depending on the computation size, Arm Neon SIMD intrinsics or CBLAS Accelerate framework routines are used. The latter are especially effective for bigger sizes, since the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.

Then, download one of the Whisper models converted to ggml format.

Build the project by compiling ggml and whisper, then linking the main example:

```
gcc -O3 -std=c11 -pthread -DGGML_USE_ACCELERATE -c ggml.c
g++ -I. -I./examples -O3 -std=c++11 -pthread -c whisper.cpp
g++ -I. -I./examples -O3 -std=c++11 -pthread examples/main/main.cpp whisper.o ggml.o -o main
```

The main example supports the following options:

```
-h,    --help           show this help message and exit
-t N,  --threads N      number of threads to use during computation
-p N,  --processors N   number of processors to use during computation
-ot N, --offset-t N     time offset in milliseconds
-on N, --offset-n N     segment index offset
-d N,  --duration N     duration of audio to process in milliseconds
-mc N, --max-context N  maximum number of text context tokens to store
-ml N, --max-len N      maximum segment length in characters
```
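Every flag in the list above (apart from `-h`) is a short/long alias pair that takes a single integer value `N`. A minimal hand-rolled parser for these flags could look like the sketch below. This is a hypothetical illustration, not the actual code from `examples/main/main.cpp`; the struct and default values are assumptions.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical container for the options listed above; defaults are assumptions.
struct Params {
    int32_t n_threads    = 4;  // -t  / --threads
    int32_t n_processors = 1;  // -p  / --processors
    int32_t offset_t_ms  = 0;  // -ot / --offset-t
    int32_t offset_n     = 0;  // -on / --offset-n
    int32_t duration_ms  = 0;  // -d  / --duration
    int32_t max_context  = -1; // -mc / --max-context
    int32_t max_len      = 0;  // -ml / --max-len
};

// Returns false on an unknown flag or a flag missing its value.
bool parse_params(int argc, const char** argv, Params& p) {
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        if (i + 1 >= argc) return false; // every flag here takes a value N
        const int32_t v = std::atoi(argv[i + 1]);
        ++i; // consume the value
        if      (arg == "-t"  || arg == "--threads")     p.n_threads    = v;
        else if (arg == "-p"  || arg == "--processors")  p.n_processors = v;
        else if (arg == "-ot" || arg == "--offset-t")    p.offset_t_ms  = v;
        else if (arg == "-on" || arg == "--offset-n")    p.offset_n     = v;
        else if (arg == "-d"  || arg == "--duration")    p.duration_ms  = v;
        else if (arg == "-mc" || arg == "--max-context") p.max_context  = v;
        else if (arg == "-ml" || arg == "--max-len")     p.max_len      = v;
        else return false; // unknown flag
    }
    return true;
}
```

For example, `./main -t 8 --offset-t 2500 -ml 60` would run with 8 threads, skip the first 2.5 seconds of audio, and cap segments at 60 characters.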
Highlights include:

- 4-bit and 5-bit integer quantization support
- Partial GPU support for NVIDIA via cuBLAS

The entire implementation of the model is contained in 2 source files:

- Tensor operations: ggml.h / ggml.c
- Transformer inference: whisper.h / whisper.cpp

Having such a lightweight implementation of the model makes it easy to integrate into different platforms and applications. As an example, here is a video of running the model on an iPhone 13 device, fully offline, on-device: whisper.objc (whisper-iphone-13-mini-2.mp4)

You can also easily make your own offline voice assistant application: command (command-0.mp4)

On Apple Silicon, the inference runs fully on the GPU via Metal: metal-base-1.mp4

Or you can even run it straight in the browser: talk.wasm

Implementation details:

- The core tensor operations are implemented in C (ggml.h / ggml.c).
- The transformer model and the high-level C-style API are implemented in C++ (whisper.h / whisper.cpp).
- Sample usage is demonstrated in main.cpp.
- Sample real-time audio transcription from the microphone is demonstrated in stream.cpp.
- Various other examples are available in the examples folder.
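To give a feel for what 4-bit integer quantization means here: weights are stored in small blocks, each holding one floating-point scale plus a handful of 4-bit integers, which roughly quarters the memory footprint compared to fp16. The sketch below is a simplified symmetric variant of this block idea, assumed for illustration only; the actual ggml quantization types (such as Q4_0) differ in layout and rounding details.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// One quantized block: 32 floats stored as a single float scale plus
// 32 signed 4-bit values packed two per byte. A simplified illustration
// of ggml-style block quantization, not the actual ggml code.
struct BlockQ4 {
    float d;                    // per-block scale
    std::array<uint8_t, 16> qs; // 32 x 4-bit values, two per byte
};

BlockQ4 quantize_block(const float* x) {
    float amax = 0.0f;
    for (int i = 0; i < 32; ++i) amax = std::max(amax, std::fabs(x[i]));
    BlockQ4 b{};
    b.d = amax / 7.0f; // map [-amax, amax] into the signed 4-bit range
    const float id = b.d != 0.0f ? 1.0f / b.d : 0.0f;
    for (int i = 0; i < 16; ++i) {
        // quantize two values, shift into [0, 15], pack into one byte
        const int q0 = std::clamp((int)std::lround(x[2*i + 0] * id), -8, 7) + 8;
        const int q1 = std::clamp((int)std::lround(x[2*i + 1] * id), -8, 7) + 8;
        b.qs[i] = (uint8_t)(q0 | (q1 << 4));
    }
    return b;
}

void dequantize_block(const BlockQ4& b, float* y) {
    for (int i = 0; i < 16; ++i) {
        y[2*i + 0] = ((int)(b.qs[i] & 0x0F) - 8) * b.d;
        y[2*i + 1] = ((int)(b.qs[i] >> 4)   - 8) * b.d;
    }
}
```

Each 32-float block shrinks from 128 bytes to 20 bytes (4 for the scale, 16 for the packed nibbles), and the round-trip error per value is bounded by half the block scale.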