Max Braun

Results: 30 issues by Max Braun

This adds support for input images with different encodings. The classification version already does this: https://github.com/google-coral/tflite/blob/master/python/examples/classification/classify_image.py#L103
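One way to do this, mirroring the `convert('RGB')` call in the linked classification example, is to normalize every decoded image to RGB before preprocessing. A minimal sketch with Pillow (the sample image here is synthetic, not from the repo):

```python
from PIL import Image

def load_rgb(path):
    # Decode any supported encoding (PNG, JPEG, ...) and normalize to RGB,
    # as the linked classification example does before resizing.
    return Image.open(path).convert("RGB")

# Synthetic stand-in for a decoded RGBA input:
rgba = Image.new("RGBA", (8, 8), (255, 0, 0, 128))
rgb = rgba.convert("RGB")
```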

The [pre-trained Whisper models](https://github.com/openai/whisper#available-models-and-languages) don't work out-of-the-box with the [Google Coral Edge TPU](https://coral.ai/products/). They would need to meet [certain requirements](https://coral.ai/docs/edgetpu/models-intro/#model-requirements) so they can be converted to [TensorFlow Lite](https://www.tensorflow.org/lite), quantized to...
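One of those requirements is full 8-bit quantization. As a rough illustration of the affine scheme TFLite uses (`q = round(x / scale) + zero_point`, clamped to int8), here is a pure-Python sketch; the helpers and values are illustrative, not the actual conversion API:

```python
def quantize(values, scale, zero_point):
    # Affine int8 quantization: q = round(x / scale) + zero_point,
    # clamped to the int8 range [-128, 127].
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]

def dequantize(quantized, scale, zero_point):
    # Inverse mapping back to (approximate) floats.
    return [(q - zero_point) * scale for q in quantized]

weights = [1.0, -1.0, 0.05]
q = quantize(weights, scale=0.05, zero_point=0)
```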

Some ideas from [section 4.5 of the paper](https://cdn.openai.com/papers/whisper.pdf) and [this discussion](https://github.com/openai/whisper/discussions/117#discussioncomment-3727051):

- [ ] Use shorter chunks to reduce latency. (Try overlapping chunks by setting `options.prefix` to the transcription of...
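For the shorter-overlapping-chunks idea, the windowing itself is simple to sketch; a hypothetical helper that yields `(start, end)` spans in seconds, where each chunk's transcription would then seed `options.prefix` for the next:

```python
def chunk_spans(total_s, chunk_s=5.0, overlap_s=1.0):
    # Cover [0, total_s] with fixed-length windows, each overlapping
    # its predecessor by overlap_s seconds.
    spans = []
    start = 0.0
    while start < total_s:
        spans.append((start, min(start + chunk_s, total_s)))
        start += chunk_s - overlap_s
    return spans
```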

Official [JetPack](https://developer.nvidia.com/embedded/jetpack) support for the Jetson Nano ends at version [4.6.3](https://developer.nvidia.com/jetpack-sdk-463), which is on Python 3.6. It would prevent some [inelegant workarounds](https://github.com/maxbbraun/whisper-edge#hack) if Python 3.8 were supported with PyTorch and CUDA....

The current implementation works with the 15M parameter version of [`tinyllamas`](https://huggingface.co/karpathy/tinyllamas/tree/main). Just dropping in the next larger one (42M) flashes fine, but freezes at runtime. Would need to look into...
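A first thing to rule out would be that the 42M weights simply don't fit in memory. A back-of-the-envelope check (assuming float32 weights as in `llama2.c` and roughly 64 MB of RAM on the board; the helper is hypothetical):

```python
def fits_in_ram(n_params, bytes_per_param, ram_bytes):
    # Rough check: do the raw weights alone fit? Ignores activations,
    # the KV cache, and everything else resident in RAM.
    return n_params * bytes_per_param <= ram_bytes

RAM_BYTES = 64 * 1024 * 1024  # assumed ~64 MB
```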

enhancement
good first issue

One nice thing about microcontrollers is their low power consumption. We should measure it! While running inference and while idle/suspended. I assume it'll consume less power when driven with 3.3V...
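Once current draw is measured, turning the readings into a comparable figure is simple arithmetic; a hypothetical helper for joules per generated token:

```python
def joules_per_token(voltage_v, current_a, duration_s, tokens):
    # Average power (V * A, in watts) times duration gives energy in
    # joules; divide by the number of tokens generated in that window.
    return voltage_v * current_a * duration_s / tokens
```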

documentation

See if there is a way to run text to speech and read the generated text out loud. Ideally, this happens in parallel to the LLM generating tokens (using the...
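A sketch of the parallel part with a producer/consumer queue: the generation loop buffers tokens into sentences and hands them to a worker thread, with `speak()` standing in for a real TTS call (all names here are hypothetical):

```python
import queue
import threading

def speak(sentence, spoken):
    # Stand-in for a real text-to-speech call; just records what was "said".
    spoken.append(sentence)

def stream_tts(token_stream):
    # Producer/consumer: the LLM loop keeps generating while a worker
    # thread speaks completed sentences from the queue.
    sentences = queue.Queue()
    spoken = []

    def worker():
        while True:
            sentence = sentences.get()
            if sentence is None:  # sentinel: generation finished
                break
            speak(sentence, spoken)

    t = threading.Thread(target=worker)
    t.start()

    buffer = []
    for token in token_stream:  # stands in for the LLM's token generator
        buffer.append(token)
        if token.endswith((".", "!", "?")):
            sentences.put("".join(buffer))
            buffer = []
    if buffer:
        sentences.put("".join(buffer))
    sentences.put(None)
    t.join()
    return spoken
```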

enhancement

See if there is a way to use the built-in microphone to recognize speech for prompting the LLM (possibly from a very limited vocabulary). This could even make use of...
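Even with imperfect recognition, a very limited vocabulary makes the matching step easy; a stdlib sketch that snaps a noisy transcription onto a small command list (the commands are placeholders):

```python
import difflib

VOCABULARY = ["tell a story", "what do you see", "stop"]  # placeholder commands

def match_command(heard, vocabulary=VOCABULARY, cutoff=0.6):
    # Snap a (possibly noisy) transcription onto the closest known
    # command, or return None if nothing is close enough.
    matches = difflib.get_close_matches(heard.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```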

enhancement

Experimenting with compiler options in branch [`fast-opts`](https://github.com/maxbbraun/llama4micro/compare/main...fast-opts). Switching from `-Os` to `-O3` seems to have a significant impact on tokens per second. (`-Ofast` doesn't noticeably add on top.)

```diff
->>> Averaged...
```

enhancement
good first issue

The main reason I initially chose [`karpathy/llama2.c`](https://github.com/karpathy/llama2.c) over [`ggerganov/llama.cpp`](https://github.com/ggerganov/llama.cpp) was that the former comes out of the box with very small (15M) models. `llama.cpp`, and [`ggml`](https://github.com/ggerganov/ggml) more generally, is a...

enhancement