Juarez Bochi issues

Results: 5 issues of Juarez Bochi

Use COPY instead of ADD for gu-wrapper.sh

According to Docker's best practices, `COPY` is [preferred](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#add-or-copy) over `ADD`. [Dockle](https://github.com/goodwithtech/dockle) also reports this as a potential vulnerability: ``` FATAL - CIS-DI-0009: Use COPY instead of ADD in Dockerfile * Use COPY...

OCA Required

GGUF support

## Proposed changes This adds [GGUF](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) support using the excellent [gguflib](https://github.com/antirez/gguf-tools/blob/main/gguflib.h) from @antirez. Would there be interest in this? GGUF is currently very popular for local inference, and there are...

Example reading directly from gguf file

This loads all weights, config, and vocab directly from a GGUF file, using https://github.com/ml-explore/mlx/pull/350. Example run: ```bash $ python llama.py models/tiny_llama/model.gguf [INFO] Loading model from models/tiny_llama/model.gguf. Press enter to start...

ONNX files for T5 model with text2text-generation-with-past task do not work

### System Info ```shell Reproduced on Mac, Python 3.11 and Google Colab / Python 3.10 optimum==1.14.0 ``` ### Who can help? @ michaelbenayoun ### Information - [ ] The official...

bug

Add IQ2 tensor types

[These](https://github.com/ggerganov/llama.cpp/blob/7dcbe39d36b76389f6c5cd3b151928472b7e22ff/ggml.h#L354-L355) were added in https://github.com/ggerganov/llama.cpp/pull/4773. It's annoying that I8 used to be 16 and is now 18; I16 and I32 also changed. [The dequantization code is very cryptic](https://github.com/ggerganov/llama.cpp/blob/9ecdd12e95aee20d6dfaf5f5a0f0ce5ac1fb2747/ggml-quants.c#L3457-L3508). I would love...