Lukas Kreussel
@abetlen Alright, I now have an action that builds me a `libllama.so` binary with and without `avx512`. But if I copy this binary into my Docker container and set the...
@gjmulder Alright, that's kind of my bad: the `lib` folder is created in a GitHub Action and contains the `llama.so` binary I added above. I now also added a `lib`...
This seems to be an error with the tokenizer you are using; I encountered similar issues and decided to refactor and combine many of the conversion scripts for our `rustformers/llm`...
The included GGML tokenizer is very lossy; in `rustformers/llm` we support HuggingFace tokenizers, which aren't supported in the GGML implementation. Maybe give those a try, and if that doesn't work, maybe review...
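To illustrate one way a converted vocab becomes lossy: a GGML file essentially stores a flat token-to-id table, and tokenizing with greedy longest-match over that table can segment text differently than the original BPE merge order, and can silently drop bytes the vocab doesn't cover. A minimal sketch (the vocab and the greedy strategy here are made up for illustration, not the actual GGML algorithm):

```rust
use std::collections::HashMap;

/// Greedy longest-match tokenization over a plain token->id vocab.
fn greedy_tokenize(text: &str, vocab: &HashMap<&str, u32>) -> Vec<u32> {
    let bytes = text.as_bytes();
    let mut ids = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        // Pick the longest vocab entry matching at position i.
        let mut best: Option<(usize, u32)> = None;
        for (tok, &id) in vocab {
            if bytes[i..].starts_with(tok.as_bytes())
                && best.map_or(true, |(len, _)| tok.len() > len)
            {
                best = Some((tok.len(), id));
            }
        }
        match best {
            Some((len, id)) => {
                ids.push(id);
                i += len;
            }
            // Byte not covered by any token: skipped entirely -- the lossy case.
            None => i += 1,
        }
    }
    ids
}
```

With a vocab like `{"lo": 0, "low": 1, "er": 2, "w": 3}`, `"lower"` tokenizes to `[1, 2]`, but `"low!"` drops the `!` and yields just `[1]` — information the original HuggingFace tokenizer would have preserved via byte fallback.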
What will the developer experience be like? Is some sort of "interactive" mode planned where I can see the results an operation will have on a tensor while stepping through...
> Our execution happens on the GPU and we generate a single set of commands to execute all ops in one go, so breakpoints are difficult to support. Also intermediate...
Hey, I finally got some time to take a look at these changes and will start to play around a bit. Is there a list somewhere of all supported...
I converted a GPT-2 model to ONNX and visualized the operations [here](https://github.com/webonnx/wonnx/assets/65088241/676c145f-5c21-48d3-ab19-aec523fb628c). The main things still missing are the different shape-manipulating operations like "Squeeze, Unsqueeze, Concat, Gather" and stuff...
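Ops like Squeeze and Unsqueeze are mostly bookkeeping: they only rewrite the dims list and leave the underlying data untouched. A rough sketch of that shape arithmetic (simplified — per the ONNX spec, squeezing a listed axis whose dim isn't 1 is actually an error, and negative axes are allowed; both are ignored here):

```rust
/// Drop the listed axes from a shape (only when that dim is 1).
fn squeeze(shape: &[usize], axes: &[usize]) -> Vec<usize> {
    shape
        .iter()
        .enumerate()
        .filter(|&(i, &d)| !(axes.contains(&i) && d == 1))
        .map(|(_, &d)| d)
        .collect()
}

/// Insert a dim of 1 at each listed axis; axes index into the output shape.
fn unsqueeze(shape: &[usize], axes: &[usize]) -> Vec<usize> {
    let mut out: Vec<usize> = shape.to_vec();
    let mut sorted: Vec<usize> = axes.to_vec();
    sorted.sort_unstable();
    for &a in &sorted {
        out.insert(a, 1);
    }
    out
}
```

So `squeeze(&[1, 3, 1, 5], &[0, 2])` gives `[3, 5]`, and `unsqueeze(&[3, 5], &[0, 2])` gives `[1, 3, 1, 5]` — inverse operations on the shape, with no tensor data moved.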
> Forgot to mention two things that might be useful:
>
> * You can use the `nnx` (`wonnx-cli`) tool to list operations used in an ONNX model (it is...
Got it, my plan was to somehow get a GPT-2 F16 GGML model loaded and executed, and then work my way toward some more exotic quantization formats, which will require...
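For context on why those quantization formats need extra work: GGML's block formats pack several weights behind a shared scale. A sketch of dequantizing one legacy Q4_0 block — 32 weights as a single f32 scale `d` plus 16 bytes of two 4-bit quants each (an assumption for illustration: nibble ordering and scale type changed between ggml versions; this uses a simple interleaved layout):

```rust
/// Number of weights per Q4_0 block.
const QK: usize = 32;

/// Dequantize one block: each 4-bit quant q in 0..=15 maps back to (q - 8) * d.
fn dequantize_q4_0(d: f32, qs: &[u8; QK / 2]) -> [f32; QK] {
    let mut out = [0.0f32; QK];
    for (i, &byte) in qs.iter().enumerate() {
        let lo = (byte & 0x0F) as i32 - 8; // low nibble -> even element
        let hi = (byte >> 4) as i32 - 8; // high nibble -> odd element
        out[2 * i] = lo as f32 * d;
        out[2 * i + 1] = hi as f32 * d;
    }
    out
}
```

A GPU runtime can't just `memcpy` such a tensor: either the shader has to do this unpacking inline, or the loader dequantizes to F16/F32 up front, which is what makes these formats more involved than plain F16.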