fastLLaMa
fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backend.
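To give a sense of the API the issues below refer to, here is a minimal usage sketch in the style of the repository's documentation examples; the parameter names and the model path are illustrative and may differ between versions of the library.

```python
# Minimal usage sketch modeled on the README-style Python API.
# The model path is a placeholder; parameter names may vary by version.
from fastllama import Model

model = Model(
    path="./models/7B/ggml-model-q4_0.bin",  # placeholder: a 4-bit quantized model file
    num_threads=8,                           # CPU threads for the C/C++ backend
)

# Feed the prompt into the model's context, then stream tokens out.
model.ingest("Tell me a short story about a potato.")
model.generate(
    num_tokens=100,
    streaming_fn=lambda token: print(token, end="", flush=True),
)
```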
Is this project updated enough to use gguf files or the LLama-3 architecture? I see that the documentation examples use ggml via .bin files which I'm assuming was the previous...
I installed it following the README with:

```sh
$ pip3 install git+https://github.com/PotatoSpudowski/fastLLaMa.git@main
```

Installation went fine, but when I try to use it:

```sh
$ ipython3
Python 3.10.6 (main, Mar 10 2023, 10:55:28)...
```
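One quick first check for this kind of report, sketched below with only the standard library, is whether the pip install landed on the same interpreter that iPython is running. This assumes the distribution is registered under the name `fastllama`; the actual name may differ.

```python
# Quick check sketch (assumption: the pip distribution is named
# "fastllama"; it may be registered under a different name).
from importlib import metadata

try:
    print("installed version:", metadata.version("fastllama"))
except metadata.PackageNotFoundError:
    print("not found: pip may have installed into a different "
          "Python than the one currently running")
```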
Fixes https://github.com/PotatoSpudowski/fastLLaMa/issues/84. Should make the user experience much better when accessing the webui from a smartphone or a vertical monitor.
Webui is almost unusable on mobile because the sidebar with the list of saves takes up most of the space, leaving only a tiny portion of the screen for the...
llama.cpp somewhat recently added support for OpenCL acceleration, enabling hardware acceleration on AMD GPUs. Would it be possible to do the same thing here?
### Discussed in https://github.com/PotatoSpudowski/fastLLaMa/discussions/26

Originally posted by **McRush** March 27, 2023

Hello! I find this project really cool. fastLLaMa has separate functions for prompt ingestion and text generation, unlike other...
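The separation the poster describes means the (expensive) prompt-processing pass and the sampling loop can be invoked independently. A sketch of that split, again assuming the `Model` API shape from the documentation examples (argument names such as `top_p` and `temp` are illustrative):

```python
# Sketch of the ingest/generate split, assuming a README-style API.
from fastllama import Model

model = Model(path="./models/7B/ggml-model-q4_0.bin", num_threads=8)

# Pay the prompt-processing cost once up front...
model.ingest("User: Name three root vegetables.\nAssistant:")

# ...then sample the continuation as a separate step, without
# re-ingesting the prompt.
model.generate(
    num_tokens=64,
    top_p=0.95,
    temp=0.8,
    streaming_fn=lambda t: print(t, end="", flush=True),
)
```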
For example, it still uses the old syntax for `convert-pth-to-ggml.py` and for `export-from-huggingface.py`. It might also be worth clarifying that even when installing through pip, we still need to...
```sh
$: python examples/python/example-alpaca.py
Traceback (most recent call last):
  File "examples/python/example-alpaca.py", line 1, in <module>
    from fastllama import Model
ImportError: cannot import name 'Model' from 'fastllama' (unknown location)
$: pip install...
```
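An "(unknown location)" ImportError usually means Python resolved `fastllama` to a namespace package, i.e. a directory with no `__init__.py`, such as a stray `fastllama` folder in the working directory shadowing the real install. A small diagnostic sketch (standard library only, not part of the repo) that shows where the name resolves from:

```python
# Diagnostic sketch (stdlib only, not part of fastLLaMa): show where
# Python resolves the 'fastllama' package from.
import importlib.util

spec = importlib.util.find_spec("fastllama")
if spec is None:
    print("fastllama is not importable from this interpreter")
else:
    # For a namespace package (no __init__.py), origin is None, which
    # matches the "(unknown location)" in the ImportError above.
    print("resolved from:", spec.origin)
```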