Abed
I can give it a shot, though it might take me a few days until I'm able to dedicate time to it, but I think this API should also support GPU inference...
@darth-veitcher @fblissjr I just pushed an image to run inference on GPU using GPTQ-for-llama, you can find the image [here](https://hub.docker.com/layers/1b5d/llm-api/0.0.1-gptq-llama-cuda/images/sha256-3dc1ed97b5c9e40c90210e8f7e802a8aaaf5d1ff686dd4c6316c9b69c076a47f?context=explore). I also added a section to the [README.md](https://github.com/1b5d/llm-api/blob/main/README.md#llama--alpaca-on-gpu---using-gptq-for-llama-beta) file,...
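For anyone who wants to try it, a minimal compose sketch for the GPU image might look like the following. This is just an illustration, not the documented setup: the port, mount paths, and config filename are assumptions, so check the README section above for the actual values.

```yaml
# Hypothetical docker-compose.yml for the GPTQ CUDA image.
# Port and volume paths are assumptions - see the README for the real config.
version: "3.8"
services:
  llm-api:
    image: 1b5d/llm-api:0.0.1-gptq-llama-cuda
    ports:
      - "8000:8000"
    volumes:
      - ./models:/models
      - ./config.yaml:/llm-api/config.yaml
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```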
@darth-veitcher Thank you for testing this on such short notice. I think this error is just a silly typo in the code that I recently pushed, and I've just...
Maybe you already checked this, but in your compose file above you had `device_ids: ['1']`, while in your config file there is `cuda_visible_devices: "0"`. Did you try to set the...
`cuda_visible_devices` correlates to the `CUDA_VISIBLE_DEVICES` environment variable, which uses the GPU identifiers, not the order in which they appear; what you described above is the behaviour of the `device`...
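To illustrate the distinction: `CUDA_VISIBLE_DEVICES` selects GPUs by their physical identifier and then renumbers the visible devices starting from zero, so the logical index a framework sees maps back to a physical id like this (a hypothetical helper, purely for illustration, not part of llm-api):

```python
# Illustrative helper: CUDA_VISIBLE_DEVICES selects GPUs by physical id,
# and frameworks then see the surviving devices renumbered from 0.
def physical_gpu_id(logical_index: int, cuda_visible_devices: str) -> int:
    """Map a framework-visible device index back to the physical GPU id."""
    visible = [int(x) for x in cuda_visible_devices.split(",")]
    return visible[logical_index]

# With CUDA_VISIBLE_DEVICES="2,3", frameworks see two devices:
# logical device 0 is physical GPU 2, logical device 1 is physical GPU 3.
```

So setting `cuda_visible_devices: "0"` asks CUDA for the GPU whose identifier is 0, regardless of how many devices were passed through to the container.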
I see, I probably misunderstood NVIDIA's requirements for running in a container. Now I think the solution might be in this answer: https://stackoverflow.com/a/58432877. Would you be able to install...
Please note that they also wrote on their [package page](https://github.com/huggingface/text-generation-inference/pkgs/container/text-generation-inference): > Note: To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with...
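If the container runtime turns out to be the issue, one common fix (only a sketch, and only if the Stack Overflow answer above actually applies to your setup) is registering the NVIDIA runtime in `/etc/docker/daemon.json` and restarting the Docker daemon:

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```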
Hey @darth-veitcher, I just pushed a new image `1b5d/llm-api:0.0.2-gptq-llama-cuda` using a different base - would you please give it a try?
Thanks for the feedback, I've been looking into how to add LoRA support. Would you mind sharing a snippet of how you run your model? And what do...