Abed

Results: 19 comments by Abed

I can give it a shot; it might take me a few days until I'm able to dedicate time to it, though. But I think this API should also support GPU inference...

@darth-veitcher @fblissjr I just pushed an image to run inference on GPU using GPTQ-for-llama; you can find the image [here](https://hub.docker.com/layers/1b5d/llm-api/0.0.1-gptq-llama-cuda/images/sha256-3dc1ed97b5c9e40c90210e8f7e802a8aaaf5d1ff686dd4c6316c9b69c076a47f?context=explore). I also added a section to the [README.md](https://github.com/1b5d/llm-api/blob/main/README.md#llama--alpaca-on-gpu---using-gptq-for-llama-beta) file,...
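A minimal sketch of pulling and running that image with GPU access — only the image tag comes from the comment above; the port and model-directory mount are assumptions, so check the linked README section for the paths and config the image actually expects:

```shell
# Pull the GPTQ CUDA image mentioned above and run it with all GPUs exposed.
# Port 8000 and the /models mount are hypothetical; consult the README.
docker pull 1b5d/llm-api:0.0.1-gptq-llama-cuda
docker run --rm --gpus all \
  -p 8000:8000 \
  -v "$PWD/models:/models" \
  1b5d/llm-api:0.0.1-gptq-llama-cuda
```

`--gpus all` requires the NVIDIA Container Toolkit to be installed on the host.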

@darth-veitcher Thank you for testing this on such short notice. I think this error is just a typo in the code that I recently pushed, and I've just...

Maybe you already checked this, but in your compose file above you had `device_ids: ['1']`, while your config file has `cuda_visible_devices: "0"`. Did you try to set the...
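For context, the compose side of the two settings being compared sits roughly here — a sketch only, since everything outside the quoted `device_ids` field (service name, image, the rest of the file) is assumed:

```yaml
# Hypothetical compose fragment: device_ids selects GPUs by their HOST IDs.
services:
  llm-api:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['1']   # host GPU with ID 1 is exposed to the container
              capabilities: [gpu]
```

Note that only the GPUs listed in `device_ids` are visible inside the container, so any device index used inside it (e.g. in `cuda_visible_devices`) refers to that reduced, renumbered set.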

`cuda_visible_devices` corresponds to the `CUDA_VISIBLE_DEVICES` environment variable, which selects GPUs by their identifiers, not by the order in which they appear. What you described above is the behaviour of the `device`...
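A minimal illustration of the environment-variable side (the renumbering behaviour itself only shows up with a CUDA runtime and real GPUs; this just shows how the variable is set and what the process sees):

```shell
# CUDA_VISIBLE_DEVICES lists GPUs by their physical IDs (nvidia-smi order);
# the CUDA runtime then renumbers the visible set starting from 0 in-process,
# so the single exposed GPU below would be addressed as device 0 by the app.
export CUDA_VISIBLE_DEVICES=1   # expose only physical GPU 1
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```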

I see, I probably misunderstood NVIDIA's requirements for running in a container. Now I think the solution might be in the answer https://stackoverflow.com/a/58432877 . Would you be able to install...

Please note that they also wrote on their [package page](https://github.com/huggingface/text-generation-inference/pkgs/container/text-generation-inference): > Note: To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with...
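Once the toolkit from the linked guide is installed, wiring it into Docker and smoke-testing GPU access looks roughly like this — a sketch, and the CUDA image tag is an assumption (pick one matching your driver version):

```shell
# Register the NVIDIA runtime with Docker (toolkit must already be installed),
# then restart the daemon so the change takes effect.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Smoke test: a GPU-enabled container should print the same nvidia-smi
# table you see on the host.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```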

Hey @darth-veitcher, I just pushed a new image, `1b5d/llm-api:0.0.2-gptq-llama-cuda`, using a different base. Would you please give it a try?

Thanks for the feedback. I've been looking into how to add Lora support; would you mind sharing a snippet of how you run your model? And what do...