llama2-webui
Support listening for network requests on a specific port and running GPTQ models across multiple GPUs
When a GPTQ model is run on multiple GPUs, memory is allocated only on the first GPU, which causes an error once no more memory can be allocated there. This PR fixes that by spreading the model across all available GPUs. It also allows listening for network requests on a specific host and port, which is a necessary feature since the deployment environment is unlikely to have a graphical interface.
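As a rough sketch of the multi-GPU side (the helper name and parameters here are hypothetical, not necessarily what the PR's diff does): one common way to stop everything from landing on GPU 0 is to hand Accelerate an explicit per-device memory budget via `max_memory` alongside `device_map="auto"`, so model layers get dispatched across every listed GPU:

```python
def build_max_memory(num_gpus: int, per_gpu_gib: int) -> dict:
    """Build an Accelerate-style max_memory map, e.g. {0: "20GiB", 1: "20GiB"}.

    Hypothetical helper: with this map plus device_map="auto", model weights
    are distributed across all listed GPUs instead of only GPU 0.
    """
    return {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}


# Sketch of how the map and the network-listening change might be wired up
# (not run here; argument names for the model loader and Gradio's launch()
# are real library parameters, but the surrounding code is illustrative):
#
#   model = AutoGPTQForCausalLM.from_quantized(
#       model_path,
#       device_map="auto",
#       max_memory=build_max_memory(num_gpus=2, per_gpu_gib=20),
#   )
#   demo.launch(server_name="0.0.0.0", server_port=7860)

print(build_max_memory(2, 20))
```

Binding Gradio to `0.0.0.0` with an explicit `server_port` is what makes the UI reachable over the network from a headless deployment host.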