LaaZa
Added support for offloading and multiple devices. Uses `--gpu-memory` and `--cpu-memory`, and adds `--autogptq-device-map` to control these features: [Accelerate: device_map](https://huggingface.co/docs/accelerate/main/en/usage_guides/big_modeling#designing-a-device-map). If memory limits are set, the device map will use 'auto' unless something else is specified....
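A minimal sketch of how the memory flags could be translated into the `max_memory` dict that Accelerate's device map machinery expects; the helper name, the example path, and the exact `from_quantized` keyword arguments are assumptions for illustration, not the code in this PR:

```python
from auto_gptq import AutoGPTQForCausalLM

def build_max_memory(gpu_memory, cpu_memory):
    # Translate --gpu-memory / --cpu-memory values (e.g. ["10GiB"], "30GiB")
    # into the dict Accelerate expects: {0: "10GiB", ..., "cpu": "30GiB"}.
    max_memory = {i: mem for i, mem in enumerate(gpu_memory or [])}
    if cpu_memory:
        max_memory["cpu"] = cpu_memory
    return max_memory or None

# device_map stays "auto" unless --autogptq-device-map overrides it.
model = AutoGPTQForCausalLM.from_quantized(
    "models/some-gptq-model",  # hypothetical model folder
    device_map="auto",
    max_memory=build_max_memory(["10GiB"], "30GiB"),
    use_safetensors=True,
)
```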
Thank you for your reports. Only `--wbits` determines whether it knows to load a GPTQ model. `--model_type` has no effect with AutoGPTQ; the model type is detected automatically and the flag is completely ignored...
> Just tried loading one of my models which doesn't use `safetensor`: https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g/tree/main
>
> And I get: `FileNotFoundError: No quantized model found for TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g`
>
> Folder structure:
> ...
Now checking for quantize_config.json; if it exists, `--wbits` does not need to be set manually. The UI is not updated. I want some input on how this should be done,...
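Roughly, the check could look like the sketch below; the function name, path handling, and fallback to the CLI flags are assumptions for illustration, not the exact loader code:

```python
import json
from pathlib import Path

def get_quantize_params(model_dir, wbits=None, groupsize=-1):
    # If the model ships a quantize_config.json, the quantization
    # parameters come from it and --wbits does not have to be set.
    config_path = Path(model_dir) / "quantize_config.json"
    if config_path.exists():
        return json.loads(config_path.read_text())
    # Otherwise fall back to the manually supplied CLI flags.
    return {"bits": wbits, "group_size": groupsize}
```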
> Created a PR to update AutoGPTQ to provide optimizations.
> This is enabled automatically if act-order and groupsize are not used at the same time.
> https://github.com/PanQiWei/AutoGPTQ/tree/faster-cuda-no-actorder

Nice. But...
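The quoted condition can be illustrated roughly as follows; reading `desc_act` and `group_size` out of the quantize config dict is an assumption about how one might express the check, not the linked branch's actual code:

```python
def can_use_faster_kernel(quantize_config: dict) -> bool:
    # The faster CUDA path applies only when act-order (desc_act) and a
    # groupsize are not enabled at the same time.
    desc_act = quantize_config.get("desc_act", False)
    group_size = quantize_config.get("group_size", -1)
    return not (desc_act and group_size != -1)
```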
> An automatic check seems very hard to implement.

Probably just have to do a best-effort check of the filename then. Do you have any ideas about the issue...
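A best-effort filename check might look something like this sketch; the name patterns and extensions are assumptions for illustration, not what was merged:

```python
from pathlib import Path

def find_quantized_checkpoint(model_dir):
    # Best effort: look for checkpoint files whose names hint at a
    # GPTQ-quantized model (e.g. "...-4bit-128g.safetensors").
    candidates = []
    for pattern in ("*.safetensors", "*.pt", "*.bin"):
        candidates.extend(Path(model_dir).glob(pattern))
    for path in candidates:
        name = path.name.lower()
        if any(tag in name for tag in ("gptq", "4bit", "int4")):
            return path
    return None
```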
> 1. Given the way AutoGPTQ currently loads models, it's not a good idea to check whether or not by the name of a file.
> 2. It seems that the model...
> > @qwopqwop200 I think a separate fork would go against the idea of implementing AutoGPTQ as a universal solution. Would it be possible to implement those optimisations for the...
> Hi @LaaZa
>
> I was just reviewing the quantize_config.json code
>
> In these lines: https://github.com/LaaZa/text-generation-webui/blob/b173274c63c7c133e6a71986051017e5cc9ff918/modules/AutoGPTQ_loader.py#L75-L80
>
> If the user already has `quantize_config.json`, this code will not...
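The concern seems to be about when an explicit quantize config should be built versus left to AutoGPTQ. A rough sketch of the pattern under discussion; this is an illustration of the idea only, not the lines linked above:

```python
from pathlib import Path
from auto_gptq import BaseQuantizeConfig

def choose_quantize_config(model_dir, wbits, groupsize):
    # Only build an explicit config from the CLI flags when the model does
    # not already ship quantize_config.json; otherwise return None so
    # AutoGPTQ reads the user's file instead of overriding it.
    if (Path(model_dir) / "quantize_config.json").exists():
        return None
    return BaseQuantizeConfig(bits=wbits, group_size=groupsize)
```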
The API does not use FastAPI.