LaaZa
Added support for offloading and multiple devices. Uses `--gpu-memory` and `--cpu-memory`, and adds `--autogptq-device-map` to control these features: [Accelerate: device_map](https://huggingface.co/docs/accelerate/main/en/usage_guides/big_modeling#designing-a-device-map). If memory limits are set, the device map will use 'auto' unless something else is specified....
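A minimal sketch of how the memory flags could be translated into the `max_memory` dict that Accelerate's device map machinery expects; the helper name, the example path, and the exact `from_quantized` keyword arguments are assumptions for illustration, not the code in this PR:

```python
from auto_gptq import AutoGPTQForCausalLM

def build_max_memory(gpu_memory, cpu_memory):
    # Translate --gpu-memory / --cpu-memory values (e.g. ["10GiB"], "30GiB")
    # into the dict Accelerate expects: {0: "10GiB", ..., "cpu": "30GiB"}.
    max_memory = {i: mem for i, mem in enumerate(gpu_memory or [])}
    if cpu_memory:
        max_memory["cpu"] = cpu_memory
    return max_memory or None

# device_map stays "auto" unless --autogptq-device-map overrides it.
model = AutoGPTQForCausalLM.from_quantized(
    "models/some-gptq-model",  # hypothetical model folder
    device_map="auto",
    max_memory=build_max_memory(["10GiB"], "30GiB"),
    use_safetensors=True,
)
```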
Thank you for your reports. Only `--wbits` determines whether it knows to load a GPTQ model. `--model_type` has no effect with AutoGPTQ; the model type is detected automatically and the flag is completely ignored...
> Just tried loading one of my models which doesn't use `safetensor`: https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g/tree/main
>
> And I get: `FileNotFoundError: No quantized model found for TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g`
>
> Folder structure:
> ...
Now checking for quantize_config.json; if it exists, `--wbits` does not need to be set manually. The UI is not updated. I want some input on how this should be done,...
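Roughly, the check could look like the sketch below; the function name, path handling, and fallback to the CLI flags are assumptions for illustration, not the exact loader code:

```python
import json
from pathlib import Path

def get_quantize_params(model_dir, wbits=None, groupsize=-1):
    # If the model ships a quantize_config.json, the quantization
    # parameters come from it and --wbits does not have to be set.
    config_path = Path(model_dir) / "quantize_config.json"
    if config_path.exists():
        return json.loads(config_path.read_text())
    # Otherwise fall back to the manually supplied CLI flags.
    return {"bits": wbits, "group_size": groupsize}
```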
> Created a PR to update AutoGPTQ to provide optimizations.
> This is enabled automatically if act-order and groupsize are not used at the same time.
> https://github.com/PanQiWei/AutoGPTQ/tree/faster-cuda-no-actorder

Nice. But...
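The quoted condition can be illustrated roughly as follows; reading `desc_act` and `group_size` out of the quantize config dict is an assumption about how one might express the check, not the linked branch's actual code:

```python
def can_use_faster_kernel(quantize_config: dict) -> bool:
    # The faster CUDA path applies only when act-order (desc_act) and a
    # groupsize are not enabled at the same time.
    desc_act = quantize_config.get("desc_act", False)
    group_size = quantize_config.get("group_size", -1)
    return not (desc_act and group_size != -1)
```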
> An automatic check seems very hard to implement.

Probably just have to do a best-effort check of the filename then. Do you have any ideas about the issue...
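A best-effort filename check might look something like this sketch; the name patterns and extensions are assumptions for illustration, not what was merged:

```python
from pathlib import Path

def find_quantized_checkpoint(model_dir):
    # Best effort: look for checkpoint files whose names hint at a
    # GPTQ-quantized model (e.g. "...-4bit-128g.safetensors").
    candidates = []
    for pattern in ("*.safetensors", "*.pt", "*.bin"):
        candidates.extend(Path(model_dir).glob(pattern))
    for path in candidates:
        name = path.name.lower()
        if any(tag in name for tag in ("gptq", "4bit", "int4")):
            return path
    return None
```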
> 1. Given the way AutoGPTQ currently loads models, it's not a good idea to check whether or not by the name of a file.
> 2. It seems that the model...
> > @qwopqwop200 I think a separate fork would go against the idea of implementing AutoGPTQ as a universal solution. Would it be possible to implement those optimisations for the...
> Hi @LaaZa
>
> I was just reviewing the quantize_config.json code
>
> In these lines: https://github.com/LaaZa/text-generation-webui/blob/b173274c63c7c133e6a71986051017e5cc9ff918/modules/AutoGPTQ_loader.py#L75-L80
>
> If the user already has `quantize_config.json`, this code will not...
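The concern seems to be about when an explicit quantize config should be built versus left to AutoGPTQ. A rough sketch of the pattern under discussion; this is an illustration of the idea only, not the lines linked above:

```python
from pathlib import Path
from auto_gptq import BaseQuantizeConfig

def choose_quantize_config(model_dir, wbits, groupsize):
    # Only build an explicit config from the CLI flags when the model does
    # not already ship quantize_config.json; otherwise return None so
    # AutoGPTQ reads the user's file instead of overriding it.
    if (Path(model_dir) / "quantize_config.json").exists():
        return None
    return BaseQuantizeConfig(bits=wbits, group_size=groupsize)
```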
The API does not use FastAPI.