Maya
> Do you have the compiled wheel for `quant_cuda-0.0.0-cp310-cp310-win_amd64` quant kernel?
>
> I'd love to test Pyg-6B-q4 capability but I absolutely despise installing MSVC build environment and it seems...
> It asked for a HF token (which I provided) and then it failed to quantize.

The `c4` dataset requires Hugging Face authorization; you can use `wikitext2` or `ptb` instead. I'm not...
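For reference, both alternatives can be downloaded anonymously with the `datasets` library, whereas `c4` (at the time) needed an authorized account. A minimal sketch, assuming the quantization script accepts the calibration dataset by name and loads it roughly like this:

```python
# Sketch: calibration data from public datasets, no HF token required.
# Dataset/config names follow the Hugging Face hub; the actual loader in
# the quantization script may wrap this differently.
from datasets import load_dataset

# wikitext2 downloads anonymously
traindata = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

# ptb is also public (uncomment to use instead)
# traindata = load_dataset("ptb_text_only", "penn_treebank", split="train")

print(traindata[0])
```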
https://github.com/oobabooga/text-generation-webui/pull/615 - new version, this PR is outdated for now
`setup()` is called when the UI is ready and the parameters from settings.json have been parsed. Global-scope statements are executed before this happens. Another thing is a rare case, but what if someone wants...
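To make the ordering concrete, here is a rough sketch of a hypothetical extension `script.py` (the `params` override behaviour is assumed from how settings.json is normally applied): anything at module scope runs at import time, before the overrides land, while `setup()` sees the final values.

```python
# script.py of a hypothetical extension (sketch only)

params = {
    "greeting": "hello",  # may be overridden by settings.json after import
}

# Module scope: runs at import time, BEFORE the settings.json overrides
# are applied, so this always sees the default value above.
print("at import:", params["greeting"])

def setup():
    # Called once the UI is ready and settings.json has been parsed,
    # so any overridden value is visible here.
    print("at setup:", params["greeting"])
```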
https://github.com/oobabooga/text-generation-webui/blob/34970ea3af8f88c501e58fef2fc5c489c8df2743/modules/GPTQ_loader.py#L100 There is a hardcoded sequence length in `_load_quant`. Does it work with context sizes over 2048? MPT-Storywriter should support contexts up to 65k tokens.
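If that line is the blocker, a hedged sketch of lifting the constant into a parameter could look like this (the real `_load_quant` takes more arguments; names and defaults here are illustrative only):

```python
# Sketch only: make the sequence length an argument instead of the
# hardcoded 2048 inside _load_quant. Not the actual GPTQ_loader code.
def _load_quant(model, checkpoint, wbits, groupsize, max_seq_len=2048):
    # ... existing weight loading / layer patching would go here ...
    model.seqlen = max_seq_len  # was: model.seqlen = 2048
    return model

class _Stub:  # stand-in for the real model object
    pass

m = _load_quant(_Stub(), "model.safetensors", 4, 128, max_seq_len=65536)
print(m.seqlen)  # 65536, e.g. for MPT-Storywriter-length contexts
```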
I don't want to run it, I want to quantize the model, i.e. convert it to 4-bit.
It's an issue with FlexGen; there are hardcoded model names and parameters in there. You can manually edit `site-packages/flexgen/opt_config.py` to support Nerys and other finetuned models.
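Roughly the kind of edit meant here, as a sketch (the real `opt_config.py` structure differs between FlexGen versions, and the assumption is that a Nerys finetune keeps the shape of its OPT-6.7B base model):

```python
# Sketch only: map a finetuned model name onto the stock OPT config it
# was trained from, so the name lookup doesn't fail.
def get_opt_config(name: str) -> dict:
    name = name.lower()
    if "nerys" in name:        # e.g. an OPT-6B Nerys finetune
        name = "opt-6.7b"      # reuse the base model's shape
    if name.endswith("opt-6.7b"):
        return {"num_hidden_layers": 32, "n_head": 32, "hidden_size": 4096}
    raise ValueError(f"unsupported model: {name}")

print(get_opt_config("facebook/opt-6.7b"))
print(get_opt_config("KoboldAI/OPT-6B-nerys-v2"))
```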
As a temporary workaround, I've found that disabling the max stack size for regex at the top of `src/unicode.cpp` works:

```c
#define _REGEX_MAX_STACK_COUNT 0
#include "unicode.h"
#include "unicode-data.h"
#include <...>
#include <...>
//...
```