Serhii Korol
The problem on Mac is with the underlying download script (it uses associative arrays and a nested loop). It's Linux-oriented and should be adapted to macOS. TBH, I quit trying to fix different...
Because it's not the root cause. It never enters this [loop](https://github.com/juncongmoo/pyllama/blob/main/llama/download_community.sh#L134-L139).
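For anyone hitting this: macOS still ships bash 3.2, which predates associative arrays (`declare -A`, added in bash 4), so a Linux-oriented script that relies on them can fail before that loop is ever reached. A minimal guard one could put near the top of the script (a sketch, not part of the actual pyllama script):

```shell
# Sketch of a bash-version guard; NOT part of the actual pyllama script.
# macOS ships bash 3.2, where `declare -A` (associative arrays, bash 4+)
# fails, so Linux-oriented array bookkeeping is never populated.
if ! declare -A _probe 2>/dev/null; then
  echo "bash ${BASH_VERSION} lacks associative arrays; install bash 4+ (e.g. 'brew install bash') and re-run" >&2
  exit 1
fi
```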
```shell
python3 quant_infer.py --wbits 4 --load pyllama-7B4b.pt --text "The meaning of life is" --max_length 24 --cuda cuda:0
```
Several people are complaining about garbage in the output here: #58.
Noticed the same on the 4-bit model: just garbage in the output. Now I'm trying to quantize from the downloaded files. Will post the result here later.
BTW, found an interesting observation in #58: `--groupsize 128` affects the results somehow. Need to try quantizing w/o this flag.
Yeah, seems like it works w/o `--groupsize`.
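For reference, the quantization invocation I'm comparing looks roughly like this. The `llama.llama_quant` entry point and the HF model id follow the pyllama README as I understand it; treat both as assumptions and adjust to your setup:

```shell
# Assumed pyllama quantization entry point (per its README); adjust model id/paths.
# Without --groupsize: output looks sane for me.
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 4 --save pyllama-7B4b.pt

# With --groupsize 128: produced garbage output in my runs (see #58).
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt
```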
@DrewSBAI this number would be different even for a single device if you re-run it 10 times. For me, slam-toolbox prints a new number in the ~447-453 range on almost every run...
Any updates? The JB plugin still doesn't work.
Never mind, it's fixed in the dev branch. Just build it and install it from the .zip.