airllm
Configure the chunk split size
Mac M1 Max 32GB user here, without the ability to quantize with bitsandbytes.

Is there a way to configure the chunk split size so that inference is quicker? I think the 32GB of memory is not being used efficiently.
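For reference, a minimal loading sketch assuming airllm's standard `AutoModel.from_pretrained` entry point. I haven't found a documented chunk/split-size argument, so the `chunk_size` kwarg below is purely hypothetical and only illustrates the kind of knob I'm asking about:

```python
# Minimal airllm sketch on Apple Silicon.
# NOTE: `chunk_size` is hypothetical -- it shows the kind of parameter
# I am asking about; it is not (to my knowledge) a real airllm option.
from airllm import AutoModel

model = AutoModel.from_pretrained(
    "garage-bAInd/Platypus2-70B-instruct",   # any large HF model
    # compression='4bit',  # not usable here: requires bitsandbytes (CUDA-only)
    # chunk_size=...,      # hypothetical: larger per-layer chunks to fill the 32GB
)

input_tokens = model.tokenizer(
    ["What is the capital of the United States?"],
    return_tensors="pt",
    truncation=True,
    max_length=128,
)

output = model.generate(
    input_tokens["input_ids"],
    max_new_tokens=20,
    use_cache=True,
)
print(model.tokenizer.decode(output[0]))
```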