Questions about Hardware requirement
Excuse me, but when running inference on 1 × RTX 4090 with `python cli_demo_sat.py --from_pretrained cogcom-base-17b --local_tokenizer tokenizer --english --quant 4`, I get a CUDA out of memory error. Does it need more GPUs, or should I add some arguments? Thank you!
Hi, thanks for your interest! I am currently investigating this quantization problem.
Same here. Is there an approximate estimate of the VRAM usage? (<20 GB or ~24 GB)
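For a rough sense of scale while waiting on an official number, here is a back-of-envelope sketch of the weight memory for a 17B-parameter model at 4-bit quantization. The `overhead_gb` term (activations, KV cache, vision tower, CUDA context) is an assumption, not a measured value, and `estimate_vram_gb` is a hypothetical helper, not part of the repo:

```python
def estimate_vram_gb(n_params_billion: float, bits_per_param: int,
                     overhead_gb: float = 4.0) -> float:
    """Rough VRAM estimate: quantized weights plus a fixed overhead.

    1B params at 8 bits is ~1 GB, so weights ~= n_params_billion * bits / 8.
    overhead_gb is an assumed allowance for activations, KV cache, etc.
    """
    weights_gb = n_params_billion * bits_per_param / 8
    return weights_gb + overhead_gb

# 17B params, 4-bit quant: 8.5 GB weights + 4 GB assumed overhead
print(estimate_vram_gb(17, 4))   # → 12.5
# Same model unquantized at fp16 would not fit a 24 GB card:
print(estimate_vram_gb(17, 16))  # → 38.0
```

By this estimate the 4-bit model should fit in 24 GB, so the OOM above may come from the model being loaded in fp16 before quantization is applied rather than from the quantized weights themselves.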