Questions about Hardware requirement
Excuse me, but when running inference on 1 × RTX 4090 with `python cli_demo_sat.py --from_pretrained cogcom-base-17b --local_tokenizer tokenizer --english --quant 4`, I get a CUDA out of memory error. Does it need more GPUs, or should I add some arguments? Thank you!
Hi, thanks for your interest! I am currently investigating this quantization problem.
Same here. Is there an approximate estimate of the VRAM usage? (<20 GB or ~24 GB)
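For a rough sense of scale while waiting on an official number, here is a back-of-envelope sketch of the weight memory for a 17B-parameter model at 4-bit quantization. The `overhead_gb` term (activations, KV cache, vision tower, CUDA context) is an assumption, not a measured value, and `estimate_vram_gb` is a hypothetical helper, not part of the repo:

```python
def estimate_vram_gb(n_params_billion: float, bits_per_param: int,
                     overhead_gb: float = 4.0) -> float:
    """Rough VRAM estimate: quantized weights plus a fixed overhead.

    1B params at 8 bits is ~1 GB, so weights ~= n_params_billion * bits / 8.
    overhead_gb is an assumed allowance for activations, KV cache, etc.
    """
    weights_gb = n_params_billion * bits_per_param / 8
    return weights_gb + overhead_gb

# 17B params, 4-bit quant: 8.5 GB weights + 4 GB assumed overhead
print(estimate_vram_gb(17, 4))   # → 12.5
# Same model unquantized at fp16 would not fit a 24 GB card:
print(estimate_vram_gb(17, 16))  # → 38.0
```

By this estimate the 4-bit model should fit in 24 GB, so the OOM above may come from the model being loaded in fp16 before quantization is applied rather than from the quantized weights themselves.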