gpt-fast
AttributeError: torch._inductor.config.fx_graph_cache does not exist
I quantized the model to int8 and then ran generate.py, which gave this error:
ubuntu@ip-172-31-19-240:~/gpt-fast$ python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int8
Loading model ...
/opt/conda/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Quantizing model weights for int8 weight-only symmetric per-channel quantization
Writing quantized weights to checkpoints/openlm-research/open_llama_7b/model_int8.pth
Quantization complete took 24.35 seconds
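For context, the "int8 weight-only symmetric per-channel quantization" reported in the log above boils down to giving each output channel its own scale of max|w| / 127 and rounding. This is only a pure-Python sketch of the idea, not gpt-fast's actual quantize.py:

```python
def quantize_int8_per_channel(w):
    """Symmetric per-channel int8 quantization of a list-of-rows weight matrix.

    Each row (output channel) gets its own scale = max|w| / 127; quantized
    values are round(w / scale) clamped to [-127, 127], and dequantization
    is w_q * scale with the zero-point fixed at 0.
    """
    quantized, scales = [], []
    for row in w:
        scale = max(abs(x) for x in row) / 127.0 or 1.0  # guard all-zero rows
        q_row = [max(-127, min(127, round(x / scale))) for x in row]
        quantized.append(q_row)
        scales.append(scale)
    return quantized, scales

# Tiny illustrative weight matrix (made-up values, not from the real model).
w = [[0.6, -1.0], [2.0, 0.3]]
w_q, scales = quantize_int8_per_channel(w)
# Dequantize; the per-channel error stays within one scale step.
w_deq = [[q * s for q in row] for row, s in zip(w_q, scales)]
```

Because the scale is per channel, a row with small weights keeps fine resolution even when another row contains large outliers.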
ubuntu@ip-172-31-19-240:~/gpt-fast$ python generate.py --compile --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth
Traceback (most recent call last):
  File "/home/ubuntu/gpt-fast/generate.py", line 18, in <module>
AttributeError: torch._inductor.config.fx_graph_cache does not exist
System:
Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 5.15.0-1049-aws x86_64)
- Please note that Amazon EC2 P2 Instance is not supported on current DLAMI.
- Supported EC2 instances: P5, P4d, P4de, P3, P3dn, G5, G4dn, G3.
- To activate pre-built pytorch environment, run: 'source activate pytorch'
- To activate base conda environment upon login, run: 'conda config --set auto_activate_base true'
- NVIDIA driver version: 535.104.12
- CUDA version: 12.1
Did you run with PyTorch nightly?
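For reference, nightly wheels can be told apart from stable ones because torch.__version__ carries a ".devYYYYMMDD" segment; a small check (the version strings below are examples, not taken from this machine):

```python
def is_nightly(version: str) -> bool:
    """PyTorch nightly builds embed a ".devYYYYMMDD" segment in __version__."""
    return ".dev" in version

# In the failing environment this would be:
#   import torch
#   print(torch.__version__, is_nightly(torch.__version__))
print(is_nightly("2.2.0.dev20231201+cu121"))  # nightly-style version string
print(is_nightly("2.1.0+cu121"))              # stable release string
```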