cog-llama-template
Tensorize the weights
I followed the guide to tensorize the weights into the file llama_7b_fp16.tensors. However, after I push the Docker image to my Replicate repo and run it, predict.py still looks for pytorch_model.bin instead of llama_7b_fp16.tensors.
Error message below:
```
[2023-08-10 16:10:54,531] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-08-10 16:10:58,050] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
starting setup
loading weights from llama_weights/llama-7b w/o tensorizer
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.10.12/lib/python3.10/site-packages/cog/server/worker.py", line 185, in _setup
    run_setup(self._predictor)
  File "/root/.pyenv/versions/3.10.12/lib/python3.10/site-packages/cog/predictor.py", line 98, in run_setup
    predictor.setup(weights=weights)
  File "/src/predict.py", line 67, in setup
    self.model = self.load_huggingface_model(weights, load_in_4bit=LOAD_IN_4BIT)
  File "/src/predict.py", line 117, in load_huggingface_model
    model = YieldingLlama.from_pretrained(
  File "/root/.pyenv/versions/3.10.12/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2478, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory llama_weights/llama-7b.
ⅹ Model setup failed
```
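The log line `loading weights from llama_weights/llama-7b w/o tensorizer` suggests setup() took the plain Hugging Face branch because the configured weights path points at the directory, not at the .tensors file, so transformers' `from_pretrained` then fails when it can't find pytorch_model.bin. A minimal sketch of that kind of branch (the function name `choose_loader` and both paths are hypothetical, for illustration only, not the template's actual code):

```python
def choose_loader(weights_path):
    """Decide which loading path a setup() like this would take.

    The tensorizer path is only taken when the weights path actually
    points at a serialized .tensors file; anything else falls through
    to from_pretrained(), which expects pytorch_model.bin and friends.
    """
    if weights_path is not None and str(weights_path).endswith(".tensors"):
        return "tensorizer"
    return "huggingface"


# A directory path falls through to the Hugging Face loader,
# which is what the traceback above shows happening.
print(choose_loader("llama_weights/llama-7b"))               # huggingface
print(choose_loader("llama_weights/llama_7b_fp16.tensors"))  # tensorizer
```

If the template's logic looks like this, the fix would be to point the default weights path in predict.py at llama_7b_fp16.tensors rather than the llama_weights/llama-7b directory.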