
ValueError: could not find the metadata file ckpt/glm-130b-sat/49300/latest, please check --load

Open bolongliu opened this issue 1 year ago • 4 comments

(glm-130b) ➜ GLM-130B git:(main) ✗ bash scripts/evaluate.sh tasks/bloom/glue_cola.yaml

WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


Please install apex to use fused_layer_norm, fall back to torch.nn.LayerNorm   (×8, one per rank)
Please install apex to use FusedScaleMaskSoftmax, otherwise the inference efficiency will be greatly reduced   (×8, one per rank)
WARNING: No training data specified   (×8, one per rank)
using world size: 8 and model-parallel size: 8

padded vocab (size: 150528) with 0 dummy tokens (new size: 150528)
initializing model parallel with size 8
Loading task configs
Task glue_cola loaded from config tasks/bloom/glue_cola.yaml
Successfully load 1 task
Set tokenizer as a icetk-glm-130B tokenizer! Now you can get_tokenizer() everywhere.
Traceback (most recent call last):
  File "/home/llms/GLM-130B/evaluate.py", line 67, in <module>
    main()
  File "/home/llms/GLM-130B/evaluate.py", line 58, in main
    model, tokenizer = initialize_model_and_tokenizer(args)
  File "/home/llms/GLM-130B/initialize.py", line 72, in initialize_model_and_tokenizer
    load_checkpoint(model, args)
  File "/home/miniconda3/envs/glm-130b/lib/python3.9/site-packages/SwissArmyTransformer/training/model_io.py", line 157, in load_checkpoint
    iteration, release, success = get_checkpoint_iteration(load_path)
  File "/home/miniconda3/envs/glm-130b/lib/python3.9/site-packages/SwissArmyTransformer/training/model_io.py", line 131, in get_checkpoint_iteration
    raise ValueError('could not find the metadata file {}, please check --load'.format(
ValueError: could not find the metadata file /ckpt/glm-130b-sat/49300/latest, please check --load
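For context: judging from the traceback, the failing check in SwissArmyTransformer is just a file lookup — it joins `latest` onto whatever `--load` resolves to and tries to read the iteration number from it. A rough bash paraphrase of that check (the real code is Python, in `SwissArmyTransformer/training/model_io.py`, so treat this as a sketch, not the library source):

```bash
# Paraphrase of SAT's get_checkpoint_iteration, reconstructed from the
# traceback above. LOAD_PATH stands for whatever --load resolves to.
LOAD_PATH="/ckpt/glm-130b-sat/49300"
if [ ! -f "${LOAD_PATH}/latest" ]; then
    echo "could not find the metadata file ${LOAD_PATH}/latest, please check --load"
else
    ITERATION=$(cat "${LOAD_PATH}/latest")   # e.g. 49300
fi
```

So the error means `--load` resolved to `/ckpt/glm-130b-sat/49300` and no file named `latest` exists in that directory — either `--load` points one level too deep, or the `latest` metadata file is missing from the checkpoint download.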

bolongliu avatar May 25 '23 13:05 bolongliu

You should probably check your path for checkpoints
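Concretely, the loader wants a directory shaped like this — a hypothetical layout following SAT's usual conventions, where the `mp_rank_*` shard names assume the 8-way model-parallel checkpoint:

```
ckpt/glm-130b-sat/              <- point --load / CHECKPOINT_PATH here
├── latest                      <- plain-text file containing e.g. "49300"
└── 49300/
    ├── mp_rank_00_model_states.pt
    ├── ...
    └── mp_rank_07_model_states.pt
```

That is, `--load` (typically set via `CHECKPOINT_PATH` in `configs/model_glm_130b.sh`) should point at the directory that contains `latest`, not at the `49300/` subdirectory itself.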

phoenix0110 avatar May 26 '23 01:05 phoenix0110

How did you guys solve this?

TheZhuangPark avatar May 29 '23 03:05 TheZhuangPark

Same command (bash scripts/evaluate.sh tasks/bloom/glue_cola.yaml), same apex warnings, and the same traceback as above, ending in:

ValueError: could not find the metadata file /ckpt/glm-130b-sat/49300/latest, please check --load

My checkpoint path is set to ckpt/glm-130b-sat/.
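A quick sanity check along these lines might help (hypothetical paths — adjust to your setup). Note that `--load` should point at the directory containing `latest`, and an absolute path avoids surprises from wherever the script happens to be launched:

```bash
# Verify the layout the loader expects; paths are hypothetical.
ls ckpt/glm-130b-sat            # expect: 49300  latest
cat ckpt/glm-130b-sat/latest    # expect: 49300

# If `latest` is missing from the download, it can be recreated by hand:
echo "49300" > ckpt/glm-130b-sat/latest
```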

GXKIM avatar Jun 02 '23 21:06 GXKIM

I'm running into the same problem.

yandan1234 avatar Aug 11 '23 12:08 yandan1234