
Error when running inference on MathGLM-10B with the sat framework

Open coldwater2000 opened this issue 2 years ago • 3 comments

from sat import AutoModel

def main(args):
    model, model_args = AutoModel.from_pretrained('/data/model/MathGLM-10B', args)
    model = model.eval()

Error: No such file or directory: '/data/model/MathGLM-10B/1/mp_rank_00_model_states.pt'. However, no such file or directory exists in the model checkpoint you provided.

Thanks.

args:Namespace(num_layers=48, hidden_size=2560, num_attention_heads=40, vocab_size=100, max_sequence_length=512, layernorm_order='pre', inner_hidden_size=None, hidden_size_per_attention_head=None, model_parallel_size=1, skip_init=True, use_gpu_initialization=True, num_multi_query_heads=0, layernorm_epsilon=1e-05, hidden_dropout=0.1, attention_dropout=0.1, drop_path=0.0, make_vocab_size_divisible_by=128, experiment_name='MyModel', train_iters=10000, batch_size=1, lr=0.0001, mode='inference', seed=1234, zero_stage=0, checkpoint_activations=False, checkpoint_num_layers=1, checkpoint_skip_layers=0, fp16=True, bf16=False, gradient_accumulation_steps=1, epochs=None, log_interval=50, summary_dir='', save_args=False, lr_decay_iters=None, lr_decay_style='linear', lr_decay_ratio=0.1, warmup=0.01, weight_decay=0.01, save=None, load=None, save_interval=5000, no_save_rng=False, no_load_rng=False, resume_dataloader=False, distributed_backend='nccl', local_rank=0, exit_interval=None, eval_batch_size=None, eval_iters=100, eval_interval=None, strict_eval=False, train_data=None, train_data_weights=None, iterable_dataset=False, valid_data=None, test_data=None, split='1000,1,1', num_workers=1, block_size=10000, prefetch_factor=4, tokenizer_type='fake', temperature=0.1, top_p=0.0, top_k=200, num_beams=1, length_penalty=0.0, no_repeat_ngram_size=0, min_tgt_length=0, out_seq_length=256, input_source='./input_test.txt', output_path='samples_result', with_id=False, max_inference_batch_size=8, device=0, deepspeed=False, deepspeed_config=None, deepscale=False, deepscale_config=None, deepspeed_mpi=False, cuda=True, rank=0, world_size=1, master_ip='localhost', master_port='43565', do_train=False

log:

[2023-09-23 11:42:52,093] [INFO] [RANK 0] building GLMModel model ...
[2023-09-23 11:42:53,382] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 9879633920
[2023-09-23 11:42:53,407] [INFO] [RANK 0] global rank 0 is loading checkpoint /data/model/MathGLM-10B/1/mp_rank_00_model_states.pt
Traceback (most recent call last):
  File "/data/MathGLM/test.py", line 38, in <module>
    main(args)
  File "/data/MathGLM/test.py", line 9, in main
    model, model_args = AutoModel.from_pretrained('/data/model/MathGLM-10B', args)
  File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/sat/model/base_model.py", line 310, in from_pretrained
    return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
  File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/sat/model/base_model.py", line 304, in from_pretrained_base
    load_checkpoint(model, args, load_path=model_path, prefix=prefix)
  File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/sat/training/model_io.py", line 222, in load_checkpoint
    sd = torch.load(checkpoint_name, map_location='cpu')
  File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/data/model/MathGLM-10B/1/mp_rank_00_model_states.pt'
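For reference, the log shows the loader resolving the checkpoint as `<model_dir>/<iteration>/mp_rank_00_model_states.pt`. The following is a minimal diagnostic sketch for confirming whether that file actually exists on disk; the directory layout (an optional `latest` file naming the iteration subdirectory) is inferred from the error path and from common Megatron/DeepSpeed-style conventions, not taken from sat documentation:

```python
import os

def find_sat_checkpoint(model_dir):
    """Locate the mp_rank_00_model_states.pt file a sat-style loader expects.

    Assumed layout (inferred from the error message, not from sat docs):
        <model_dir>/latest                        -- optional, names the iteration subdir
        <model_dir>/<iter>/mp_rank_00_model_states.pt
    Returns the checkpoint path, or None if no candidate file exists.
    """
    latest = os.path.join(model_dir, 'latest')
    if os.path.isfile(latest):
        with open(latest) as f:
            subdirs = [f.read().strip()]
    else:
        # No 'latest' marker: scan every subdirectory for a candidate.
        subdirs = [d for d in sorted(os.listdir(model_dir))
                   if os.path.isdir(os.path.join(model_dir, d))]
    for d in subdirs:
        ckpt = os.path.join(model_dir, d, 'mp_rank_00_model_states.pt')
        if os.path.isfile(ckpt):
            return ckpt
    return None
```

Running `find_sat_checkpoint('/data/model/MathGLM-10B')` before calling `from_pretrained` makes it obvious whether the released weights include the file the loader is asking for.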

coldwater2000 · Sep 23 '23

Traceback (most recent call last):
  File "/home/llm/mathGlm/MathGLM-main/MathGLM_MWP/inference_mathglm.py", line 124, in <module>
    main(args)
  File "/home/llm/mathGlm/MathGLM-main/MathGLM_MWP/inference_mathglm.py", line 38, in main
    model, args = CachedAutoregressiveModel.from_pretrained(model_path, args)
  File "/mnt/disk1/anaconda3/envs/mathglm/lib/python3.9/site-packages/sat/model/base_model.py", line 216, in from_pretrained
    return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
  File "/mnt/disk1/anaconda3/envs/mathglm/lib/python3.9/site-packages/sat/model/base_model.py", line 209, in from_pretrained_base
    load_checkpoint(model, args, load_path=model_path, prefix=prefix)
  File "/mnt/disk1/anaconda3/envs/mathglm/lib/python3.9/site-packages/sat/training/model_io.py", line 223, in load_checkpoint
    sd = torch.load(checkpoint_name, map_location='cpu')
  File "/mnt/disk1/anaconda3/envs/mathglm/lib/python3.9/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/mnt/disk1/anaconda3/envs/mathglm/lib/python3.9/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/mnt/disk1/anaconda3/envs/mathglm/lib/python3.9/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/home/llm/mathGlm/model/1/mp_rank_00_model_states.pt'

zhouyonglong · Oct 02 '23

Adding the parameter build_only=True when loading the model fixes the problem you mentioned: model, args = CachedAutoregressiveModel.from_pretrained(model_path, args, build_only=True)

But then a new problem appears:

[2023-10-03 09:08:14,117] [INFO] [RANK 0] Cannot find THUDM/chatglm2-6b from Huggingface or sat. Creating a fake tokenizer...
Traceback (most recent call last):
  File "/home/llm/mathGlm/MathGLM-main/MathGLM_MWP/inference_mathglm.py", line 124, in <module>
    main(args)
  File "/home/llm/mathGlm/MathGLM-main/MathGLM_MWP/inference_mathglm.py", line 49, in main
    end_tokens = [tokenizer.get_command('eos').Id]
AttributeError: 'FakeTokenizer' object has no attribute 'get_command'

zhouyonglong · Oct 03 '23

Is there a solution for this? When I run to this point I also get the tokenizer error: end_tokens = [tokenizer.get_command('eos').Id] AttributeError: 'FakeTokenizer' object has no attribute 'get_command'
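Until a real tokenizer can be loaded (the log shows sat falling back to a FakeTokenizer because it cannot fetch THUDM/chatglm2-6b), one defensive sketch is to guard the get_command call and accept an explicitly supplied fallback eos id. The FakeTokenizer class below is an illustrative stand-in, and the fallback id is caller-supplied, not a known MathGLM constant:

```python
class FakeTokenizer:
    """Illustrative stand-in for sat's fake tokenizer, which lacks get_command."""
    pass

def resolve_end_tokens(tokenizer, fallback_eos_id=None):
    """Return the end-token id list, tolerating tokenizers without get_command."""
    if hasattr(tokenizer, 'get_command'):
        # Real sat tokenizers expose special tokens via get_command.
        return [tokenizer.get_command('eos').Id]
    if fallback_eos_id is not None:
        # Fake tokenizer in use: trust the caller-supplied eos id instead.
        return [fallback_eos_id]
    raise AttributeError(
        "tokenizer has no get_command; pass fallback_eos_id explicitly")
```

This only papers over the crash: without the real ChatGLM2 tokenizer the generated token ids still cannot be decoded meaningfully, so the underlying fix is making THUDM/chatglm2-6b reachable (or pointing sat at a local copy).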

kaibush · Dec 27 '23