ZHANGHENGYUAN658

Results 5 issues of ZHANGHENGYUAN658

Hello 👋 I'm currently doing research on long-context text and would like to ask whether yunchang has any code that can be adapted to Megatron-LM?

Exception type: ValueError
Detail: Traceback (most recent call last):
  File "/checkpoint/binary/train_package/utils/train.py", line 424, in LongRecipe_train.train_with_stage()
  File "/checkpoint/binary/train_package/utils/train.py", line 360, in train_with_stage
    model, accelerator = self.train(stage, model, accelerator, train_data_loader, loss_func, optim,...

Hello! I keep running into OOM when extrapolating a 72B model on 64 GPUs. Is something misconfigured in multi_node.yaml? multi_node.yaml `debug: false deepspeed_config: deepspeed_config_file: utils/accelerate_configs/zero3_offload.json deepspeed_multinode_launcher: standard zero3_init_flag: false distributed_type: DEEPSPEED downcast_bf16: 'no' num_processes: 128 num_machines: 128 main_training_function: main rdzv_backend: c10d same_network: true tpu_env: [] tpu_use_cluster: false tpu_use_sudo:...

Hello! When extrapolating qwen2 to 128k, does seq_len need to be set to 32768? I noticed that the code sets it to 24000 for llama3.

Hello, is the location of the replay_dataset used in the second training step available?