SwissArmyTransformer
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
Does sat support saving checkpoints in fp16 or bf16?
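A minimal sketch of what this question asks about, assuming sat exposes --fp16/--bf16 flags through get_args (the same fp16/bf16 fields appear in the CogVLM Namespace in a later issue on this page); whether the saved checkpoint actually keeps that dtype is exactly the open question:

    # Sketch: parse sat args with a precision flag (flag names are assumptions).
    from sat.arguments import get_args

    args = get_args(["--mode", "inference", "--bf16"])  # or "--fp16"
    print(args.fp16, args.bf16)  # downstream training/saving code reads these fields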
Single-node multi-GPU training runs normally, but multi-node multi-GPU training fails with: Skipping backward and optimizer step for nan or inf in forwarding metrics/loss!
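One way to narrow this down (a hedged sketch, assuming the job runs under torch.distributed, as deepspeed jobs do) is to log which rank first produces a non-finite loss, since the aggregated warning hides the failing worker:

    import torch
    import torch.distributed as dist

    def report_nonfinite(loss: torch.Tensor) -> None:
        # Print the rank and value whenever a worker sees nan/inf in its loss.
        if not torch.isfinite(loss).all():
            rank = dist.get_rank() if dist.is_initialized() else 0
            print(f"rank {rank}: non-finite loss {loss}")

    # call report_nonfinite(loss) at the end of your forward_step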
When running multi-node multi-GPU distributed training with a deepspeed hostfile, the following error occurs:

    Traceback (most recent call last):
    worker0:   File "finetune_XrayGLM.py", line 173, in <module>
    worker0:     args = get_args(args_list)
    worker0:   File "/home/sfz/soft/miniconda3/envs/test/lib/python3.8/site-packages/sat/arguments.py", line 360, in get_args
    worker0:     raise ValueError(
    worker0: ValueError: LOCAL_RANK...
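sat/arguments.py appears to derive the distributed setup from launcher environment variables, so a quick check (a diagnostic sketch, not sat's own tooling) is to print them on every worker and see whether the hostfile launch exported LOCAL_RANK at all:

    import os

    # Run this on each worker before get_args(); an <unset> LOCAL_RANK on a
    # remote node points at the launcher configuration, not at sat itself.
    for var in ("RANK", "LOCAL_RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT"):
        print(f"{var} = {os.environ.get(var, '<unset>')}")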
When executing the following code, the download fails partway through:

    model, model_args = CogVLMModel.from_pretrained(
        "cogvlm-chat",
        args=argparse.Namespace(
            deepspeed=None,
            local_rank=0,
            rank=0,
            world_size=1,
            model_parallel_size=1,
            mode='inference',
            skip_init=True,
            fp16=False,
            bf16=True,
            use_gpu_initialization=True,
            device='cuda',
        ))

The error is: ore.exceptions.ResponseStreamingError: An error occurred while reading from response stream: ('Connection...
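Since the failure is a dropped connection mid-download rather than a sat error, a simple retry wrapper is one workaround (a sketch; re-invoking from_pretrained to restart the fetch, and SAT_HOME as the checkpoint cache directory, are both assumptions to verify against sat's docs):

    import os
    import time

    os.environ["SAT_HOME"] = "/data/sat_models"  # hypothetical local cache path

    def load_with_retries(loader, attempts=3, wait=10):
        # Call a zero-argument loader, retrying on transient network errors.
        for attempt in range(1, attempts + 1):
            try:
                return loader()
            except Exception as exc:  # e.g. a dropped response stream
                print(f"attempt {attempt} failed: {exc}")
                if attempt == attempts:
                    raise
                time.sleep(wait)

    # usage:
    # model, model_args = load_with_retries(
    #     lambda: CogVLMModel.from_pretrained("cogvlm-chat", args=args))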
Hello, I installed SwissArmyTransformer-0.4.5 with pip, but it cannot be imported. I have confirmed it is installed on the PYTHONPATH, and the same thing happens on several different machines. How can I resolve this? Thanks.
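One frequent cause of this kind of report is that the PyPI distribution is named SwissArmyTransformer while the import name is sat (as the traceback paths elsewhere on this page suggest); a short diagnostic confirms which installation Python actually resolves:

    import importlib.metadata

    import sat  # the import name for the SwissArmyTransformer package

    print("module path:", sat.__file__)
    print("installed version:", importlib.metadata.version("SwissArmyTransformer"))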
I tried to use torch.compile with SAT but it failed. Could the reason be that self.transformer.hooks.clear() in base_model.py also clears the hooks registered by torch.compile?
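A possible workaround sketch, assuming the conflict really is between SAT's hook-registry reset and the compiled module: compile a plain wrapper function around the already-constructed model, so torch.compile never wraps or annotates the SAT module object itself:

    import torch

    def run(model, *inputs):
        # A free function over the model; torch.compile traces this call path
        # instead of wrapping the SAT module whose hooks get cleared.
        return model(*inputs)

    compiled_run = torch.compile(run)
    # output = compiled_run(model, input_ids, position_ids, attention_mask)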