Jiaxing Qi (齐家兴)
Here is the training loss of **baseline**, i.e. AMP training from scratch: [image] Here is the training loss of **solution 1**: [image] Here is the training loss of **solution 2**...
This happens because the `/tmp` folder inside your container does not have enough space. NeMo untars the `.nemo` file into that folder, so for a 70B model it needs...
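A quick way to check this before loading is to look at the free space in the temp folder. The snippet below is only a rough sketch: `has_free_space` and `required_bytes` are hypothetical names, and the size you actually need depends on your checkpoint, so fill it in yourself.

```python
import shutil
import tempfile


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    # shutil.disk_usage returns a named tuple (total, used, free), all in bytes.
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes


if __name__ == "__main__":
    # tempfile.gettempdir() is usually /tmp inside a Linux container.
    tmp_dir = tempfile.gettempdir()
    free_gib = shutil.disk_usage(tmp_dir).free / 2**30
    print(f"{tmp_dir}: {free_gib:.1f} GiB free")
```

Call `has_free_space("/tmp", required_bytes)` with the unpacked size of your `.nemo` file to see whether the untar step is likely to fit.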
By default, there is no `/workspace/result` folder inside the NeMo container. Can you try passing an existing directory to `exp_manager.explicit_log_dir`?
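If you would rather keep the same path, you can create the folder before launching. This is a minimal sketch; `ensure_dir` is a hypothetical helper, not part of NeMo.

```python
import os


def ensure_dir(path):
    """Create `path` (and any missing parents) if needed and return its absolute path."""
    # exist_ok=True makes this a no-op when the directory is already there.
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)
```

For example, run `ensure_dir("/workspace/result")` inside the container first, then pass that same path to `exp_manager.explicit_log_dir`.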
> Also how can I adapt Tiny Shakespeare dataset? SFT normally requires data to be in style of . But the dataset you mentioned is not this type. Maybe you...
The converter `convert_llama_hf_to_nemo.py` should produce a `.nemo` file, not a directory. Can you try using the NeMo Docker image? https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
Add `WANDB_MODE=disabled` before torchrun
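If you launch the job from a Python wrapper instead of typing the command in a shell, the same variable can be set on the child process environment. A minimal sketch, where `env_with_wandb_disabled` is a hypothetical helper name:

```python
import os


def env_with_wandb_disabled(base_env=None):
    """Return a copy of the environment with WANDB_MODE set to 'disabled'."""
    # Copy so the caller's mapping (or os.environ) is not modified in place.
    env = dict(os.environ if base_env is None else base_env)
    env["WANDB_MODE"] = "disabled"
    return env
```

Pass the result as `env=` to `subprocess.run` together with your own torchrun command line.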