Jiaxing Qi (齐家兴)

Here is the training loss of the **baseline**, i.e. AMP training from scratch:

![image](https://user-images.githubusercontent.com/20978999/147623254-58c0861a-9e40-4a4a-a7c7-997f1e62a148.png)

Here is the training loss of **solution 1**:

![image](https://user-images.githubusercontent.com/20978999/147622890-76020875-06c1-44e7-bfdb-b9740ce663a4.png)

Here is the training loss of **solution 2**...

This happens because the `/tmp` folder inside your container does not have enough space. NeMo untars the `.nemo` file into that folder, and for a 70B model it needs...
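One way to confirm the diagnosis before loading is to compare the unpacked size of the archive with the free space in the extraction folder. This is a minimal sketch using only the Python standard library. The helper names `unpacked_size` and `has_room_to_extract` are made up for illustration and are not part of NeMo, and the check assumes the `.nemo` file is a tar archive that gets unpacked into the folder you pass in.

```python
import shutil
import tarfile


def unpacked_size(archive_path):
    """Sum the sizes (in bytes) of the regular files stored in a tar archive."""
    with tarfile.open(archive_path) as tar:
        return sum(member.size for member in tar if member.isfile())


def has_room_to_extract(archive_path, target_dir="/tmp"):
    """Return True if target_dir has at least as much free space as the archive unpacks to.

    Hypothetical helper: it only compares sizes, it does not extract anything.
    """
    free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= unpacked_size(archive_path)
```

If the check returns `False` for `/tmp`, the container needs a larger `/tmp` (for example, a bigger volume mounted there) before the model can be restored.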

By default, there is no `/workspace/result` folder inside the NeMo container. Can you try passing an existing directory to `exp_manager.explicit_log_dir`?
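If you would rather keep the same path, you can create the directory up front so that it exists when training starts. A minimal sketch with the standard library follows. `ensure_log_dir` is a hypothetical helper name, and the path is whatever you plan to give to `exp_manager.explicit_log_dir`.

```python
from pathlib import Path


def ensure_log_dir(path):
    """Create the directory (and any missing parents) if needed and return it as a Path."""
    log_dir = Path(path)
    log_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return log_dir
```

Calling `ensure_log_dir("/workspace/result")` inside the container before launching would make that path valid for `exp_manager.explicit_log_dir`.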

> Also how can I adapt Tiny Shakespeare dataset?

SFT normally requires data to be in style of . But the dataset you mentioned is not this type. Maybe you...

The converter `convert_llama_hf_to_nemo.py` should produce a `.nemo` file, not a directory. Can you try using the NeMo Docker image? https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

Add `WANDB_MODE=disabled` before `torchrun`.
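On the command line this means prefixing the launch command with the variable, so it applies only to that process. If you start training from a Python launcher script instead, the same effect can be sketched as below. `run_with_wandb_disabled` is a hypothetical helper, and the `torchrun` arguments shown in the comment (including `train.py`) are placeholders, not the actual command from this thread.

```python
import os
import subprocess


def run_with_wandb_disabled(cmd, **kwargs):
    """Run cmd with WANDB_MODE=disabled added to a copy of the current environment."""
    env = dict(os.environ, WANDB_MODE="disabled")  # leaves the parent's environment untouched
    return subprocess.run(cmd, env=env, check=True, **kwargs)


# Example call (placeholder script name and arguments):
# run_with_wandb_disabled(["torchrun", "--nproc_per_node=1", "train.py"])
```

Because the variable is set only in the child's environment, other jobs started from the same shell still log to Weights & Biases as usual.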