Chong Chen

Results: 5 comments by Chong Chen

My parameter `run_validation` is set to True, but no model file is written. My configuration is as follows:

```
@dataclass
class train_config:
    model_name: str="PATH/to/LLAMA/7B"
    enable_fsdp: bool=False
    low_cpu_fsdp: bool=False
    run_validation: bool=True
    ...
```

> Hi @BugmakerCC can you check your eval loss and post the log of your training run? We've seen the eval loss turning to Inf which prevents a checkpoint from...

> Yes, your eval loss is NaN so no checkpoint gets saved:
>
> ```
> evaluating Epoch: 100%|██████████| 100/100 [01:28 eval_ppl=tensor(nan, device='cuda:0') eval_epoch_loss=tensor(nan, device='cuda:0')
> ```
>
> Your...
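
To illustrate why a NaN eval loss can block checkpointing, here is a minimal sketch. It assumes the training loop saves a checkpoint only when the eval loss improves on the best value seen so far; `should_save_checkpoint`, `is_finite_loss`, and `best_val_loss` are hypothetical names chosen for illustration, not taken from the repository's code. Under that assumption, any comparison against NaN evaluates to False, so the save branch is never reached.

```python
import math


def should_save_checkpoint(eval_loss: float, best_val_loss: float) -> bool:
    """Hypothetical save rule: checkpoint only when the eval loss improves."""
    # Every ordered comparison involving NaN is False, so a NaN loss
    # never counts as an improvement.
    return eval_loss < best_val_loss


def is_finite_loss(eval_loss: float) -> bool:
    """Hypothetical guard: detect NaN/Inf losses so they can be reported."""
    return math.isfinite(eval_loss)


best_val_loss = float("inf")  # assumed initial value before any evaluation

for loss in (2.31, float("nan"), float("inf")):
    print(loss, should_save_checkpoint(loss, best_val_loss), is_finite_loss(loss))
```

A finite loss passes the check, while NaN and Inf both fail it. Logging the result of a finiteness check like `is_finite_loss` after each evaluation makes this failure mode visible instead of silently skipping the save.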

> Can have many reasons. Are you using the original alpaca json or a modification? Did you figure out why some weights are not initialized? I am using the original...

> > > Hi @BugmakerCC can you check your eval loss and post the log of your training run? We've seen the eval loss turning to Inf which prevents a...