Wei Zheng
Some information: I ran it on Google Colab and things work fine. When I ran it locally (no Docker), the visualization tool was blank. I ran it in different browsers and had the same problem...
Hi, has anyone noticed that the eval loss diverges? I had many runs and most of them diverged. In some cases, the overfitted checkpoint produces better responses (e.g. dulcet-shape-11 below,...
> > > It generated "### instruction" because your fine-tuning is inefficient. In this case, we put an eos_token_id=2 into the tensor for each instance before...
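The quoted reply describes adding `eos_token_id=2` to each training instance so the model learns where a response ends instead of continuing into another "### instruction" block. The following is a minimal sketch of that idea using plain token-id lists. It assumes the EOS id is 2 (as the comment states, and as in LLaMA-style tokenizers). The helper names `append_eos` and `append_eos_batch` are hypothetical and do not come from the original project.

```python
from typing import List

# EOS id taken from the quoted comment (eos_token_id=2); an assumption for other tokenizers.
EOS_TOKEN_ID = 2


def append_eos(token_ids: List[int], eos_token_id: int = EOS_TOKEN_ID) -> List[int]:
    """Return a copy of token_ids that ends with exactly one EOS token (hypothetical helper)."""
    if token_ids and token_ids[-1] == eos_token_id:
        # Already terminated; avoid appending a second EOS.
        return list(token_ids)
    return list(token_ids) + [eos_token_id]


def append_eos_batch(batch: List[List[int]], eos_token_id: int = EOS_TOKEN_ID) -> List[List[int]]:
    """Apply append_eos to every instance in a batch before padding/tensorizing."""
    return [append_eos(ids, eos_token_id) for ids in batch]


if __name__ == "__main__":
    print(append_eos_batch([[1, 450, 3681], [1, 29871, 2]]))
```

Appending EOS before padding matters. If padding came first, the EOS would land after the pad tokens rather than immediately after the response.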
Hi @oleksost, I also saw the commit history on hf for the published model. The latest commit note was “actually masked loss”. That leads me to believe that the published...
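The commit note "actually masked loss" most likely refers to computing the loss only on response tokens, with the prompt tokens masked out, but that reading is an assumption. Under that assumption, the sketch below builds labels in which the prompt positions are set to -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The function name `build_masked_labels` is hypothetical.

```python
from typing import List, Tuple

# Default ignore_index of torch.nn.CrossEntropyLoss; positions with this label add no loss.
IGNORE_INDEX = -100


def build_masked_labels(prompt_ids: List[int], response_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and response, masking prompt positions in the labels (hypothetical helper)."""
    input_ids = list(prompt_ids) + list(response_ids)
    # Only response tokens are supervised; prompt tokens are ignored by the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


if __name__ == "__main__":
    ids, labels = build_masked_labels([1, 835, 2799], [910, 338, 2])
    print(ids)
    print(labels)
```

Compared with an unmasked run, a checkpoint trained this way never receives gradient for reproducing the prompt.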