Rudolf A. Braun
@TParcollet I think the current approach is best for the following reasons. About the frequent logging: There are several relevant metrics (loss, accuracy, diversity loss at the minimum) that one...
Ah, I didn't know about the vim/cat difference, cheers. I like your point about an epoch_percentage! Not sure how to make it more human-readable, though. I could make it so...
No problem, I can do that!
Awesome! :) Let's get it finished.
Btw, I think we could actually just use the scheduler from #1537, as it seems equivalent to me. @TParcollet
Ah, never mind, you're right. For some reason I thought it was a 3-step schedule.
Removing `\n` and `:` fixes the predictions. I can do the PR. The way I would do it is by removing all newline and punctuation characters (as well as numbers...
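A minimal Python sketch of that kind of normalization, assuming newlines are treated as word separators and that ASCII punctuation (`string.punctuation`) and digits are simply deleted. The comment above is cut off, so the exact character set, and whether apostrophes should survive, is an assumption; `normalize_transcript` is a hypothetical name, not an existing function in the codebase.

```python
import string

# Assumption: delete all ASCII punctuation (this includes ':') and all digits.
_DELETE_TABLE = str.maketrans("", "", string.punctuation + string.digits)


def normalize_transcript(text: str) -> str:
    """Strip punctuation and digits, turn newlines into single spaces."""
    # Delete punctuation and digit characters outright.
    text = text.translate(_DELETE_TABLE)
    # str.split() with no argument splits on any whitespace, including '\n',
    # so rejoining with ' ' replaces newlines and collapses repeated spaces.
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_transcript("Speaker 1: hello,\nworld!"))
```

Deleting punctuation rather than replacing it with a space keeps contractions as one token ("don't" becomes "dont"); if that is undesirable, apostrophes can be removed from the deletion set.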
I have the same issue.
Thank you for the additional info, @cedrickchee! I don't have the time to wait until someone manages to get libtorch to work on Android, though, so I guess the...
@TParcollet Noting two things down here to look at in the future for better performance:
1. No weight decay on biases and LayerNorm parameters.
2. Fine-tuning on our model just...
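A sketch of point 1 for a PyTorch-style model, assuming the common heuristic that every 1-D parameter (biases, LayerNorm weights and biases) is exempt from weight decay. `split_weight_decay_groups` is a hypothetical helper; it relies only on each parameter exposing `ndim` and `requires_grad`, as `torch.nn.Parameter` does, so it does not import torch itself.

```python
def split_weight_decay_groups(named_params, weight_decay):
    """Build optimizer param groups: decay matrices, skip biases/norm params.

    `named_params` is any iterable of (name, param) pairs, e.g. the output of
    `model.named_parameters()`.
    """
    decay, no_decay = [], []
    for name, param in named_params:
        if not param.requires_grad:
            continue  # frozen parameters are left out of the optimizer
        # Heuristic (assumption): 1-D tensors are biases or normalization
        # scales/shifts such as LayerNorm weight and bias.
        is_bias = name.split(".")[-1] == "bias"
        if param.ndim < 2 or is_bias:
            no_decay.append(param)
        else:
            decay.append(param)
    groups = [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
    # Drop empty groups so the optimizer never receives an empty list.
    return [g for g in groups if g["params"]]
```

With a real model the groups can be passed straight to an optimizer, e.g. `torch.optim.AdamW(split_weight_decay_groups(model.named_parameters(), 0.01), lr=1e-4)`, where the decay and learning-rate values are placeholders. Filtering on `ndim` rather than only on names also catches any other 1-D parameters, which may or may not be what we want.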