Rudolf A. Braun

Results 47 comments of Rudolf A. Braun

@TParcollet I think the current approach is best for the following reasons. About the frequent logging: There are several relevant metrics (loss, accuracy, diversity loss at the minimum) that one...

Ah, I didn't know about the vim/cat difference, cheers. I like your point about an epoch_percentage! Not sure how to make it more human-readable though. I could make it so...

Awesome! :) let's get it finished

Btw I think we could actually just use the scheduler from #1537 as it seems to me it's equivalent. @TParcollet

Ah, never mind, you're right. For some reason I thought it was 3-step.

Removing `\n` and `:` fixes the predictions. I can do the PR. The way I would do it is by removing all newline and punctuation characters (as well as numbers...
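A minimal sketch of the kind of cleanup described above, using only the Python standard library. The function name `clean_text` and the `drop_digits` flag are hypothetical; the comment is cut off after "numbers", so digit removal is left optional here. Turning newlines into spaces (rather than deleting them outright) is also an assumption, made so that words on adjacent lines do not run together.

```python
import string


def clean_text(text: str, drop_digits: bool = False) -> str:
    """Strip newlines and ASCII punctuation from text before prediction."""
    # Newlines become spaces so that words on adjacent lines do not merge.
    table = {ord("\n"): " ", ord("\r"): " "}
    # Delete every ASCII punctuation character (this includes ':').
    table.update({ord(ch): None for ch in string.punctuation})
    if drop_digits:
        # Assumption: digits are removed only when explicitly requested.
        table.update({ord(ch): None for ch in string.digits})
    # Collapse the runs of whitespace left behind by the removals.
    return " ".join(text.translate(table).split())
```

Note that `string.punctuation` also contains the apostrophe, so contractions such as "don't" become "dont"; whether that is acceptable depends on the vocabulary the model was trained with.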

Thank you for the additional info @cedrickchee! I don't have the time to wait until someone manages to get libtorch to work on Android though, so I guess the...

@TParcollet Noting two things down here to look at in the future for better performance: 1. No weight decay on biases and layernorm parameters 2. Finetuning on our model just...
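For the first point, one common way to exclude biases and LayerNorm parameters from weight decay is to build two optimizer parameter groups. The sketch below is an illustration, not the project's implementation: `split_decay_groups` is a hypothetical helper, and the rule "any parameter with fewer than two dimensions gets no decay" is a widely used heuristic that covers biases and LayerNorm weights, assumed here rather than taken from the comment.

```python
def split_decay_groups(named_params, weight_decay=0.01):
    """Build two parameter groups: one with weight decay, one without.

    named_params yields (name, parameter) pairs, for example from
    model.named_parameters() on a PyTorch module.
    """
    decay, no_decay = [], []
    for name, param in named_params:
        # Skip frozen parameters; they receive no updates anyway.
        if not getattr(param, "requires_grad", True):
            continue
        # Heuristic (assumption): biases and LayerNorm weights are 1-D,
        # so anything below 2 dimensions is excluded from weight decay.
        if param.ndim < 2 or name.endswith("bias"):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

With PyTorch installed, the returned list can be passed directly to an optimizer, e.g. `torch.optim.AdamW(split_decay_groups(model.named_parameters()), lr=1e-4)`, since optimizers accept a list of per-group dictionaries.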