JianWang

Results: 4 comments by JianWang

Perhaps I can answer your question. The 'trg' is used as the input to the decoder, and the decoder predicts the next word from the information it already knows. The 'gold' will...
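For illustration, here is a minimal sketch of the usual teacher-forcing split. It assumes 'gold' is the target sequence shifted one position to the left, which is the common convention but is not confirmed by the truncated comment above. The helper name `split_teacher_forcing` and the token ids are hypothetical.

```python
def split_teacher_forcing(token_ids):
    """Split one target sequence into decoder input and prediction targets.

    Assumption: 'trg' drops the last token and 'gold' drops the first,
    so gold[i] is the token the decoder should predict after seeing trg[:i + 1].
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens (e.g. BOS and EOS)")
    trg = token_ids[:-1]   # decoder input: everything except the final token
    gold = token_ids[1:]   # training target: everything except the first token
    return trg, gold


# Hypothetical ids: 1 = BOS, 2 = EOS, 11..13 = words
BOS, EOS = 1, 2
sentence = [BOS, 11, 12, 13, EOS]
trg, gold = split_teacher_forcing(sentence)

# At each step the decoder sees trg up to position i and is scored against gold[i]
for i, target in enumerate(gold):
    print(f"input so far: {trg[:i + 1]} -> should predict: {target}")
```

Under this assumption both sequences have the same length, so the loss can be computed position by position.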

I have the same question about the training details. I would also like to know how many epochs the original CL methods (MoCo v2 / SimSiam / Barlow Twins) were trained for. Was it 800 epochs?

I came across your issue. It performs well in linear probe, but poorly in full model fine-tuning.