Joshua Almonte
Hi 595, I believe the reason is that ESM C was trained using a BERT-like transformer architecture. In BERT-like models, a beginning-of-sequence token `[cls]` is...
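For anyone less familiar with this convention, here is a toy sketch of what prepending a beginning-of-sequence token does to a protein sequence. The names `toy_tokenize` and `residue_positions` are hypothetical and are not the ESM C tokenizer or any library API. The sketch assumes one token per residue plus a single leading `[cls]`. Real tokenizers may add further special tokens (for example an end-of-sequence token), so check the actual model's tokenizer before relying on these offsets.

```python
# Hypothetical toy tokenizer: NOT the ESM C / ESM library API.
# It only illustrates what "prepending a beginning-of-sequence token" means.
from typing import List

CLS_TOKEN = "[cls]"  # BOS token name as written in the comment


def toy_tokenize(sequence: str) -> List[str]:
    """Split a protein sequence into one token per residue and prepend [cls]."""
    return [CLS_TOKEN] + list(sequence)


def residue_positions(tokens: List[str]) -> List[int]:
    """Indices of the per-residue tokens, i.e. every position after [cls]."""
    return list(range(1, len(tokens)))


if __name__ == "__main__":
    seq = "MKTAYIAK"
    tokens = toy_tokenize(seq)
    print(len(seq), len(tokens))  # 8 residues -> 9 tokens
    print(tokens[0])              # [cls]
```

With the extra leading token, the output has one more position than the input sequence, and residue `i` (0-based) sits at token index `i + 1`.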