Awni Hannun
Thanks a ton! I will check it shortly!
> For renaming the modules so that keys match, how would you suggest handling cases where the Transformers BERT model has more modular/nested modules? e.g. separate BERTIntermediate and BERTOutput layers?...
Great! We also need a readme. I can help with that; just let me know your plan and when I should review.
Hey! I will take a look shortly (next 1-2 days), sorry for the delay!
@andersonbcdefg sorry for the delay. I rebased this and ran the formatting. I'm doing a little work on it now. Just curious, what were the results you were getting? For...
Cool, what about F32? It's about 1% worse than the torch version. Did you see the same? I can spend a little time investigating, but I also want to make...
I think it helped, now I see:
```
{'Banking77Classification': 0.8325974025974027, 'STS12': 0.7584972019004673}
```
The STS12 is still a bit worse than the torch model, but the banking classification is better.
@andersonbcdefg could you comment a bit on what this example adds beyond the original MLX Bert example? Is it mostly the MTEB evaluation? If so, maybe the right call is...
@andersonbcdefg sorry I got kind of stuck on this myself w.r.t. how it should integrate with our BERT example. Maybe the answer is that it shouldn't, but then it doesn't...
Awesome, can't wait!