Adding support for Distilbert

[Open] codetreras opened this issue 2 years ago • 7 comments

Based on the existing Bert code for the language modeling and text encoding tasks, these changes add support for the DistilBert architecture (#7).

codetreras, Jun 26 '23

Thank you! What distinguishes DistilBERT from BERT?

matteo-grella, Jun 26 '23

You're welcome; the project is awesome. The main differences are the configuration and the layer identifiers. Architecturally, DistilBert has no token type embeddings and no pooler. See the image below: equivalent layers are in blue, dissimilar ones in orange.
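As a rough illustration of the configuration difference, here is a minimal Go sketch. It assumes the key names used in Hugging Face `config.json` files for DistilBERT (`dim`, `n_layers`, `n_heads`, `hidden_dim`, ...) and translates them into a BERT-style struct. `DistilBertConfig`, `BertLikeConfig`, and `FromDistilBert` are hypothetical names, not cybertron's API; the two architectural gaps are modeled as `TypeVocabSize: 0` (no token type embeddings) and `UsePooler: false`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DistilBertConfig mirrors a subset of the keys in a Hugging Face
// DistilBERT config.json (illustrative, not exhaustive).
type DistilBertConfig struct {
	Dim                   int     `json:"dim"`
	NLayers               int     `json:"n_layers"`
	NHeads                int     `json:"n_heads"`
	HiddenDim             int     `json:"hidden_dim"`
	Activation            string  `json:"activation"`
	MaxPositionEmbeddings int     `json:"max_position_embeddings"`
	VocabSize             int     `json:"vocab_size"`
	Dropout               float64 `json:"dropout"`
}

// BertLikeConfig is a hypothetical BERT-style configuration with two extra
// switches covering the missing token type embeddings and pooler.
type BertLikeConfig struct {
	HiddenSize            int
	NumHiddenLayers       int
	NumAttentionHeads     int
	IntermediateSize      int
	HiddenAct             string
	MaxPositionEmbeddings int
	VocabSize             int
	HiddenDropoutProb     float64
	TypeVocabSize         int  // 0: the model has no token type embeddings
	UsePooler             bool // false: the model has no pooler
}

// FromDistilBert renames the DistilBERT fields to their BERT equivalents.
func FromDistilBert(c DistilBertConfig) BertLikeConfig {
	return BertLikeConfig{
		HiddenSize:            c.Dim,
		NumHiddenLayers:       c.NLayers,
		NumAttentionHeads:     c.NHeads,
		IntermediateSize:      c.HiddenDim,
		HiddenAct:             c.Activation,
		MaxPositionEmbeddings: c.MaxPositionEmbeddings,
		VocabSize:             c.VocabSize,
		HiddenDropoutProb:     c.Dropout,
		TypeVocabSize:         0,
		UsePooler:             false,
	}
}

func main() {
	// Values as in the distilbert-base-uncased config.json.
	raw := []byte(`{"dim": 768, "n_layers": 6, "n_heads": 12, "hidden_dim": 3072,
		"activation": "gelu", "max_position_embeddings": 512,
		"vocab_size": 30522, "dropout": 0.1}`)
	var dc DistilBertConfig
	if err := json.Unmarshal(raw, &dc); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", FromDistilBert(dc))
}
```

Keeping the switches explicit in one config is one way to let the BERT code path skip the missing layers instead of duplicating it.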

[Screenshot, 2023-06-27: layer-by-layer comparison of Bert and DistilBert]

At first I considered implementing DistilBert as a "variation" of Bert; however, that would considerably increase the complexity of the code. Here, some redundancy makes maintenance easier. Let me know your thoughts.

codetreras, Jun 27 '23

@marco-nicola, what do you think, friend? I'd go for it, but I'm a bit worried about duplicating code for just a few differences.

matteo-grella, Jun 29 '23

Preferably, just use the DistilBERT config (extending the BERT code) so there's no need for duplicated code.
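For the embedding layer, extending the BERT code could amount to making token type embeddings optional: BERT sums word, position, and token type embeddings, while DistilBert sums only word and position embeddings. The Go sketch below assumes that is the only change needed there; the `Embeddings` type and `Encode` method are hypothetical, and the LayerNorm and dropout that both models apply afterwards are left out.

```go
package main

import "fmt"

// Embeddings holds hypothetical lookup tables; each row is one vector.
type Embeddings struct {
	Word      [][]float64
	Position  [][]float64
	TokenType [][]float64 // nil for a DistilBert-style model
}

// Encode sums word and position embeddings for each token and, only when
// token type embeddings exist (BERT-style), adds them as well. typeIDs is
// ignored when TokenType is nil. LayerNorm and dropout are omitted.
func (e Embeddings) Encode(ids, typeIDs []int) [][]float64 {
	out := make([][]float64, len(ids))
	for pos, id := range ids {
		v := make([]float64, len(e.Word[id]))
		for d := range v {
			v[d] = e.Word[id][d] + e.Position[pos][d]
			if e.TokenType != nil {
				v[d] += e.TokenType[typeIDs[pos]][d]
			}
		}
		out[pos] = v
	}
	return out
}

func main() {
	word := [][]float64{{1, 0}, {0, 1}}
	position := [][]float64{{0.5, 0.5}, {0.25, 0.25}}
	bert := Embeddings{Word: word, Position: position, TokenType: [][]float64{{10, 10}}}
	distil := Embeddings{Word: word, Position: position}

	fmt.Println(bert.Encode([]int{0, 1}, []int{0, 0})) // [[11.5 10.5] [10.25 11.25]]
	fmt.Println(distil.Encode([]int{0, 1}, nil))       // [[1.5 0.5] [0.25 1.25]]
}
```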

mooijtech, Jun 29 '23

Got it. In that case, extending the BERT converter/preprocessing.go and converter/mapper.go would be the proper way to handle the differences in layer identifiers, together with the configuration. Let me know what you think; I can update the PR so you can review this approach.
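The layer-identifier side of such a mapper could be sketched as a key-renaming function. This assumes the parameter names of Hugging Face PyTorch checkpoints with the `distilbert.` prefix (e.g. `distilbert.transformer.layer.N.attention.q_lin.weight` on the DistilBERT side, `bert.encoder.layer.N.attention.self.query.weight` on the BERT side). `MapDistilBertKey` is a hypothetical helper, not the actual content of converter/mapper.go. Keys it does not recognize, such as the masked-LM head (`vocab_transform`, `vocab_projector`, ...), are reported as unmapped rather than guessed.

```go
package main

import (
	"fmt"
	"strings"
)

// layerRenames maps a DistilBERT module prefix inside one transformer layer
// to the corresponding BERT module prefix.
var layerRenames = []struct{ from, to string }{
	{"attention.q_lin.", "attention.self.query."},
	{"attention.k_lin.", "attention.self.key."},
	{"attention.v_lin.", "attention.self.value."},
	{"attention.out_lin.", "attention.output.dense."},
	{"sa_layer_norm.", "attention.output.LayerNorm."},
	{"ffn.lin1.", "intermediate.dense."},
	{"ffn.lin2.", "output.dense."},
	{"output_layer_norm.", "output.LayerNorm."},
}

// MapDistilBertKey converts a DistilBERT parameter name into its BERT
// equivalent. The boolean is false for keys with no BERT counterpart.
func MapDistilBertKey(key string) (string, bool) {
	const prefix = "distilbert."
	if !strings.HasPrefix(key, prefix) {
		return "", false
	}
	rest := strings.TrimPrefix(key, prefix)

	// Embedding tables keep their names (BERT just has token_type_embeddings too).
	if strings.HasPrefix(rest, "embeddings.") {
		return "bert." + rest, true
	}

	const layerPrefix = "transformer.layer."
	if !strings.HasPrefix(rest, layerPrefix) {
		return "", false
	}
	rest = strings.TrimPrefix(rest, layerPrefix) // now "N.<module>.<param>"
	dot := strings.IndexByte(rest, '.')
	if dot < 0 {
		return "", false
	}
	idx, module := rest[:dot], rest[dot+1:]
	for _, r := range layerRenames {
		if strings.HasPrefix(module, r.from) {
			return "bert.encoder.layer." + idx + "." + r.to + strings.TrimPrefix(module, r.from), true
		}
	}
	return "", false
}

func main() {
	for _, k := range []string{
		"distilbert.embeddings.word_embeddings.weight",
		"distilbert.transformer.layer.0.attention.q_lin.weight",
		"distilbert.transformer.layer.5.output_layer_norm.bias",
		"vocab_projector.weight",
	} {
		if mapped, ok := MapDistilBertKey(k); ok {
			fmt.Printf("%s -> %s\n", k, mapped)
		} else {
			fmt.Printf("%s -> (unmapped)\n", k)
		}
	}
}
```

Once keys are renamed this way, the weights could in principle be loaded by the existing BERT path, with the token type embeddings and pooler simply absent.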

codetreras, Jun 30 '23

I'm looking into supporting flan-t5-*, but so far I'm stuck: there are differences in the positional encoder (a different weight key), so it currently fails when prompting because some input is nil (apparently on the second pass).

mooijtech, Jul 14 '23

@mooijtech I'm on vacation with family, so it's a bit difficult for me to follow up on this right now. I'll get back to you next week and we'll figure out together how to proceed with flan-t5-*!

matteo-grella, Jul 14 '23