
Support `google/flan-t5-*`

Open — mooijtech opened this issue 2 years ago • 5 comments

I would like to use the following model(s): https://huggingface.co/google/flan-t5-small https://huggingface.co/google/flan-t5-xxl

What would be required to add support if I were to look at contributing myself?

Kind regards, Marten

mooijtech — Jun 23 '23 08:06

I'm not sure whether T5 is compatible with BART; hopefully it is, since both are encoder-decoder models. There seem to be some config.json differences, which I'm trying to reconcile now.
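For reference, here is a minimal Go sketch of how the T5 side of that config might be parsed. The field names follow the Hugging Face T5 config keys; which fields matter is my assumption, and this is not cybertron's actual config struct:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// T5Config models a subset of the keys found in a T5/Flan-T5 config.json.
// Fields such as d_kv, relative_attention_num_buckets and feed_forward_proj
// have no direct counterpart in BART's config, which is roughly where the
// two file formats diverge.
type T5Config struct {
	DModel                      int     `json:"d_model"`
	DKV                         int     `json:"d_kv"`
	DFF                         int     `json:"d_ff"`
	NumLayers                   int     `json:"num_layers"`
	NumDecoderLayers            int     `json:"num_decoder_layers"`
	NumHeads                    int     `json:"num_heads"`
	RelativeAttentionNumBuckets int     `json:"relative_attention_num_buckets"`
	FeedForwardProj             string  `json:"feed_forward_proj"` // "gated-gelu" in Flan-T5
	LayerNormEpsilon            float64 `json:"layer_norm_epsilon"`
	VocabSize                   int     `json:"vocab_size"`
}

func main() {
	// e.g. the config.json downloaded from google/flan-t5-small
	raw, err := os.ReadFile("config.json")
	if err != nil {
		panic(err)
	}
	var cfg T5Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```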

mooijtech — Jun 27 '23 15:06

I'm currently stuck on the input encoding embeddings.

mooijtech — Jul 14 '23 11:07

T5 uses an encoder-decoder architecture that closely resembles the original Transformer. The differences are (a rough Go sketch follows the list):

    LayerNorm is applied immediately before each attention and feed forward transformation (i.e., outside of the residual path)

    No additive bias is used for LayerNorm (only a learned scale is applied; the additive bias is eliminated)

    A simple position embedding scheme is used that adds a scalar to the corresponding logit used to compute attention weights

    Dropout is applied throughout the network (e.g., attention weights, feed forward network, skip connection, etc.)
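To make the LayerNorm and position-bias points concrete, here is a rough, self-contained Go sketch. It is not cybertron code; the function names and the bias value are illustrative assumptions:

```go
package main

import (
	"fmt"
	"math"
)

// t5LayerNorm is T5-style normalization: no additive bias and, in the
// reference implementation, no mean subtraction either. The vector is
// simply divided by its RMS and multiplied by a learned scale.
func t5LayerNorm(x, scale []float64, eps float64) []float64 {
	var sumSq float64
	for _, v := range x {
		sumSq += v * v
	}
	rms := math.Sqrt(sumSq/float64(len(x)) + eps)
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = (v / rms) * scale[i]
	}
	return out
}

// attentionLogit shows the simplified position scheme: a learned scalar
// bias for the query-key offset is added directly to the dot-product logit
// before the softmax (T5 also drops the usual 1/sqrt(d_k) scaling).
func attentionLogit(q, k []float64, positionBias float64) float64 {
	var dot float64
	for i := range q {
		dot += q[i] * k[i]
	}
	return dot + positionBias
}

func main() {
	x := []float64{1, 2, 3, 4}
	scale := []float64{1, 1, 1, 1}
	fmt.Println(t5LayerNorm(x, scale, 1e-6))
	fmt.Println(attentionLogit([]float64{1, 0}, []float64{0.5, 0.5}, 0.3))
}
```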


mooijtech — Jul 27 '23 14:07

@mooijtech I am ready to work on this together, let me know if you're still interested :)

matteo-grella — Nov 01 '23 21:11

Hello Matteo,

I have lost access to my GitHub account due to the great new 2FA requirement (replying via email should still work, I guess).

I am not currently interested, as I've switched direction. If I really wanted to do this, I would have done it by now; when I want to do something, nothing can stop me, since the internet knows everything :)

Kind regards, Marten

mooijtech — Nov 02 '23 12:11