Is it possible to use Flan-T5
Would it be possible to use an encoder-only version of Flan-T5 with PixArt-Sigma? This one: https://huggingface.co/Kijai/flan-t5-xl-encoder-only-bf16/tree/main
Hmm, I'm pretty sure Flan-T5 is different from the T5 v1.1 version that PixArt alpha/sigma uses. Not to mention the XL vs. XXL size difference.
The current T5 linked from the deepfloyd repo is already encoder-only, so that doesn't save us any space. BF16 would bring it down to ~10GB and can (I think?) run on CPU at that precision, as opposed to FP16, but I'm not sure how the loss of precision would change the model outputs. I can test it sometime if you want.
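For reference, something like this (untested, assuming the DeepFloyd repo layout) is roughly what that BF16 test would look like with plain transformers:

```python
import torch
from transformers import T5EncoderModel, T5Tokenizer

# Load the encoder-only T5 v1.1 XXL in BF16 (~10GB instead of ~20GB at FP32).
tokenizer = T5Tokenizer.from_pretrained("DeepFloyd/t5-v1_1-xxl")
model = T5EncoderModel.from_pretrained(
    "DeepFloyd/t5-v1_1-xxl",
    torch_dtype=torch.bfloat16,  # BF16 matmuls work on CPU, unlike FP16
).eval()

tokens = tokenizer("a photo of a cat", return_tensors="pt")
with torch.no_grad():
    emb = model(**tokens).last_hidden_state  # shape [1, seq_len, 4096]
```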
The reason I ask is that the Flan-T5 version I linked is only 2.6GB, which is a lot easier to handle than the ~20GB of the current T5 v1.1. But I have no idea how hard or easy it would be to implement. Flan-T5 is used with ELLA, btw.
any updates on this?
I can add code to load it, but there's no point without a model that actually expects embeddings from that text model.
It's like trying to use clip_l (from SD1.5) for Stable Cascade, which expects clip_g. You'd just get an error about the shape mismatch.
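A toy example of that failure mode (the 1152 output width here is just a stand-in for the model's caption projection; the point is the input width):

```python
import torch

# Hidden sizes: T5 v1.1 XXL outputs 4096-dim embeddings, Flan-T5 XL outputs 2048-dim.
proj = torch.nn.Linear(4096, 1152)  # projection sized for the XXL embeddings

flan_emb = torch.randn(1, 77, 2048)  # embeddings from the smaller encoder
proj(flan_emb)  # RuntimeError: mat1 and mat2 shapes cannot be multiplied
```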
Hi @city96, there are models which use/expect Flan-T5 XXL: https://huggingface.co/google/flan-t5-xxl. Can you share where I would have to make changes in order to use this model? Are you loading the current T5 from HF?
@nighting0le01 The current code uses transformers, but I am planning on switching over to the comfy implementation of T5 as part of a bigger rewrite. Nothing gets loaded from Hugging Face; all the required files (e.g. the tokenizer) are included by default.
You could probably try passing a path to a tokenizer here and then just load the model in folder mode with the T5v11 node (assuming it has the config and is placed in the models/t5 folder). The T5 text encode node should then give you embeddings in the proper format.
You may run into issues if that specific T5 variant expects different behavior for padding or start/stop tokens.
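If you want to sanity-check that, something like this (untested) shows how the T5-family tokenizers pad and append the EOS token, which is the behavior that would need to match:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
ids = tok("a photo of a cat", padding="max_length", max_length=12)["input_ids"]
print(tok.convert_ids_to_tokens(ids))
# roughly: ['▁a', '▁photo', '▁of', '▁a', '▁cat', '</s>', '<pad>', '<pad>', ...]
```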