
Is it possible to use Flan-T5

Open ericbrook opened this issue 1 year ago • 2 comments

Would it be possible to use an encoder-only version of Flan-T5 with PixArt-Sigma? This one: https://huggingface.co/Kijai/flan-t5-xl-encoder-only-bf16/tree/main

ericbrook avatar Apr 15 '24 10:04 ericbrook

Hmm, well, Flan-T5 is different from the T5 v1.1 version that PixArt alpha/sigma uses, I'm pretty sure. Not to mention the XL vs. XXL part.

The current T5 linked from the DeepFloyd repo is already encoder-only, so that doesn't save us any space. BF16 would bring it down to ~10GB, and (I think?) it can run on CPU at that precision, as opposed to FP16, but I'm not sure how the loss of precision would change the model outputs. I can test it sometime if you want.
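For anyone who wants to try the comparison themselves, a minimal sketch with plain transformers (this is not the node code; the checkpoint id and prompt are just examples):

```python
# Rough harness for checking BF16-on-CPU output quality, assuming the
# DeepFloyd encoder checkpoint. Not the extension's loader.
import torch
from transformers import T5Tokenizer, T5EncoderModel

path = "DeepFloyd/t5-v1_1-xxl"  # the T5 the PixArt pipelines reference
tokenizer = T5Tokenizer.from_pretrained(path)
# FP16 matmuls generally aren't supported on CPU, but BF16 is on recent CPUs.
model = T5EncoderModel.from_pretrained(path, torch_dtype=torch.bfloat16)

tokens = tokenizer("a red fox jumping over a fence", return_tensors="pt")
with torch.no_grad():
    emb = model(**tokens).last_hidden_state  # [1, seq_len, 4096] for XXL
print(emb.shape, emb.dtype)
```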

city96 avatar Apr 16 '24 01:04 city96

The reason I ask is that the Flan-T5 version I linked is only 2.6GB, which is a lot easier to handle than the ~20GB of the current T5 v1.1. But I have no idea how hard or easy it would be to implement. Flan-T5 is used with ELLA, btw.

ericbrook avatar Apr 16 '24 10:04 ericbrook

Any updates on this?

nighting0le01 avatar Jun 10 '24 05:06 nighting0le01

I can add code to load it but there's no point without a model that actually expects embeddings from that text model.

It's like trying to use clip_l (from SD 1.5) for Stable Cascade, which expects clip_g. You'd just get an error about the shape mismatch.
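To make the mismatch concrete, a toy illustration (the widths come from the public configs: Flan-T5-XL's encoder is 2048-wide, T5 v1.1 XXL is 4096-wide, and PixArt's inner dimension is 1152; treat the exact numbers as assumptions):

```python
# Toy illustration of the shape mismatch; widths taken from the public
# configs, not measured here.
import torch

flan_xl_emb = torch.randn(1, 77, 2048)     # Flan-T5-XL encoder output width
pixart_proj = torch.nn.Linear(4096, 1152)  # a layer sized for T5 v1.1 XXL

pixart_proj(flan_xl_emb)  # raises RuntimeError: mat1 and mat2 shapes cannot be multiplied
```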

city96 avatar Jun 10 '24 23:06 city96

> I can add code to load it but there's no point without a model that actually expects embeddings from that text model.
>
> It's like trying to use clip_l (from SD 1.5) for Stable Cascade, which expects clip_g. You'd just get an error about the shape mismatch.

Hi @city96, there are models which use/expect Flan-T5 XXL: https://huggingface.co/google/flan-t5-xxl. Can you share where I would have to make changes in order to use this model? Are you loading the current T5 from HF?

nighting0le01 avatar Jun 14 '24 16:06 nighting0le01

@nighting0le01 The current code uses transformers, but I am planning on switching over to the comfy implementation of T5 as part of a bigger rewrite. Nothing gets loaded from Hugging Face at runtime; all the required files (e.g. the tokenizer) are included by default.
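In transformers terms, that just means the loader points at a local directory and never reaches for the hub; a sketch (the path below is hypothetical, not the extension's actual layout):

```python
# Fully local load: local_files_only makes transformers error out
# instead of downloading anything. Directory path is hypothetical.
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained(
    "custom_nodes/ComfyUI_ExtraModels/T5/tokenizer",  # bundled files
    local_files_only=True,
)
```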

You could probably try passing a path to a tokenizer here and then load the model in folder mode with the T5v11 node (assuming it has the config and is placed in the models/t5 folder). The T5 text encode node should then give you embeddings in the proper format. A layout along the lines of the sketch below should work.
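Folder and file names here are illustrative; the point is just that the model config, weights, and tokenizer files sit together under models/t5:

```
models/t5/flan-t5-xxl/      # hypothetical folder name
├── config.json             # model config (architecture, d_model, etc.)
├── model.safetensors       # encoder weights
├── spiece.model            # sentencepiece tokenizer
└── tokenizer_config.json
```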

You may have issues if that specific T5 expects different behavior for padding or start/stop tokens.
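To see why that matters, here's what the stock tokenizer does (illustrative; the ids assume the standard T5 sentencepiece vocab, where EOS is id 1 and pad is id 0):

```python
# A checkpoint trained with different padding, or without the EOS token,
# would put embeddings at those positions that the diffusion model never
# saw during training.
from transformers import T5Tokenizer

tok = T5Tokenizer.from_pretrained("google/flan-t5-xl")
out = tok("a red fox", padding="max_length", max_length=8, return_tensors="pt")
print(out.input_ids)       # tokens, then EOS (id 1), then pads (id 0)
print(out.attention_mask)  # 1 for real tokens/EOS, 0 for padding
```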

city96 avatar Jun 14 '24 20:06 city96