sd-scripts
(sd3 branch Flux LoRA Training) RuntimeError: "index_select_cuda" not implemented for 'Float8_e4m3fn'
In issue #1453 I made PR #1452 to fix the AttributeError: 'T5EncoderModel' object has no attribute 'text_model' that occurred when loading T5 onto the GPU without 'cache_text_encoder_outputs'.
But I didn't check the T5EncoderModel structure. 😅
So it still has bugs when loading T5 with a different dtype (e.g. using FP8 without 'cache_text_encoder_outputs').
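For reference, the failure mode is easy to reproduce in isolation. This is a hypothetical standalone sketch (not code from sd-scripts), assuming a CUDA build of PyTorch with fp8 dtypes:

```python
import torch
import torch.nn as nn

# An Embedding whose weight has been cast to float8_e4m3fn fails on lookup,
# because nn.Embedding dispatches to index_select, and there is no fp8
# index_select CUDA kernel.
emb = nn.Embedding(32128, 4096).to("cuda", dtype=torch.float8_e4m3fn)
tokens = torch.tensor([[0, 1, 2]], device="cuda")
emb(tokens)  # RuntimeError: "index_select_cuda" not implemented for 'Float8_e4m3fn'
```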
T5EncoderModel structure:
```
T5EncoderModel(
  (shared): Embedding(32128, 4096)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 4096)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=4096, out_features=4096, bias=False)
              (k): Linear(in_features=4096, out_features=4096, bias=False)
              (v): Linear(in_features=4096, out_features=4096, bias=False)
              (o): Linear(in_features=4096, out_features=4096, bias=False)
              (relative_attention_bias): Embedding(32, 64)
            )
            (layer_norm): FusedRMSNorm(torch.Size([4096]), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=4096, out_features=10240, bias=False)
              (wi_1): Linear(in_features=4096, out_features=10240, bias=False)
              (wo): Linear(in_features=10240, out_features=4096, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
              (act): NewGELUActivation()
            )
            (layer_norm): FusedRMSNorm(torch.Size([4096]), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
      (1-23): 23 x T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=4096, out_features=4096, bias=False)
              (k): Linear(in_features=4096, out_features=4096, bias=False)
              (v): Linear(in_features=4096, out_features=4096, bias=False)
              (o): Linear(in_features=4096, out_features=4096, bias=False)
            )
            (layer_norm): FusedRMSNorm(torch.Size([4096]), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=4096, out_features=10240, bias=False)
              (wi_1): Linear(in_features=4096, out_features=10240, bias=False)
              (wo): Linear(in_features=10240, out_features=4096, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
              (act): NewGELUActivation()
            )
            (layer_norm): FusedRMSNorm(torch.Size([4096]), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
    (final_layer_norm): FusedRMSNorm(torch.Size([4096]), eps=1e-06, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
)
```
T5EncoderModel has 3 modules of type Embedding:

- shared
- encoder.embed_tokens
- encoder.block[0].layer[0].SelfAttention.relative_attention_bias

So I fixed it in the second commit of PR #1508; a sketch of the idea follows.
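This is only a minimal sketch of the idea, not the actual PR #1508 diff, and the helper names are hypothetical: keep the three Embedding modules out of the fp8 cast (so index_select still has a kernel to dispatch to), and upcast fp8 Linear weights to the activation dtype at compute time.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fp8_linear_forward(self, x):
    # Upcast the stored fp8 weight just for the matmul; like index_select,
    # matmul kernels are not implemented for fp8 weights.
    return F.linear(x, self.weight.to(x.dtype), self.bias)

def prepare_t5_for_fp8(encoder: nn.Module, fp8_dtype=torch.float8_e5m2):
    for module in encoder.modules():
        if isinstance(module, nn.Embedding):
            # shared, encoder.embed_tokens and relative_attention_bias stay
            # in a dtype that index_select supports.
            module.to(dtype=torch.bfloat16)
        elif isinstance(module, nn.Linear):
            module.to(dtype=fp8_dtype)
            module.forward = fp8_linear_forward.__get__(module, nn.Linear)
    return encoder
```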
With fp8_base, if T5XXL is loaded onto the GPU as float8_e4m3fn, the loss becomes NaN,
so I changed it to float8_e5m2.
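For context, the two fp8 formats trade precision for range: e5m2 has more exponent bits, so it overflows to inf/NaN much later. A quick check (assuming PyTorch ≥ 2.1, where fp8 dtypes exist):

```python
import torch

# e4m3fn: 4 exponent bits, 3 mantissa bits -> max finite value 448
# e5m2:   5 exponent bits, 2 mantissa bits -> max finite value 57344
print(torch.finfo(torch.float8_e4m3fn).max)  # 448.0
print(torch.finfo(torch.float8_e5m2).max)    # 57344.0
```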
@kohya-ss Please check PR #1508; its first commit also fixes an issue mentioned in #1509. Thanks.
We will support running T5XXL on fp8 in the future, so please wait a little longer.
I tried on both Linux and Windows, and I cannot train DB in Kohya because I get this error. No, I did not try to train CLIP-L or T5, yet this error persists.