ComfyUI
ClipTextEncoderFlux not separating keys out in a usable form
Expected Behavior
Using the Long CLIP text encoder does not function as intended: ClipTextEncoderFlux does not assign a key to clip_l, so it can't be extracted downstream, unlike with the SD3 Clip Text Encoder. The only key assigned is t5xxl.
Actual Behavior
The output is lumped into one, so the token lengths are combined. The Long CLIP node can't be run with the combined core ClipTextEncode node because it mashes the t5xxl prompt into clip_l when determining token length (see the sketch below).
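For illustration, a minimal sketch of how separate keys would let each stream be pulled out independently. This assumes tokenize() returns a dict of per-encoder token batches; the "l" and "t5xxl" key names follow the traceback below, and split_token_streams is a hypothetical helper, not part of either codebase.

def split_token_streams(tokens: dict):
    # Hypothetical helper: pull each encoder's token stream out of the
    # tokenize() result. The SD3 encoder assigns both keys; the Flux
    # encoder in this report only assigns "t5xxl", so "l" comes back None.
    clip_l = tokens.get("l")
    t5xxl = tokens.get("t5xxl")
    return clip_l, t5xxl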
Steps to Reproduce
Use the node with Long Clip Text Encode (the updated version at https://github.com/zer0int/ComfyUI-Long-CLIP, not the original; the updated one works around some of the issues with conditionals that the original does not). It's also unclear to me from the code in ComfyCore whether clip_l is ever truncated, though Long CLIP does seem to make a difference; a diagnostic sketch follows.
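One way to probe the truncation question, assuming ComfyUI's CLIP.tokenize() returns {key: [[(token, weight), ...], ...]} as the traceback suggests; report_token_counts is a hypothetical diagnostic, not an existing API:

def report_token_counts(clip, prompt: str) -> None:
    # Hypothetical diagnostic: print how many tokens each encoder actually
    # sees after tokenization. If clip_l silently truncates, its batch
    # lengths stay capped (77 for stock CLIP-L) no matter how long the
    # prompt is; Long-CLIP should report more.
    tokens = clip.tokenize(prompt)
    for key, batches in tokens.items():
        print(key, [len(batch) for batch in batches])

Feeding this a deliberately long prompt ("word " * 500) and comparing the printed lengths per key would show whether truncation happens.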
Debug Logs
!!! Exception during processing !!! expected sequence of length 413 at dim 1 (got 99999999)
Traceback (most recent call last):
File "E:\ComfyUI\execution.py", line 317, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\execution.py", line 192, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\execution.py", line 169, in _map_node_over_list
process_inputs(input_dict, i)
File "E:\ComfyUI\execution.py", line 158, in process_inputs
results.append(getattr(obj, func)(**inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\comfy_extras\nodes_flux.py", line 21, in encode
output = clip.encode_from_tokens(tokens, return_pooled=True, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\comfy\sd.py", line 126, in encode_from_tokens
o = self.cond_stage_model.encode_token_weights(tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\custom_nodes\ComfyUI-Long-CLIP\long_clip.py", line 402, in encode_token_weights
t5_out, t5_pooled = self.t5xxl.encode_token_weights(token_weight_pairs_t5)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\comfy\sd1_clip.py", line 41, in encode_token_weights
o = self.encode(to_encode)
^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\comfy\sd1_clip.py", line 229, in encode
return self(tokens)
^^^^^^^^^^^^
File "C:\Users\qrkyx\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\qrkyx\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\comfy\sd1_clip.py", line 185, in forward
tokens = torch.LongTensor(tokens).to(device)
^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: expected sequence of length 413 at dim 1 (got 99999999)
Other
No response
Additional info: the issue was due to padding depending on which prompt is shorter (CLIP-L vs. T5). And I learned, from searching for the source of 99999999, that you don't pad a T5: 'pad_to_max_length=False, max_length=99999999'. So now I don't pad, and that seems to have fixed the issue.
I'm curious why you set such a gigantic max_length for it, though? It certainly helped highlight the mistake by freezing the system for a good 30 seconds while it tried to crunch through 99999999 tokens, but I'm not sure that was intentional. :-)
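For anyone hitting the same ValueError: it's PyTorch refusing to build a tensor from a ragged batch. A self-contained reproduction, plus a sketch of padding each batch to its own longest sequence (the fix described above; pad_batch is a hypothetical helper, not ComfyUI code):

import torch

# torch.LongTensor requires every inner sequence at dim 1 to have the same
# length; mixing streams padded to different lengths triggers the error above.
try:
    torch.LongTensor([[1, 2, 3], [4, 5]])  # ragged batch
except ValueError as e:
    print(e)  # expected sequence of length 3 at dim 1 (got 2)

def pad_batch(batch, pad_id=0):
    # Sketch of a fix: pad each encoder's batch independently to its own
    # longest sequence instead of a shared (or bogus 99999999) max_length.
    longest = max(len(seq) for seq in batch)
    return torch.LongTensor([seq + [pad_id] * (longest - len(seq)) for seq in batch])

print(pad_batch([[1, 2, 3], [4, 5]]))  # rectangular: builds cleanly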