ComfyUI
ClipTextEncoderFlux not separating keys out in a usable form
Expected Behavior
Using the Long CLIP text encoder does not function as intended: ClipTextEncoderFlux does not assign a key to clip_l, so it can't be extracted downstream, unlike with the SD3 Clip Text Encoder. The only key assigned is t5xxl.
Actual Behavior
The output is lumped into one, so the token lengths are combined. The Long CLIP node can't be run with the combined core ClipTextEncode node because it mashes the t5xxl prompt into clip_l when determining token length (see the sketch below).
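For illustration, a minimal sketch of how separate keys would let each stream be pulled out independently. This assumes tokenize() returns a dict of per-encoder token batches; the "l" and "t5xxl" key names follow the traceback below, and split_token_streams is a hypothetical helper, not part of either codebase.

def split_token_streams(tokens: dict):
    # Hypothetical helper: pull each encoder's token stream out of the
    # tokenize() result. The SD3 encoder assigns both keys; the Flux
    # encoder in this report only assigns "t5xxl", so "l" comes back None.
    clip_l = tokens.get("l")
    t5xxl = tokens.get("t5xxl")
    return clip_l, t5xxl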
Steps to Reproduce
Use the node with Long Clip Text Encode (the updated version at https://github.com/zer0int/ComfyUI-Long-CLIP, not the original; the updated one works around some of the issues with conditionals that the original does not). It's also unclear to me from the code in ComfyCore whether clip_l is ever truncated, though Long CLIP does seem to make a difference; a diagnostic sketch follows.
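One way to probe the truncation question, assuming ComfyUI's CLIP.tokenize() returns {key: [[(token, weight), ...], ...]} as the traceback suggests; report_token_counts is a hypothetical diagnostic, not an existing API:

def report_token_counts(clip, prompt: str) -> None:
    # Hypothetical diagnostic: print how many tokens each encoder actually
    # sees after tokenization. If clip_l silently truncates, its batch
    # lengths stay capped (77 for stock CLIP-L) no matter how long the
    # prompt is; Long-CLIP should report more.
    tokens = clip.tokenize(prompt)
    for key, batches in tokens.items():
        print(key, [len(batch) for batch in batches])

Feeding this a deliberately long prompt ("word " * 500) and comparing the printed lengths per key would show whether truncation happens.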
Debug Logs
!!! Exception during processing !!! expected sequence of length 413 at dim 1 (got 99999999)
Traceback (most recent call last):
File "E:\ComfyUI\execution.py", line 317, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\execution.py", line 192, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\execution.py", line 169, in _map_node_over_list
process_inputs(input_dict, i)
File "E:\ComfyUI\execution.py", line 158, in process_inputs
results.append(getattr(obj, func)(**inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\comfy_extras\nodes_flux.py", line 21, in encode
output = clip.encode_from_tokens(tokens, return_pooled=True, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\comfy\sd.py", line 126, in encode_from_tokens
o = self.cond_stage_model.encode_token_weights(tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\custom_nodes\ComfyUI-Long-CLIP\long_clip.py", line 402, in encode_token_weights
t5_out, t5_pooled = self.t5xxl.encode_token_weights(token_weight_pairs_t5)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\comfy\sd1_clip.py", line 41, in encode_token_weights
o = self.encode(to_encode)
^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\comfy\sd1_clip.py", line 229, in encode
return self(tokens)
^^^^^^^^^^^^
File "C:\Users\qrkyx\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\qrkyx\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI\comfy\sd1_clip.py", line 185, in forward
tokens = torch.LongTensor(tokens).to(device)
^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: expected sequence of length 413 at dim 1 (got 99999999)
Other
No response
Additional info: the issue was due to padding depending on which prompt is shorter (CLIP-L vs. T5). And I learned, from searching for the source of 99999999, that you don't pad a T5: 'pad_to_max_length=False, max_length=99999999'. So now I don't pad, and that seems to have fixed the issue.
I'm curious why you set such a gigantic max_length for it, though? It certainly helped highlight the mistake by freezing the system for a good 30 seconds while it tried to crunch through 99999999 tokens, but I'm not sure that was intentional. :-)
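For anyone hitting the same ValueError: it's PyTorch refusing to build a tensor from a ragged batch. A self-contained reproduction, plus a sketch of padding each batch to its own longest sequence (the fix described above; pad_batch is a hypothetical helper, not ComfyUI code):

import torch

# torch.LongTensor requires every inner sequence at dim 1 to have the same
# length; mixing streams padded to different lengths triggers the error above.
try:
    torch.LongTensor([[1, 2, 3], [4, 5]])  # ragged batch
except ValueError as e:
    print(e)  # expected sequence of length 3 at dim 1 (got 2)

def pad_batch(batch, pad_id=0):
    # Sketch of a fix: pad each encoder's batch independently to its own
    # longest sequence instead of a shared (or bogus 99999999) max_length.
    longest = max(len(seq) for seq in batch)
    return torch.LongTensor([seq + [pad_id] * (longest - len(seq)) for seq in batch])

print(pad_batch([[1, 2, 3], [4, 5]]))  # rectangular: builds cleanly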