[BlockInfo] Index to Tensor

Haoming02 opened this issue 2 weeks ago · 8 comments

  • Convert block_index from an int to a torch.Tensor to support torch.compile (see the sketch after this list)
    • dtype: torch.uint8
    • device: ~~input's device~~ cpu
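
For illustration, a minimal sketch of what the proposed change amounts to, assuming block_index is carried in transformer_options (the helper name and key usage below are illustrative, not the exact PR code):

```python
import torch

def make_block_index(i: int) -> torch.Tensor:
    # Previously block_index was just a Python int; wrapping it in a small
    # CPU tensor is the change proposed here, so torch.compile can treat it
    # as data instead of a value it has to guard (and recompile) on.
    return torch.tensor(i, dtype=torch.uint8, device="cpu")

transformer_options = {"block_index": make_block_index(3)}
```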

Haoming02 · Dec 09 '25

@kijai could you check if this fixes the torch.compile graph breaks?

Kosinkadink · Dec 09 '25

Yes, tested on HunyuanVideo 1.5 and it went from 54 recompiles on the first step to none.
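
For reference, a minimal way to surface such recompiles when reproducing this kind of test (the actual HunyuanVideo 1.5 run lives in ComfyUI and is not shown here):

```python
import torch

# Enable dynamo's recompile and graph-break logging before running the
# compiled model; equivalently, set TORCH_LOGS="recompiles,graph_breaks".
torch._logging.set_logs(recompiles=True, graph_breaks=True)

# compiled_model = torch.compile(model)   # hypothetical model
# compiled_model(x)                       # recompile reasons get logged
```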

kijai · Dec 09 '25

Sweet, I'll merge it in after stable!

Kosinkadink · Dec 09 '25

@Haoming02 could you make the tensors be on the CPU instead?

Kosinkadink · Dec 11 '25

@kijai hey, could you retest this to make sure that the CPU tensors work fine with torch.compile?

Kosinkadink · Dec 11 '25

While creating the tensor on CPU is fine, actually using a CPU tensor is a bit problematic: with Inductor it requires CPU compile support, which in turn needs more compiler libraries installed than the current Triton-windows package includes, and if you don't have them it just errors out.

You can of course move it to the GPU before using it; that does then produce a "DeviceCopy in input program" warning, but I'm unsure what that affects.
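
A hedged sketch of the workaround being weighed here, with illustrative names: keep creating the index on CPU, but move it to the model's device before the compiled region, so Inductor never needs CPU compile support and the copy does not happen inside the traced program.

```python
import torch

def prepare_block_index(options: dict, device: torch.device) -> dict:
    """Illustrative helper (not ComfyUI API): move a CPU block_index tensor
    onto the model's device outside the compiled region."""
    options = dict(options)
    idx = options.get("block_index")
    if isinstance(idx, torch.Tensor) and idx.device != device:
        # Doing this .to(device) inside the compiled graph instead is what
        # surfaces the "DeviceCopy in input program" warning mentioned above.
        options["block_index"] = idx.to(device)
    return options

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
opts = prepare_block_index({"block_index": torch.tensor(3, dtype=torch.uint8)}, device)
```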

kijai · Dec 11 '25

Damn, that is definitely annoying. The goal is for the block index to be used inside attention code and only ever be compared against. For that comparison, would it be better to compare against a GPU tensor or a CPU tensor? If a GPU tensor is easier, we can edit this PR to use the GPU.

Alternatively, is there a way to tell torch.compile to ignore transformer_options, or at least that one key?

Kosinkadink · Dec 11 '25

I'm also waiting for BlockInfo to be merged so I can implement things like RadialAttn more easily. I think it's fine to create a scalar tensor on the GPU and do the comparison with it; it's not the first time we've done that, for example in https://github.com/comfyanonymous/ComfyUI/blob/c5a47a16924e1be96241553a1448b298e57e50a1/comfy/extra_samplers/uni_pc.py#L785
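
For illustration, a sketch of that pattern with made-up names; the point is to keep the comparison as tensor ops so it stays inside the compiled graph rather than branching in Python on a data-dependent bool:

```python
import torch

def scale_if_block(x: torch.Tensor, block_index: torch.Tensor, target: int) -> torch.Tensor:
    # Compare against a scalar tensor on the same device as the activations;
    # torch.where keeps the branch inside the graph instead of graph-breaking
    # on `if block_index == target:`.
    target_t = torch.tensor(target, dtype=block_index.dtype, device=x.device)
    cond = block_index.to(x.device) == target_t
    return torch.where(cond, x * 2.0, x)

x = torch.randn(4)
print(scale_if_block(x, torch.tensor(1, dtype=torch.uint8), target=1))
```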

woct0rdho · Dec 13 '25

Comfy merged a PR that makes entries in transformer_options cause far fewer graph breaks, so we can stick with plain integers: https://github.com/comfyanonymous/ComfyUI/pull/11317
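
In other words, with that PR in place the earlier tensor-wrapping sketch collapses back to the original plain-int form (illustrative):

```python
transformer_options = {"block_index": 3}  # plain int again, no tensor wrapping
```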

I'll be closing this issue, and once the tensors get turned back into ints in the Lumina BlockInfo PR, I can merge that one! https://github.com/comfyanonymous/ComfyUI/pull/11227

Kosinkadink · Dec 16 '25