Shiroha-Key

Results 9 comments of Shiroha-Key

I'd like to ask: after adding `"inductor.optimize_linear_epilogue": False,`, did your compilation time increase a lot? @xiangcp For me it originally compiled in 20 minutes; after adding it, it still hadn't finished after an hour.

Thanks, so it was only just added in dev264. No wonder the environment variable wasn't taking effect :3

Is nexfort's dynamic shape support (`dynamic`) still maintained? I tested it and it still recompiles, even though there "Should have no compilation for these new input shape". The nexfort backend also encounters an exception when dynamically switching the resolution to 960x720.

I used a 3090 with the default cli_demo and it takes 12 minutes for a 6-second video ![image](https://github.com/user-attachments/assets/14061c5a-65f6-4b3a-82bd-1ca26bf106b3) It used very little VRAM. Is this the correct speed? @zRzRzRzRzRzRzR

I met the exact same problem: `'T5EncoderModel' object has no attribute '_deployable_module_dpl_graph'`. Additionally, the Torch Dynamo metrics:

```
PyTorch version: 2.4.1+cu121
Is debug build: False
CUDA used to build PyTorch: ...
```

I noticed these two lines and added `pipe.to("cuda")` instead:

```python
# self.pipe.enable_model_cpu_offload()
# self.pipe.enable_sequential_cpu_offload()
```

and the problem disappeared. @loretoparisi

I tried your `save_pipe(pipe, dir="cached_pipe", overwrite=True)` and the problem appeared again xd. I suggest just not using `save_pipe` and `load_pipe` for now.

Got it. I'd also like to ask about int8:

1. Does int8 quantization require installing the torch, torchao, diffusers, and accelerate Python packages from source?
2. In theory quantization should reduce VRAM usage and speed up inference, so why does the manual say inference speed drops significantly? https://github.com/THUDM/CogVideo/blob/main/README_zh.md
3. My GPU has only 24 GB of VRAM, so I have to enable `enable_sequential_cpu_offload()` to keep peak VRAM usage around 22 GB, which means I can't enable `torch.compile` or the torchao optimizations; but a single i2v run for a 6-second video takes 12 minutes. When compile can't be used, is there any other way to speed things up? Does int8 really not help with speed? It would be great if you could answer!! @zRzRzRzRzRzRzR
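On the VRAM side of point 2, a rough back-of-the-envelope sketch may help (the ~5B parameter count below is an assumed ballpark for a 5B-class model, not an exact figure for CogVideoX; int8 weight-only quantization stores one byte per weight versus two for bf16/fp16):

```python
# Rough weight-storage estimate: int8 weight-only quantization roughly
# halves weight memory relative to bf16/fp16 (1 byte vs 2 bytes per
# parameter). Activations and KV/feature buffers are NOT included.

def weight_memory_gib(n_params: int, bytes_per_param: float) -> float:
    """Return weight storage in GiB for a model with n_params parameters."""
    return n_params * bytes_per_param / 1024**3

N = 5_000_000_000  # assumed ~5B parameters (illustrative, not exact)

print(f"bf16 weights: {weight_memory_gib(N, 2):.1f} GiB")  # ~9.3 GiB
print(f"int8 weights: {weight_memory_gib(N, 1):.1f} GiB")  # ~4.7 GiB
```

This is why int8 mainly buys VRAM headroom rather than speed: with weight-only quantization the int8 weights are typically dequantized on the fly for each matmul, and without `torch.compile` to fuse that dequantization into the kernel the extra work can make inference slower, which is consistent with what the README warns about.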