Shiroha-Key

Results 9 comments of Shiroha-Key

I'd like to ask: after adding `"inductor.optimize_linear_epilogue": False,`, did your compilation time increase a lot? @xiangcp For me it originally compiled in 20 minutes; after adding it, it still hadn't finished after an hour.

Thanks, so it was only just added in dev264. No wonder the environment variable wasn't taking effect :3

Is nexfort's dynamic shape support (`dynamic`) still maintained? I tested it and it still recompiles, even though there "Should have no compilation for these new input shape". The nexfort backend also encounters an exception when dynamically switching the resolution to 960x720.

I used a 3090 with the default cli_demo and it takes 12 minutes for a 6-second video ![image](https://github.com/user-attachments/assets/14061c5a-65f6-4b3a-82bd-1ca26bf106b3) It used very little VRAM. Is this the correct speed? @zRzRzRzRzRzRzR

I met the exact same problem: `'T5EncoderModel' object has no attribute '_deployable_module_dpl_graph'`. Additionally, the Torch Dynamo metrics:

```
PyTorch version: 2.4.1+cu121
Is debug build: False
CUDA used to build PyTorch: ...
```

I noticed these two lines and added `pipe.to("cuda")` instead:

```python
# self.pipe.enable_model_cpu_offload()
# self.pipe.enable_sequential_cpu_offload()
```

and the problem disappeared. @loretoparisi

I tried your `save_pipe(pipe, dir="cached_pipe", overwrite=True)` and the problem appeared again xd. I suggest just not using `save_pipe` and `load_pipe` for now.

Got it. I'd also like to ask about int8:

1. Does int8 quantization require installing the torch, torchao, diffusers, and accelerate Python packages from source?
2. In theory quantization should reduce VRAM usage and speed up inference, so why does the manual say inference speed drops significantly? https://github.com/THUDM/CogVideo/blob/main/README_zh.md
3. My GPU has only 24 GB of VRAM, so I have to enable `enable_sequential_cpu_offload()` to keep peak VRAM usage around 22 GB, which means I can't enable `torch.compile` or the torchao optimizations; but a single i2v run for a 6-second video takes 12 minutes. When compile can't be used, is there any other way to speed things up? Does int8 really not help with speed? It would be great if you could answer!! @zRzRzRzRzRzRzR
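On the VRAM side of point 2, a rough back-of-the-envelope sketch may help (the ~5B parameter count below is an assumed ballpark for a 5B-class model, not an exact figure for CogVideoX; int8 weight-only quantization stores one byte per weight versus two for bf16/fp16):

```python
# Rough weight-storage estimate: int8 weight-only quantization roughly
# halves weight memory relative to bf16/fp16 (1 byte vs 2 bytes per
# parameter). Activations and KV/feature buffers are NOT included.

def weight_memory_gib(n_params: int, bytes_per_param: float) -> float:
    """Return weight storage in GiB for a model with n_params parameters."""
    return n_params * bytes_per_param / 1024**3

N = 5_000_000_000  # assumed ~5B parameters (illustrative, not exact)

print(f"bf16 weights: {weight_memory_gib(N, 2):.1f} GiB")  # ~9.3 GiB
print(f"int8 weights: {weight_memory_gib(N, 1):.1f} GiB")  # ~4.7 GiB
```

This is why int8 mainly buys VRAM headroom rather than speed: with weight-only quantization the int8 weights are typically dequantized on the fly for each matmul, and without `torch.compile` to fuse that dequantization into the kernel the extra work can make inference slower, which is consistent with what the README warns about.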