JC1DA
Thanks @njhill for your quick review. Really appreciate it.

> * Presumably the parallelization speedup is due to the fact that the pytorch ops involved release the gil?

That's one...
I also figured out lm-format-enforcer is not thread-safe. It failed some tests when the number of threads is larger than 1. @njhill any suggestions for this?
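One generic way to work around a library that is not thread-safe is to give each thread its own instance via `threading.local`, so no object is ever shared across threads. This is only a hedged sketch, not lm-format-enforcer's actual API: `get_parser` and the placeholder `dict` below are hypothetical stand-ins for the real constructor.

```python
import threading

# Per-thread storage: each thread sees its own attributes on this object.
_local = threading.local()

def get_parser():
    # Hypothetical helper: lazily build one instance per thread.
    # Replace `dict()` with the real (non-thread-safe) object's constructor.
    if not hasattr(_local, "parser"):
        _local.parser = dict()
    return _local.parser

results = []

def worker():
    # Keep a reference so instances stay alive for the check below.
    results.append(get_parser())

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread received a distinct instance.
assert len({id(p) for p in results}) == 4
```

The trade-off is memory (one instance per worker thread) and losing any cross-thread caching the library might do internally.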
> I also figured out lm-format-enforcer is not thread-safe. It failed some tests when the number of threads is larger than 1. @njhill any suggestions for this?

Decided to roll back to...
Resolved conflict with the newly merged xgrammar.
> Resolved conflict with newly merged xgrammar

@njhill @mgoin
In my early experiments, 2 seconds of 720p video requires more than 40 GB, and a 3-second video needs ~71 GB of VRAM without upscaling (upscale = 1). Another question is...
Hi @lfr-0531, are there any updates on multimodal CPP Runtime support?
I mitigated the issue by upgrading PyTorch to 2.6 and enabling deterministic mode (since 2.6, PyTorch has supported a deterministic cumsum op). https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html I'm not getting 100% identical results, but...
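For reference, the mitigation described above boils down to a couple of lines. A minimal sketch, assuming PyTorch >= 2.6 is installed (the `CUBLAS_WORKSPACE_CONFIG` setting is only needed for certain CUDA ops, and must be set before they run):

```python
import os
import torch

# Some CUDA ops require this env var for deterministic behavior;
# harmless on CPU. Set it before the ops in question execute.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

# Ask PyTorch to use deterministic implementations everywhere;
# ops without one will raise RuntimeError instead of silently
# falling back to a non-deterministic kernel.
torch.use_deterministic_algorithms(True)

# cumsum is among the ops with a deterministic path in recent PyTorch.
x = torch.arange(6, dtype=torch.float32)
out = x.cumsum(dim=0)
print(out.tolist())  # → [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
```

Note that deterministic kernels can be slower, and as mentioned above this still does not guarantee bit-identical results across different hardware or library versions.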