JC1DA

Results 28 comments of JC1DA

Thanks @njhill for your quick review. Really appreciate it. > * Presumably the parallelization speedup is due to the fact that the PyTorch ops involved release the GIL? That's one...

I also found that lm-format-enforcer is not thread-safe. It failed some tests when the number of threads was greater than 1. @njhill any suggestions for this?
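One generic way to keep a non-thread-safe component usable from a thread pool is to serialize calls to it behind a lock. The sketch below is only an illustration of that idea, not what was done in the PR: `UnsafeCounter`, `LockedProcessor`, and `run_batch` are hypothetical names, and `UnsafeCounter` is a stand-in, not lm-format-enforcer's API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class UnsafeCounter:
    """Hypothetical stand-in for a component with unsynchronized shared state."""

    def __init__(self):
        self.calls = 0

    def step(self, token):
        # Read-modify-write on shared state: not safe without external locking.
        current = self.calls
        self.calls = current + 1
        return token * 2


class LockedProcessor:
    """Wraps a non-thread-safe object so only one thread calls it at a time."""

    def __init__(self, inner):
        self._inner = inner
        self._lock = threading.Lock()

    def step(self, token):
        with self._lock:
            return self._inner.step(token)


def run_batch(processor, tokens, max_workers=4):
    # pool.map returns results in input order, regardless of completion order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(processor.step, tokens))


inner = UnsafeCounter()
results = run_batch(LockedProcessor(inner), range(100))
```

The lock removes the race but also removes any parallel speedup for the wrapped calls, so it only helps when the guarded section is a small part of the per-request work.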

> I also found that lm-format-enforcer is not thread-safe. It failed some tests when the number of threads was greater than 1. @njhill any suggestions for this? Decided to roll back to...

Resolved conflict with newly merged xgrammar

> Resolved conflict with newly merged xgrammar @njhill @mgoin

It requires more than 40 GB for 2 seconds of 720p video in my early experiments; a 3-second video needs ~71 GB of VRAM without upscaling (upscale = 1). Another question is...

Hi @lfr-0531, are there any updates on multimodal CPP Runtime support?

I mitigated the issue by upgrading PyTorch to 2.6 and setting PyTorch to deterministic mode (since 2.6, PyTorch has supported deterministic cumsum ops). https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html Not getting 100% similar results but...
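A minimal sketch of turning on the deterministic mode mentioned above, assuming PyTorch is installed. `enable_determinism` and the seed value are hypothetical helpers for illustration; the example runs on CPU and does not reproduce the original GPU issue.

```python
import torch


def enable_determinism(seed: int = 0) -> None:
    """Hypothetical helper: seed the RNG and request deterministic algorithms."""
    torch.manual_seed(seed)
    # Ops without a deterministic implementation raise an error when this is on.
    torch.use_deterministic_algorithms(True)


enable_determinism()

x = torch.rand(1000)
a = torch.cumsum(x, dim=0)
b = torch.cumsum(x, dim=0)
```

On CUDA, the linked documentation notes that some operations additionally need the `CUBLAS_WORKSPACE_CONFIG` environment variable set before deterministic mode works without errors.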