JC1DA
Thanks @njhill for your quick review. Really appreciate it.

> * Presumably the parallelization speedup is due to the fact that the pytorch ops involved release the gil?

That's one...
I also figured out lm-format-enforcer is not thread-safe. It failed some tests when the number of threads is larger than 1. @njhill any suggestions for this?
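One generic way to work around a library that is not thread-safe is to give each thread its own instance via `threading.local`, so no object is ever shared across threads. This is only a hedged sketch, not lm-format-enforcer's actual API: `get_parser` and the placeholder `dict` below are hypothetical stand-ins for the real constructor.

```python
import threading

# Per-thread storage: each thread sees its own attributes on this object.
_local = threading.local()

def get_parser():
    # Hypothetical helper: lazily build one instance per thread.
    # Replace `dict()` with the real (non-thread-safe) object's constructor.
    if not hasattr(_local, "parser"):
        _local.parser = dict()
    return _local.parser

results = []

def worker():
    # Keep a reference so instances stay alive for the check below.
    results.append(get_parser())

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread received a distinct instance.
assert len({id(p) for p in results}) == 4
```

The trade-off is memory (one instance per worker thread) and losing any cross-thread caching the library might do internally.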
> I also figured out lm-format-enforcer is not thread-safe. It failed some tests when the number of threads is larger than 1. @njhill any suggestions for this?

Decided to roll back to...
Resolved conflict with the newly merged xgrammar.
> Resolved conflict with newly merged xgrammar

@njhill @mgoin
In my early experiments, 2 seconds of 720p video requires more than 40 GB, and a 3-second video needs ~71 GB of VRAM without upscaling (upscale = 1). Another question is...
Hi @lfr-0531, are there any updates on multimodal CPP Runtime support?
I mitigated the issue by upgrading PyTorch to 2.6 and enabling deterministic mode (since 2.6, PyTorch has supported a deterministic cumsum op). https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html I'm not getting 100% identical results, but...
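For reference, the mitigation described above boils down to a couple of lines. A minimal sketch, assuming PyTorch >= 2.6 is installed (the `CUBLAS_WORKSPACE_CONFIG` setting is only needed for certain CUDA ops, and must be set before they run):

```python
import os
import torch

# Some CUDA ops require this env var for deterministic behavior;
# harmless on CPU. Set it before the ops in question execute.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

# Ask PyTorch to use deterministic implementations everywhere;
# ops without one will raise RuntimeError instead of silently
# falling back to a non-deterministic kernel.
torch.use_deterministic_algorithms(True)

# cumsum is among the ops with a deterministic path in recent PyTorch.
x = torch.arange(6, dtype=torch.float32)
out = x.cumsum(dim=0)
print(out.tolist())  # → [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
```

Note that deterministic kernels can be slower, and as mentioned above this still does not guarantee bit-identical results across different hardware or library versions.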