Quang-elec44 comments

Results: 17 comments by Quang-elec44

[ONNX] Support huggingface BART to ONNX

@tianleiwu Thank you for the information. I managed to change my code so that it does not add new inputs, and I successfully exported my model. However, there are lots of...

beam search support

@jiguanglizipao I agree with you; it seems that the "best_of" argument does not provide good results. Moreover, in the case of my model, using "do_sample" leads to unwanted results.

[Bug]: Using lm_format_enforcer, or using certain schemas, with Llama-3.2-90B-Vision-Instruct causes a crash

I got the same problem when running with batch size 64. The server crashed after running for a few minutes (it failed with every backend). There is no problem running with `vllm:0.6.2`....

components should run concurrently when not explicitly waiting on inputs

It seems that `haystack` does not support parallel execution. I spent time reading the documentation, but there is currently no solution. By the way, @alex-stoica, could you tell me how to visualize...

components should run concurrently when not explicitly waiting on inputs

@alex-stoica Yeah, I read the tutorial but didn't find it useful. I think Haystack lacks dynamic/parallel graph execution, so the team needs to do more work on this. Currently, I switch...

[Performance]: guided generation is very slow in offline mode

@stas00 In my experience, guided generation is always slower than unguided generation. I recommend you try `sglang` instead. SGLang achieves better throughput than vLLM, but its guided generation is still slower than its unguided generation.

[Performance]: guided generation is very slow in offline mode

> > @stas00 In my experience, guided generation is always slower than normal. I recommend you try `sglang` instead. Sglang achieves better throughput than vLLM, but the guided generation...