Conghui Tan
Hi, thanks for your suggestion. I would be glad if my algorithms could become part of SciPy, but I'm afraid they are not suitable. My methods are...
Do you know there is already a Python library, [lightning](http://contrib.scikit-learn.org/lightning/), which solves the same problems?
I met the same issue. `--disable-cuda-graph` works for me. However, adding this option greatly slows down inference in low-QPS settings.
The timeout error occurs because the inference has already gotten stuck or crashed, so setting a longer timeout doesn't help. I did some debugging and found the exact line of code...
Do you have a plan to fix this issue? We need the batch API in our scenario.

> Oh. Currently do not use batch in dpsk models. We find this problem....
Thanks, FrankLeeeee. I also noticed this issue. But maybe it is better to use a UUID instead of the custom_id as the request id? For example, if two batches are processing...
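To illustrate the suggestion: a minimal sketch of UUID-based request ids, assuming the server currently uses the user-supplied `custom_id` directly (the helper name `make_request_id` is hypothetical, not from any actual codebase):

```python
import uuid


def make_request_id(custom_id: str) -> str:
    """Build a collision-free internal request id.

    Prefixing with a random UUID means two concurrently processed
    batches can both contain the same custom_id without their
    requests colliding inside the server.
    """
    return f"{uuid.uuid4()}-{custom_id}"


# Two requests with the same custom_id get distinct internal ids,
# while the original custom_id is still recoverable from the suffix.
a = make_request_id("req-1")
b = make_request_id("req-1")
```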
I tried to run it on an H20, but I encountered the following error when capturing the CUDA graph on the decoding nodes. Adding `--disable-cuda-graph` fixes it, but the decoding speed...
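For reference, a hedged sketch of the workaround, assuming the server is launched via SGLang's `launch_server` entry point (the `--disable-cuda-graph` flag mentioned above matches its CLI); the model path is a placeholder:

```shell
# Placeholder model path; replace with the model actually being served.
MODEL=deepseek-ai/DeepSeek-V3

# Launch with CUDA graph capture disabled to work around the crash.
# Expect lower decoding throughput, especially at low QPS.
python -m sglang.launch_server --model-path "$MODEL" --disable-cuda-graph
```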