heli issues

Results: 7 issues of heli

Can pycuda.driver.memcpy_htod_async add a "size" parameter?

Hello, the `memcpy_htod_async` function currently supports only these parameters: `pycuda.driver.memcpy_htod_async(dest, src, stream=None)`. Could you extend this API with an additional parameter, `size`? Here `size` means how many bytes will be copied. Or is...

The chat page is too narrow; suggest adding an option to adjust the width

### What operating system are you using? Windows ### What browser are you using? Edge ### Describe the bug The chat page is too narrow; suggest adding an option to adjust the width ### What prompt did you enter? _No response_ ### Console...

Increase the metrics' verbose log level to 2?

**Is your feature request related to a problem? Please describe.** * Normally, we would like to set log verbose=1 to print the request logs to stdout, as in the following image:...

Add debug mode support for python backend

This commit introduces a debug mode for the Triton Python backend. When the environment variable `TRITON_DEBUG` is set to "1", the backend will import the `debugpy` module and start listening...
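
A minimal sketch of the kind of environment-variable gate the description mentions, not the actual commit. The helper name `maybe_start_debugger`, the host, and the port are assumptions made for illustration; only `TRITON_DEBUG`, the value "1", and the `debugpy` module come from the text above. `debugpy` is imported lazily so that it is required only when debug mode is enabled.

```python
import importlib
import os


def maybe_start_debugger(env=os.environ, host="127.0.0.1", port=5678):
    """Start a debugpy listener only when TRITON_DEBUG is set to "1".

    `host` and `port` are illustrative defaults, not values taken from
    the commit. Returns True if a listener was started, False otherwise.
    """
    if env.get("TRITON_DEBUG") != "1":
        return False
    # Lazy import: debugpy only needs to be installed in debug mode.
    debugpy = importlib.import_module("debugpy")
    debugpy.listen((host, port))
    return True
```

With the variable unset or set to any other value, the function returns `False` without importing `debugpy`.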

How does the `localityLbSetting` of a DR work in the Envoy config?

### Is this the right place to submit this? - [X] This is not a security vulnerability or a crashing bug - [X] This is not a question about how...

feature/Multi-cluster

What's the difference when starting tritonserver with `mpirun --allow-run-as-root -n 1 /opt/tritonserver/bin/tritonserver` vs. `/opt/tritonserver/bin/tritonserver` directly?

**Description** I am observing a difference in the behavior of TritonServer when starting it with `mpirun` compared to starting it directly. Specifically, when I use `mpirun --allow-run-as-root -n 1 /opt/tritonserver/bin/tritonserver`,...

Does time-slicing or MPS GPU sharing support a mode for a process to exclusively use GPU DRAM?

* Currently, with time-slicing or MPS GPU-sharing technology, multiple processes simultaneously occupy GPU memory, preventing a single process from utilizing all the memory. Is there any technology or configuration that...