
[FEATURE] High-performance eBPF helper function implementation on the GPU side

Open YRXING opened this issue 4 months ago • 2 comments

Is your feature request related to a problem? Please describe.

The current implementation of the eBPF helper functions on the GPU side appears to have performance issues. I have read the source code for this part. In essence, each helper call has the GPU read host-side memory through the request and response structures in CommSharedMem. Although cudaHostRegisterMapped is used for the shared memory mapping, the GPU does not actually read the CPU's memory directly. Data is still copied over the underlying PCIe channel; the driver layer simply does this for us implicitly. This causes a large performance loss, and my test results lead to the same conclusion: https://github.com/eunomia-bpf/bpftime/issues/411

Describe the solution you'd like

In the GPU scenario, could the map data be stored in GPU global memory, which is well suited to data that the GPU accesses frequently, and copied to the host side only when the host actually needs to read it?

Describe alternatives you've considered

Provide usage examples

Additional context

If this direction is feasible, I would be happy to invest some research and development in this issue.
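
A minimal sketch of the direction proposed above, assuming a fixed-size array map of counters. It is not bpftime code: `kMapEntries`, `gpu_map_add`, `count_kernel`, and `run_and_read_map` are hypothetical names made up for illustration. The map lives in GPU global memory, the device-side helper touches only that memory, and the host copies the map back only when it wants to read it.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical fixed-size array map that lives in GPU global memory.
constexpr int kMapEntries = 16;

// Hypothetical device-side helper: the update touches device memory only,
// so no request/response round trip to the host is needed.
__device__ void gpu_map_add(unsigned long long *map, int key,
                            unsigned long long delta) {
    atomicAdd(&map[key % kMapEntries], delta);
}

// Each thread bumps the counter selected by its global thread index.
__global__ void count_kernel(unsigned long long *map) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    gpu_map_add(map, tid, 1ULL);
}

// Host side: allocate the map on the device, run the kernel, and copy the
// map back only when the host asks for it.
// Returns an empty vector if a CUDA call fails.
std::vector<unsigned long long> run_and_read_map(int blocks, int threads) {
    assert(blocks > 0 && threads > 0);
    const size_t bytes = kMapEntries * sizeof(unsigned long long);
    unsigned long long *d_map = nullptr;
    if (cudaMalloc(&d_map, bytes) != cudaSuccess) return {};
    if (cudaMemset(d_map, 0, bytes) != cudaSuccess) {
        cudaFree(d_map);
        return {};
    }

    count_kernel<<<blocks, threads>>>(d_map);

    // cudaMemcpy on the default stream waits for the kernel to finish,
    // so this is the single point where data crosses PCIe.
    std::vector<unsigned long long> host(kMapEntries, 0);
    cudaError_t err =
        cudaMemcpy(host.data(), d_map, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_map);
    if (err != cudaSuccess) return {};
    return host;
}
```

Calling `run_and_read_map(4, 64)` launches 256 threads over 16 keys. The point of the design is that the per-update cost stays in device memory, while the PCIe transfer happens once per host read instead of once per helper call.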

YRXING, Aug 05 '25 09:08

Yes, I think so! We are working to fix this in our next version; it is a high priority.

yunwei37, Sep 30 '25 19:09

We already have some GPU-specific maps and helpers, for example in:

https://github.com/eunomia-bpf/bpftime/tree/master/runtime/src/bpf_map/gpu

These maps are per GPU thread, lock-free, and stored in GPU memory.
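
To illustrate what "per GPU thread, lock-free, stored in GPU memory" can look like, here is a minimal sketch. It is an assumption about the general idea, not the actual code in the linked directory; `per_thread_update` and `sum_per_thread_slots` are hypothetical names. Each thread owns one slot in device memory, so updates need neither locks nor atomics, and the host aggregates the slots when it reads the map.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical layout: one value slot per GPU thread for a single key.
__global__ void per_thread_update(unsigned long long *slots, int n_threads,
                                  int iterations) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_threads) return;
    // Only this thread ever writes slots[tid], so no lock or atomic is needed.
    for (int i = 0; i < iterations; ++i) {
        slots[tid] += 1;
    }
}

// Host side: run the kernel, then copy the slots back and sum them, the way
// a per-thread map would be aggregated on lookup.
// Returns -1 if a CUDA call fails.
long long sum_per_thread_slots(int blocks, int threads_per_block,
                               int iterations) {
    assert(blocks > 0 && threads_per_block > 0 && iterations >= 0);
    const int n_threads = blocks * threads_per_block;
    const size_t bytes = n_threads * sizeof(unsigned long long);

    unsigned long long *d_slots = nullptr;
    if (cudaMalloc(&d_slots, bytes) != cudaSuccess) return -1;
    if (cudaMemset(d_slots, 0, bytes) != cudaSuccess) {
        cudaFree(d_slots);
        return -1;
    }

    per_thread_update<<<blocks, threads_per_block>>>(d_slots, n_threads,
                                                     iterations);

    // The blocking copy on the default stream waits for the kernel.
    std::vector<unsigned long long> host(n_threads, 0);
    cudaError_t err =
        cudaMemcpy(host.data(), d_slots, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_slots);
    if (err != cudaSuccess) return -1;

    long long total = 0;
    for (unsigned long long v : host) total += static_cast<long long>(v);
    return total;
}
```

The trade-off in this layout is memory for contention: storage grows with the number of threads, but no thread ever waits on another, and the aggregation cost is paid only when the host reads.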

You can also find some GPU-specific helpers in https://github.com/eunomia-bpf/bpftime/tree/master/example/gpu

Any other suggestions?

yunwei37, Oct 01 '25 22:10