xla [XLA:GPU] Optimize command buffer update by only operating on commands that have pointer change

[XLA:GPU] Optimize command buffer update by only operating on commands that have pointer change

Open shawnwang18 opened this issue 9 months ago • 3 comments

May 13 '24 06:05 shawnwang18

Do you have any benchmarks for this change? I recall I was running synthetic benchmarks and graph update is so cheap that I decided it's not worth bothering, certainly not worth leaking implemntation details through so many abstraction levels. If it's indeed a significant performance improvement, then "something better" should be baked into se::CommandBuffer itself

May 14 '24 03:05 ezhulenev

Do you have any benchmarks for this change? I recall I was running synthetic benchmarks and graph update is so cheap that I decided it's not worth bothering, certainly not worth leaking implemntation details through so many abstraction levels. If it's indeed a significant performance improvement, then "something better" should be baked into se::CommandBuffer itself

I am still make this fix working, it has some broken. I feel like update cost is worth to triage on some larger models. I will test it how selectively update's benefits, and if so, we can found maybe better implementation.

May 14 '24 03:05 shawnwang18

Sounds good! If it's indeed expensive let's find a solultion.

FYI I remembered that I have microbenchmarks here: https://github.com/openxla/xla/blob/main/xla/stream_executor/gpu/gpu_command_buffer_test.cc#L1274-L1356 - you can add something more representative there, although it's part of SE so it can be only kernel launches (can't depend on XLA)

May 14 '24 03:05 ezhulenev

@shawnwang18 Please let us know when this PR is ready for review. Thanks!

May 16 '24 18:05 dimitar-asenov

xla xla copied to clipboard

[XLA:GPU] Optimize command buffer update by only operating on commands that have pointer change

xla
xla copied to clipboard