Staszek Paśko
The flame graph shows something somewhat similar to what #16432 is / was addressing, but this is on the receive path (GPU worker -> main). @DarkLight1337 was that on main? That might be...
@ywang96 I've added a benchmark table to the linked bug #16185. My benchmark focused on memory performance rather than throughput, and only used a single model. It should not really...
I have some experimental data with this PR in place. Interestingly, it performs *much* better with zero-copy *disabled*. In this new benchmark, I am feeding gradually increasing document sets to...
Regarding the table, yes, it's sorted weirdly, but the contents are correct. What matters most is the last row, anyway. Overall I think this is some problem with either zmq...
> Thanks @p88h. I have a few questions...
>
> * This strange per-message usage is happening on the client/encode side?

The memory usage is on the sender / encode...
Spent some more time investigating this. It seems that no combination of zmq settings affects the problem. However, I found a more specific trigger: It's not just that...
Benchmarks of the current approach:
```
config / prompts x images     | 8 x 4        | 4 x 8       | 2 x 16       | 1 x 32        |
------------------------------+--------------+-------------+--------------+---------------+
zero-copy...
```
Re: zmq & memoryview. I've extracted the communication pattern into a separate test ... and that does *not* trigger the issue -- I tried making the comm pattern as close...
Current results. Switched to async processing in the benchmark to improve throughput a bit, so had to re-run this for consistency (also, the asdict() removal helps the in-line case as...
Thank you! I was about to go back to debugging this morning ;)