Staszek Paśko comments

Results: 10 comments of Staszek Paśko

[Bug]: Mistral 3.1 Small Image inference is broken on 0.8.4

The flame graph shows something somewhat similar to what #16432 is / was addressing, but this is on the receive path (GPU worker -> main). @DarkLight1337 was that on main? That might be...

[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased]

@ywang96 I've added a benchmark table to the linked bug #16185. My benchmark focused on memory performance rather than throughput, and only used a single model. It should not really...

[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased]

I have some experimental data with this PR in place. Interestingly, it performs *much* better with zero-copy *disabled*. In this new benchmark, I am feeding gradually increasing document sets to...

[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased]

Regarding the table, yes, it's sorted weird, but the contents are correct. What matters most is the last row, anyways. Overall I think this is some problem with either zmq...

[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased]

> Thanks @p88h. I have a few questions... > > * This strange per-message usage is happening on the client/encode side? The memory usage is on the sender / encode...

[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased]

Spent some more time investigating this - it seems that no combination of zmq settings affects the problem. However, I found a more specific trigger: It's not just that...

[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased]

Benchmarks of the current approach: ``` config / prompts x images | 8 x 4 | 4 x 8 | 2 x 16 | 1 x 32 | ------------------------------+--------------+-------------+--------------+---------------+ zero-copy...

[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased]

Re: zmq & memoryview: I've extracted the communication pattern into a separate test... and that does *not* trigger the issue -- I tried making the comm pattern as close...

[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased]

Current results. Switched to async processing in the benchmark to improve throughput a bit, so had to re-run this for consistency (also, the asdict() removal helps the in-line case as...

[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased]

Thank you! I was about to go back to debugging this morning ;)