distributed icon indicating copy to clipboard operation
distributed copied to clipboard

Avoid deepcopy when submitting graph

Open fjetter opened this issue 9 months ago • 3 comments

This avoids at least one deepcopy of the graph and therefore reduces overhead during submit

Typically, one would simply add ToPickle/Serialize to the dict value and pass the graph through directly. However, this would make it impossible to get clear error messages on (de-)serialization exceptions which is why this was pulled out to a manual call back then.

However, passing the header and frames as dictionary values directly causes msgpack to simply copy the bytes into the msgpack bytestream instead of us passing through the frames implicitly.

This avoids an unnecessary copy

fjetter avatar May 02 '24 07:05 fjetter

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    18 files   -      9     18 suites   - 9   14h 59m 31s :stopwatch: + 6h 2m 58s  1 939 tests  -  2 117  1 029 :white_check_mark:  -  2 914  130 :zzz: + 24    779 :x: +  776  1 :fire:  - 3  16 066 runs   - 28 144  8 013 :white_check_mark:  - 34 408  788 :zzz:  - 981  7 256 :x: +7 253  9 :fire:  - 8 

For more details on these failures and errors, see this check.

Results for commit a990563c. ± Comparison against base commit 4986fa41.

This pull request removes 2127 and adds 10 tests. Note that renamed tests count towards both.
distributed.cli.tests.test_dask_worker ‑ test_listen_address_ipv6[tcp://[::1]:---nanny]
distributed.cli.tests.test_dask_worker ‑ test_listen_address_ipv6[tcp://[::1]:---no-nanny]
distributed.diagnostics.tests.test_progress_widgets ‑ test_fast
distributed.diagnostics.tests.test_progress_widgets ‑ test_multibar_complete
distributed.diagnostics.tests.test_progress_widgets ‑ test_multibar_func_warns
distributed.diagnostics.tests.test_progress_widgets ‑ test_multibar_with_spans
distributed.diagnostics.tests.test_progress_widgets ‑ test_progressbar_cancel
distributed.diagnostics.tests.test_progress_widgets ‑ test_serializers
distributed.diagnostics.tests.test_progress_widgets ‑ test_tls
distributed.diagnostics.tests.test_progressbar ‑ test_TextProgressBar_empty
…
distributed.cli.tests.test_dask_worker.test_listen_address_ipv6[tcp:..[ ‑ 1]:---nanny]
distributed.cli.tests.test_dask_worker.test_listen_address_ipv6[tcp:..[ ‑ 1]:---no-nanny]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_scheduler
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_scheduler_report_args[False]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_scheduler_report_args[report_args0]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers[1]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers[False]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers[True]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers_report_args[False]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers_report_args[report_args0]
This pull request removes 39 skipped tests and adds 2 skipped tests. Note that renamed tests count towards both.
distributed.cli.tests.test_dask_worker ‑ test_listen_address_ipv6[tcp://[::1]:---nanny]
distributed.cli.tests.test_dask_worker ‑ test_listen_address_ipv6[tcp://[::1]:---no-nanny]
distributed.protocol.tests.test_arrow
distributed.protocol.tests.test_collection
distributed.protocol.tests.test_collection_cuda ‑ test_serialize_cupy[50-cuda-dict]
distributed.protocol.tests.test_collection_cuda ‑ test_serialize_cupy[50-cuda-tuple]
distributed.protocol.tests.test_collection_cuda ‑ test_serialize_cupy[None-pickle-dict]
distributed.protocol.tests.test_collection_cuda ‑ test_serialize_cupy[None-pickle-tuple]
distributed.protocol.tests.test_collection_cuda ‑ test_serialize_pandas_pandas[None-pickle-dict]
distributed.protocol.tests.test_collection_cuda ‑ test_serialize_pandas_pandas[None-pickle-tuple]
…
distributed.cli.tests.test_dask_worker.test_listen_address_ipv6[tcp:..[ ‑ 1]:---nanny]
distributed.cli.tests.test_dask_worker.test_listen_address_ipv6[tcp:..[ ‑ 1]:---no-nanny]
This pull request skips 61 tests.
distributed.cli.tests.test_dask_scheduler ‑ test_dashboard_allowlist
distributed.cli.tests.test_dask_scheduler ‑ test_signal_handling[15]
distributed.cli.tests.test_dask_scheduler ‑ test_signal_handling[2]
distributed.cli.tests.test_dask_scheduler ‑ test_signal_handling[Signals.SIGINT]
distributed.cli.tests.test_dask_scheduler ‑ test_signal_handling[Signals.SIGTERM]
distributed.cli.tests.test_dask_scheduler ‑ test_single_executable_deprecated
distributed.cli.tests.test_dask_spec ‑ test_signal_handling_scheduler[15]
distributed.cli.tests.test_dask_spec ‑ test_signal_handling_scheduler[2]
distributed.cli.tests.test_dask_spec ‑ test_signal_handling_scheduler[Signals.SIGINT]
distributed.cli.tests.test_dask_spec ‑ test_signal_handling_scheduler[Signals.SIGTERM]
…

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar May 02 '24 09:05 github-actions[bot]

well, that's disappointing

image

I hope/suspect that's the Serialize. I'll try with sticking to ToPickle

fjetter avatar May 03 '24 15:05 fjetter

What's the status of this PR?

hendrikmakait avatar Jun 24 '24 15:06 hendrikmakait