b2-sdk-python

EmergePlanner: Optimize _get_emerge_parts() for better grouping of parts in the upload buffer

Open • LewyMaster opened this issue 11 months ago • 1 comment

EmergePlanner optimization, set as a task for a job recruiting process.

Task: `The implementation of https://github.com/Backblaze/b2-sdk-python/blob/10a5e720237cf2581a7f6297eda13f0e725a685b/b2sdk/transfer/emerge/planner/planner.py#L191 is a little bit naive. With a 5MB minimum part size, let "u" mark a megabyte of data to upload, "c" a megabyte of data to copy, and "d" a megabyte of data that will be downloaded and uploaded. Then, for the "input" below, the current version of the code executes the "master" pattern, while the "ideal" pattern would be the better choice:

input: uuu cccccccccc uu ccccc ...

master: uuu ddcccccccc uu ddddd ...

ideal: uuu ddcccccddd uu ccccc ...

This is because the planner only considers bundling an upload with a part of the copy to the right of the upload, and not also with a part of the copy to its left. In keeping with the quality of the code in this project, tests should be provided and the planner code should remain readable. The behavior of the planner should strictly improve, and the changes should not have a negative impact on any input case (beyond marginally higher CPU consumption during planning). `
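To make the difference between the "master" and "ideal" patterns concrete, here is a minimal toy model of the bundling decision. It is not the b2sdk planner code: the names `Copy`, `parse` and `plan` are hypothetical, sizes are whole megabytes, and the greedy rule (borrow from the left copy only when that copy still keeps at least the minimum part size) is an assumption chosen to reproduce the patterns above.

```python
from dataclasses import dataclass

MIN_PART_MB = 5  # minimum part size from the task description


@dataclass
class Copy:
    """A remote range to copy (hypothetical helper, sizes in MB)."""
    size: int
    head: int = 0  # MB downloaded from the start, bundled with the upload before it
    tail: int = 0  # MB downloaded from the end, bundled with the upload after it

    @property
    def kept(self):
        # MB that are still copied server-side
        return self.size - self.head - self.tail

    def can_spare(self, mb):
        # The remaining copy must still be a valid standalone part
        return self.kept - mb >= MIN_PART_MB


def parse(pattern):
    """Turn 'uuu ccccc' into [3, Copy(5)] (uploads are plain ints)."""
    return [Copy(len(w)) if w[0] == "c" else len(w) for w in pattern.split()]


def plan(pattern, allow_left):
    """Return (resulting pattern, downloaded MB) for the toy model."""
    items = parse(pattern)
    for i, item in enumerate(items):
        if isinstance(item, Copy) or item >= MIN_PART_MB:
            continue  # copies and large-enough uploads need no bundling
        deficit = MIN_PART_MB - item
        left = items[i - 1] if i > 0 and isinstance(items[i - 1], Copy) else None
        right = items[i + 1] if i + 1 < len(items) and isinstance(items[i + 1], Copy) else None
        if allow_left and left is not None and left.can_spare(deficit):
            left.tail += deficit  # borrow from the end of the left copy
        elif right is not None:
            # Borrow from the right; if the rest would be too small to copy,
            # the whole right copy has to be downloaded and uploaded.
            right.head += deficit if right.can_spare(deficit) else right.kept
    chunks, downloaded = [], 0
    for item in items:
        if isinstance(item, Copy):
            chunks.append("d" * item.head + "c" * item.kept + "d" * item.tail)
            downloaded += item.head + item.tail
        else:
            chunks.append("u" * item)
    return " ".join(chunks), downloaded


print(plan("uuu cccccccccc uu ccccc", allow_left=False))
# ('uuu ddcccccccc uu ddddd', 7)
print(plan("uuu cccccccccc uu ccccc", allow_left=True))
# ('uuu ddcccccddd uu ccccc', 5)
```

In this model, borrowing from the left never costs more than borrowing from the right, and it leaves the right copy untouched for any later upload. A real change to the planner would still need tests covering inputs this sketch ignores, such as adjacent small uploads or copies smaller than the deficit.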

LewyMaster, Mar 26 '25 05:03