Distribute the work more evenly between faster and slower workers
Looking for a feature to distribute work unequally between workers. Often I have one or two cores that run slower than the others, or start up and get running later. Once the faster cores are done, they sit idle while I wait for the slower cores to catch up; the faster cores often finish their allotment in half the time.
Request: some way to distribute work so that the faster cores either get more work or help out the other threads. I know that dynamic allocation during the run is difficult, so perhaps processor binding, unequal allocation, spawning more workers than processor cores, etc.
Spawning more workers does speed things up somewhat, but I don't know which core will pick up which work.
In my case, I always deal with pandas dataframes, and before handing the work to multiple cores I usually shuffle the dataframe first.
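For anyone who wants to try this, a minimal sketch of the shuffle-then-split approach using pandas and numpy; the dataframe and column name here are made up for illustration:

```python
import numpy as np
import pandas as pd

# hypothetical dataframe where the expensive rows are clustered
df = pd.DataFrame({"bucket": ["big"] * 50 + ["empty"] * 50})

# shuffle so each worker's slice gets a similar mix of cheap and
# expensive rows; the fixed seed keeps the run reproducible
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)

# split into equal chunks, one per worker
chunks = np.array_split(shuffled, 4)
```

This only evens out the *expected* cost per chunk; it helps when slow rows are clustered together, but can't help when one single row dominates the runtime.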
I like this idea too - in my case, I needed to list the objects in every S3 bucket referenced in my dataframe. Some buckets had lots of objects, some were empty. Right now, 2 of my 14 threads are still running 2 hours after the others finished, because they happened to get more of the full buckets and are taking forever to list everything, 1000-entry page by 1000-entry page.
If the other threads had some awareness and could feed off the back end of the busy threads' job queues, that would speed execution up a lot.
I realize that most apply functions shouldn't be quite so widely variable in their execution time row by row.
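The idle-thread problem above largely disappears if, instead of pre-assigning buckets to threads, every thread pulls from one shared queue: there is then no "back end" of a busy thread's queue to steal from, because no thread owns a queue. A minimal sketch with the standard library, using fake bucket sizes in place of real S3 listing calls:

```python
import queue
import threading

# hypothetical per-bucket object counts: some full, many empty,
# mirroring the S3 example
buckets = [1000, 0, 0, 3000, 0, 10, 0, 2500] * 5

tasks = queue.Queue()
for b in buckets:
    tasks.put(b)

results = []
lock = threading.Lock()

def worker():
    # each thread keeps pulling from the shared queue until it is
    # empty, so fast threads naturally absorb the leftover work
    # instead of going idle
    while True:
        try:
            size = tasks.get_nowait()
        except queue.Empty:
            return
        listed = size  # stand-in for paging through the bucket
        with lock:
            results.append(listed)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

`concurrent.futures.ThreadPoolExecutor` with one submitted task per bucket gives the same pull-based behavior with less code; the explicit queue version just makes the mechanism visible.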