gama
Backend
Replacing custom backend with dask distributed
It looks like there are style warnings for some files. Did you install pre-commit? See this documentation to help you get started.
I tried to look into this today. Unfortunately I did not get very far.
I observe a number of warnings being reported by dask, most commonly:
distributed.worker - WARNING - Compute Failed
Function: evaluate_individual
args: (<gama.genetic_programming.components.individual.Individual object at 0x0000021B53B9E430>)
kwargs: {}
Exception: TimeoutException()
but (when testing) also variations of
distributed.scheduler - ERROR - Workers don't have promised key: [], evaluate_individual-148f7e8adaded337558582d6544dadd1
NoneType: None
distributed.client - WARNING - Couldn't gather 3 keys, rescheduling {'evaluate_individual-ad040a1c4d388915d6ef85532435447f': (), 'evaluate_individual-dc88f8d70fb4dc6880a3d7f81cb34163': (), 'evaluate_individual-148f7e8adaded337558582d6544dadd1': ()}
I tried to reproduce the problem in a minimal working example, but the following script
from dask.distributed import Client, as_completed
import stopit


def stopit_work(base):
    # Busy loop that only ends when the 1-second timeout interrupts it.
    with stopit.ThreadingTimeout(1) as c_mgr:
        do_compute = base
        while True:
            do_compute *= base
            do_compute /= base
    # The context manager is falsy if the block was cut short by the timeout.
    if not c_mgr:
        return "Stopped due to timeout"
    return "Done"


def main(func):
    # Outer 5-second timeout around the whole dask run.
    with stopit.ThreadingTimeout(5) as c_mgr:
        with Client() as client:
            ac = as_completed(client.map(func, range(1, 10_000)))
            for future in ac:
                print(future.result())
    print("done")


if __name__ == '__main__':
    main(stopit_work)
seems to function as expected.
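One difference worth noting: in the example above the timeout is handled inside the task, which then returns a normal value, whereas the `Compute Failed` warning shows a `TimeoutException` escaping `evaluate_individual`. The sketch below illustrates that distinction only. It uses the standard library's `concurrent.futures` as a stand-in for dask so that it is self-contained, and `TaskTimeout`, `raising_task`, `swallowing_task` and `collect` are hypothetical names, not part of gama, stopit or dask. It is not a diagnosis of the warnings.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class TaskTimeout(Exception):
    """Hypothetical stand-in for a timeout exception raised inside a task."""


def raising_task(x):
    # Lets the timeout escape the task, so the future ends up in a failed state.
    raise TaskTimeout()


def swallowing_task(x):
    # Handles the timeout inside the task and reports it as a normal result.
    try:
        raise TaskTimeout()
    except TaskTimeout:
        return "Stopped due to timeout"


def collect(func, inputs):
    """Run func over inputs; return (results, number of failed tasks)."""
    results, failures = [], 0
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(func, x) for x in inputs]
        for future in as_completed(futures):
            try:
                # result() re-raises any exception that escaped the task.
                results.append(future.result())
            except TaskTimeout:
                failures += 1
    return results, failures
```

With `raising_task`, every call to `future.result()` raises, while `swallowing_task` yields only ordinary return values, which matches how `stopit_work` behaves.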
I also tried to modify evaluate_pipeline, but no variation truly solved the issue.
Finally, I tried to re-integrate dask.distributed myself from develop (referencing your work and the dask docs) for RandomSearch only, but I still got the same warnings and errors that are present in the backend branch.
Also, my host machine seems to close connections ([WinError 10054] An existing connection was forcibly closed by the remote host); I am not sure what is happening there. It also seems to occur randomly: when running the same script multiple times, it only shows up sometimes.
Looks like the TimeoutException warnings don't occur if the timeout is always set to at least one second, so I'll do that for now. But I am unsure about the other random errors/warnings.
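A minimal sketch of that workaround, assuming the timeout is passed around as a number of seconds. `clamp_timeout` and `MIN_TIMEOUT_SECONDS` are hypothetical names for illustration, not part of gama or dask:

```python
MIN_TIMEOUT_SECONDS = 1.0  # assumed lower bound, taken from the observation above


def clamp_timeout(requested_seconds, minimum_seconds=MIN_TIMEOUT_SECONDS):
    """Return the requested timeout, raised to the minimum if it is smaller."""
    return max(float(requested_seconds), minimum_seconds)
```

The clamped value would then be used wherever the evaluation timeout is set, so that very short remaining time budgets never produce a sub-second timeout.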
Closing this PR as the #dask branch is much further along with the integration and addresses most issues raised in this PR. (Though that branch also still has problems, so it will possibly never be merged >:| )
With your permission I will also remove the stale branch.
Thanks for the effort and first exploration though, it did still help in creating the second iteration 👍