Backend

Open prabhant opened this issue 4 years ago • 4 comments

Replacing custom backend with dask distributed

prabhant avatar Oct 08 '20 13:10 prabhant

It looks like there are style warnings for some files. Did you install pre-commit? See this documentation to help you get started.

PGijsbers avatar Oct 09 '20 08:10 PGijsbers

I tried to look into this today. Unfortunately I did not get very far. I observe a number of warnings being reported by dask, most commonly:

distributed.worker - WARNING -  Compute Failed
Function:  evaluate_individual
args:      (<gama.genetic_programming.components.individual.Individual object at 0x0000021B53B9E430>)
kwargs:    {}
Exception: TimeoutException()

but (when testing) also variations of

distributed.scheduler - ERROR - Workers don't have promised key: [], evaluate_individual-148f7e8adaded337558582d6544dadd1
NoneType: None

distributed.client - WARNING - Couldn't gather 3 keys, rescheduling {'evaluate_individual-ad040a1c4d388915d6ef85532435447f': (), 'evaluate_individual-dc88f8d70fb4dc6880a3d7f81cb34163': (), 'evaluate_individual-148f7e8adaded337558582d6544dadd1': ()}

I tried to reproduce the problem in a minimal working example, but the following script

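from dask.distributed import Client, as_completed
import stopit


def stopit_work(base):
    # Busy loop that never ends on its own, so the 1-second timeout always fires.
    with stopit.ThreadingTimeout(1) as c_mgr:
        do_compute = base
        while True:
            do_compute *= base
            do_compute /= base
    # The context manager evaluates to False when the block timed out.
    if not c_mgr:
        return "Stopped due to timeout"
    return "Done"


def main(func):
    # Outer timeout: give the whole run at most 5 seconds.
    with stopit.ThreadingTimeout(5) as c_mgr:
        with Client() as client:
            # Submit one task per input and print results as they finish.
            ac = as_completed(client.map(func, range(1, 10_000)))
            for future in ac:
                print(future.result())
    print("done")


if __name__ == '__main__':
    main(stopit_work)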

seems to function as expected.

I also tried modifying evaluate_pipeline, but no variation truly solved the issue. Finally, I tried re-integrating dask.distributed myself from develop (referencing your work and the dask docs) for RandomSearch only, but I still seemed to get the same warnings and errors that are present in the backend branch.

PGijsbers avatar Jan 07 '21 19:01 PGijsbers

My host machine also seems to close connections: [WinError 10054] An existing connection was forcibly closed by the remote host. I'm not sure what is happening there. It also seems to occur randomly: when running the same script multiple times, it only shows up sometimes.

PGijsbers avatar Jan 07 '21 19:01 PGijsbers

Looks like the TimeoutException warnings don't occur if the timeout is always set to at least one second. So I'll do that for now. But I am unsure about the other random errors/warnings.
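A minimal sketch of that workaround, assuming the per-evaluation timeout is passed around as a number of seconds. The names `MIN_TIMEOUT_SECONDS` and `clamp_timeout` are hypothetical and are not taken from the gama code base:

```python
# Hypothetical lower bound; the comment above only says "at least one second".
MIN_TIMEOUT_SECONDS = 1.0


def clamp_timeout(requested_seconds: float,
                  minimum: float = MIN_TIMEOUT_SECONDS) -> float:
    """Return the requested timeout, raised to `minimum` if it is smaller."""
    return max(float(requested_seconds), minimum)
```

The clamped value would then be used wherever the evaluation timeout is set, for example as the argument to `stopit.ThreadingTimeout`. This only avoids the very short timeouts; it is not expected to affect the other scheduler and client warnings.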

PGijsbers avatar Jan 07 '21 20:01 PGijsbers

Closing this PR as the #dask branch is much further along with the integration and addresses most issues raised in this PR. (Though that branch also still has problems, so it will possibly never be merged >:| )

With your permission I will also remove the stale branch.

PGijsbers avatar Sep 14 '22 10:09 PGijsbers

Thanks for the effort and first exploration though, it did still help in creating the second iteration 👍

PGijsbers avatar Sep 14 '22 10:09 PGijsbers