seamless Nested transformation improvements

seamless.direct.transformer wraps a function inside a DirectTransformer object, which launches direct transformations (seamless.direct.Transformation) when called. Direct transformations can also be created from unbound high-level Transformer objects (Transformer.get_transformation). Nested transformation means that direct transformations are created inside an existing transformation.

There are two kinds of nested transformation: local and non-local (delegated). By default, DirectTransformer objects have local=None, meaning that delegated nested transformation is tried first. Local nested transformation is then used as a fallback.
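The "delegated first, local as fallback" behaviour for local=None can be sketched as below. This is only an illustration of the control flow, not Seamless code: run_nested, DelegationUnavailable, and the two callables are hypothetical names, and the meaning given to local=True and local=False is an assumption.

```python
from typing import Any, Callable, Optional


class DelegationUnavailable(Exception):
    """Hypothetical error: no assistant could be reached for delegation."""


def run_nested(
    task: Any,
    local: Optional[bool],
    delegate: Callable[[Any], Any],
    run_locally: Callable[[Any], Any],
) -> Any:
    """Dispatch a nested transformation according to the `local` setting."""
    if local is True:   # assumption: force local execution
        return run_locally(task)
    if local is False:  # assumption: force delegation, no fallback
        return delegate(task)
    # local is None: try delegation first, fall back to local execution.
    try:
        return delegate(task)
    except DelegationUnavailable:
        return run_locally(task)


def no_assistant(task: Any) -> Any:
    """Stand-in for delegation when no assistant is available."""
    raise DelegationUnavailable("no assistant reachable")


def in_process(task: Any) -> str:
    """Stand-in for local execution."""
    return f"local:{task}"


outcome = run_nested("job-1", local=None, delegate=no_assistant, run_locally=in_process)
```

Here the delegation attempt fails, so the call falls through to the local stand-in.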

Local nested transformation already works. After the transformation has been launched in a forked seamless.core.execute.execute call, the fork changes the behavior of any subsequent call involving the seamless.direct.run machinery: seamless.direct.run now forwards local nested transformation calls to the parent process via a parent process queue. Some improvement may be needed: currently, all calls are queued until one of them is waited for, so they are all launched and waited for only at that point.
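One possible shape for the improvement is to forward each call at submission time, so that waiting only blocks on a result that is already being computed. The sketch below models that with a thread standing in for the parent process and queue.Queue standing in for the parent process queue. EagerForwarder and parent_loop are hypothetical names, and none of this is the actual seamless.direct.run implementation.

```python
import queue
import threading
from concurrent.futures import Future


def parent_loop(requests: queue.Queue) -> None:
    """Stand-in for the parent process: run each forwarded call as it arrives."""
    while True:
        item = requests.get()
        if item is None:  # sentinel: shut down
            break
        func, args, future = item
        try:
            future.set_result(func(*args))
        except Exception as exc:  # report failures through the future
            future.set_exception(exc)


class EagerForwarder:
    """Hypothetical forwarder that enqueues a call as soon as it is submitted."""

    def __init__(self) -> None:
        self.requests: queue.Queue = queue.Queue()
        self._thread = threading.Thread(
            target=parent_loop, args=(self.requests,), daemon=True
        )
        self._thread.start()

    def submit(self, func, *args) -> Future:
        # The call goes onto the queue immediately, not when it is waited for.
        future: Future = Future()
        self.requests.put((func, args, future))
        return future

    def close(self) -> None:
        self.requests.put(None)
        self._thread.join()


forwarder = EagerForwarder()
futures = [forwarder.submit(pow, 2, n) for n in range(4)]  # all launched here
results = [f.result() for f in futures]  # waiting only collects results
forwarder.close()
```

In a forked setting the thread and queue.Queue would have to be replaced by a real inter-process channel; the point of the sketch is only the ordering of launch and wait.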

Non-local nested transformation means that an assistant must be available inside the transformer. This doesn't work for any of the current assistants (micro, mini or mini-dask). It will be a bit complicated in cases where the assistant lives on a user machine whereas the job is executed on a cluster. Barring some kind of reverse tunneling or websockets, one solution for Dask-based execution is to make an "in-process assistant" as a thin wrapper around the Dask scheduler (which is by necessity available to each worker). Add a "release lock"/"acquire lock" API to the assistant protocol. For the Dask in-process assistant, these will be simple wrappers around Dask's secede() and rejoin() functions (in dask.distributed).
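To illustrate why a "release lock"/"acquire lock" pair is useful, here is a minimal model that does not depend on Dask: a semaphore plays the role of the worker slots, and a job gives up its slot while it waits for a nested job. InProcessAssistantSketch, run_job, and waiting_for_nested are hypothetical names, not part of Seamless. In a Dask-based assistant, release_lock and acquire_lock would presumably map onto Dask's secede() and rejoin().

```python
import threading
from contextlib import contextmanager


class InProcessAssistantSketch:
    """Hypothetical model of an assistant with a release/acquire lock API."""

    def __init__(self, worker_slots: int) -> None:
        # Each running job occupies one slot, like a task on a worker thread.
        self._slots = threading.BoundedSemaphore(worker_slots)

    def acquire_lock(self) -> None:
        self._slots.acquire()

    def release_lock(self) -> None:
        self._slots.release()

    @contextmanager
    def waiting_for_nested(self):
        """Give up the slot while blocked on nested work, then take it back."""
        self.release_lock()
        try:
            yield
        finally:
            self.acquire_lock()


assistant = InProcessAssistantSketch(worker_slots=1)


def run_job(func, *args):
    """Run a job while holding one worker slot."""
    assistant.acquire_lock()
    try:
        return func(*args)
    finally:
        assistant.release_lock()


def inner(x: int) -> int:
    return x + 1


def outer(x: int) -> int:
    # Without releasing the slot here, the nested job could never start
    # on a single-slot assistant and the outer job would deadlock.
    with assistant.waiting_for_nested():
        nested = run_job(inner, x)
    return nested * 2


result = run_job(outer, 20)
```

With a single slot, the nested job can only run because the outer job releases its slot first; that is the deadlock the lock API is meant to avoid.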

Sep 25 '23 11:09 sjdv1982

There is now an InProcessAssistant class.

Sep 25 '23 19:09 sjdv1982

Instead of communicating with the Dask scheduler, a worker could also try to reach the Dask client inside the original assistant. In that case, use the same Dask mechanism as https://github.com/sjdv1982/seamless/issues/219, and probably store an ID that identifies the original assistant (since multiple assistants can connect to the scheduler).

Oct 11 '23 10:10 sjdv1982
