Nested transformation improvements

Open sjdv1982 opened this issue 1 year ago • 4 comments

seamless.direct.transformer wraps a function inside a DirectTransformer object, which launches direct transformations (seamless.direct.Transformation) when called. In addition, direct transformations can also be created from unbound high-level Transformer objects (Transformer.get_transformation). Nested transformation is when direct transformations are created inside an existing transformation.

There are two kinds of nested transformation: local and non-local (delegated). By default, DirectTransformer objects have local=None, meaning that delegated nested transformation is tried first. Local nested transformation is then used as a fallback.
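The fallback order described above can be sketched roughly as follows. This is an illustration only, not Seamless code: `run_nested`, `run_delegated`, `run_local` and `AssistantUnavailable` are hypothetical names, and the meaning given to `local=True` and `local=False` is an assumption (only `local=None` is described here).

```python
from typing import Callable, Optional


class AssistantUnavailable(RuntimeError):
    """Hypothetical error: no assistant can be reached for delegation."""


def run_nested(
    run_delegated: Callable[[], object],
    run_local: Callable[[], object],
    local: Optional[bool] = None,
) -> object:
    """Dispatch a nested transformation according to `local`.

    local=None  : try delegation first, then fall back to local execution.
    local=True  : assumed to mean local execution only.
    local=False : assumed to mean delegation only, with no fallback.
    """
    if local is True:
        return run_local()
    try:
        return run_delegated()
    except AssistantUnavailable:
        if local is False:
            raise  # delegation was required, so do not fall back
        return run_local()


def no_assistant() -> object:
    # Stand-in for a delegation attempt inside a transformer without an assistant.
    raise AssistantUnavailable("no assistant available inside the transformer")


# With the default local=None, the failed delegation falls back to local execution.
result = run_nested(no_assistant, lambda: "computed locally")
```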

Local nested transformation already works. After the transformation has been launched in a forked seamless.core.execute.execute call, the fork changes the behavior of any subsequent call involving the seamless.direct.run machinery: seamless.direct.run now forwards local nested transformation calls to the parent process via a parent process queue. Some improvement may be needed, because currently all calls are queued up until any one of them is waited for, so every call is launched and waited for only at that point.
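The queuing behavior described above, and one possible direction for the improvement, can be illustrated with a toy model. Nothing here is Seamless code: `LazyForwarder` and `EagerForwarder` are hypothetical stand-ins for the parent process queue, and a thread pool stands in for the parent process.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Tuple


class LazyForwarder:
    """Models the current behavior: calls pile up until one is waited for."""

    def __init__(self) -> None:
        self._pending: List[Tuple[Future, Callable[[], object]]] = []
        self.launched = 0  # number of calls actually started so far

    def submit(self, func: Callable[[], object]) -> Future:
        fut: Future = Future()
        self._pending.append((fut, func))  # queued, not yet launched
        return fut

    def wait(self, fut: Future) -> object:
        # Waiting on any call first flushes the whole queue, in order.
        pending, self._pending = self._pending, []
        for queued_future, func in pending:
            self.launched += 1
            try:
                queued_future.set_result(func())
            except Exception as exc:
                queued_future.set_exception(exc)
        return fut.result()


class EagerForwarder:
    """Models a possible improvement: each call is launched on submission."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, func: Callable[[], object]) -> Future:
        return self._pool.submit(func)  # handed to a worker immediately

    def wait(self, fut: Future) -> object:
        return fut.result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

In the lazy model, submitting two calls and waiting on the second one launches both only at the moment of the wait; in the eager model, work can overlap with whatever the caller does between submission and waiting.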

Non-local nested transformation means that an assistant must be available inside the transformer. This doesn't work for any of the current assistants (micro, mini or mini-dask). This will be a bit complicated in cases where the assistant lives on a user machine whereas the job is executed on a cluster. Barring some kind of reverse tunneling or websockets, one solution for Dask-based execution is to make an "in-process assistant" as a thin wrapper around the Dask scheduler (which is by necessity available for each worker). Add to the assistant protocol a "release lock"/"acquire lock" API. For the Dask in-process assistant, these will be simple wrappers around Dask's secede() and rejoin() functions (importable from dask.distributed).
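A minimal sketch of what such a "release lock"/"acquire lock" API might look like is given below. It is not the Seamless implementation: the class name `InProcessAssistantSketch` and its methods are hypothetical, and the two callables are injected so that the sketch runs without Dask. For the Dask case, one would presumably pass Dask's `secede` and `rejoin` as those callables.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class InProcessAssistantSketch:
    """Hypothetical thin wrapper exposing "release lock" / "acquire lock"."""

    def __init__(self, release: Callable[[], None], acquire: Callable[[], None]) -> None:
        self._release = release
        self._acquire = acquire

    def release_lock(self) -> None:
        # Give up the worker slot so nested work can be scheduled.
        self._release()

    def acquire_lock(self) -> None:
        # Take a worker slot back before continuing the outer transformation.
        self._acquire()

    @contextmanager
    def lock_released(self) -> Iterator[None]:
        """Release the lock for the duration of a nested call, then re-acquire it."""
        self.release_lock()
        try:
            yield
        finally:
            self.acquire_lock()


# Usage with recording stand-ins instead of a real scheduler.
events: List[str] = []
assistant = InProcessAssistantSketch(
    release=lambda: events.append("release"),
    acquire=lambda: events.append("acquire"),
)
with assistant.lock_released():
    events.append("nested transformation")  # placeholder for the nested call
```

Wrapping the pair in a context manager ensures the lock is re-acquired even if the nested call raises.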

sjdv1982 avatar Sep 25 '23 11:09 sjdv1982

There is now an InProcessAssistant class.

sjdv1982 avatar Sep 25 '23 19:09 sjdv1982

Instead of communicating with the Dask scheduler, a worker could also try to reach the Dask client inside the original assistant. In that case, use the same Dask mechanism as https://github.com/sjdv1982/seamless/issues/219, and probably store an ID that identifies the original assistant (since multiple assistants can connect to the scheduler).

sjdv1982 avatar Oct 11 '23 10:10 sjdv1982

See also #241

sjdv1982 avatar Dec 17 '23 11:12 sjdv1982

Non-local nested transformation now works for a "local" mini assistant (devel), by setting the Seamless assistant IP address to "localhost" inside the mini-assistant-devel Docker image.

(the meaning of "local" is getting a bit confused here!)

sjdv1982 avatar Mar 11 '24 10:03 sjdv1982