[SYSTEMDS-3405] Write matrices and frames at site for federated write

Open kev-inn opened this issue 3 years ago • 8 comments

Perform the federated write at the worker sites and locally write an MTD file containing the addresses. Frames work too, but test cases are still missing.

kev-inn avatar Jul 15 '22 12:07 kev-inn

yes please !

Baunsgaard avatar Jul 17 '22 16:07 Baunsgaard

There is a minor problem I don't know how best to fix. I need to select a path at the sites where the workers write their partitions. Currently I create a (most likely) unique filename and write into the LOCAL_TEMP_DIR (defined by the configuration). This usually works, but the federated Python test cases have it set to /tmp/systemds, which is apparently not available on our git runners. @Baunsgaard maybe we could change the configuration for the tests?

The problem mostly impacts our test cases, but I also don't really have a favorite from the user side. Other choices for where to write, and their downsides:

  • current working directory (cwd): the cwd of our test cases is the root, so we would clutter it with partitions
  • the scratch space: it is deleted when the worker is killed and, because we start the new workers from the same JVM, in our case also when the test case finishes. Therefore, we can't check the results.
  • a new configuration directory: a pretty clean solution IMO, but it would only be used by workers, and I don't want to add it unless we agree it is necessary

The other aspects are finished and this PR is ready for review.

kev-inn avatar Jul 31 '22 18:07 kev-inn

hmm, good question what the best option is.

I find the last option, a new configuration entry, tempting. The federated workers could use a configured path to create a directory. Inside it, we can use the worker's process ID to make a unique subdirectory and maintain a static counter to guarantee incrementing folder/file names, appending the user-specified file name at the end.
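
A minimal sketch of that naming scheme, in Python for brevity rather than in the worker's Java code. The class `FederatedWritePaths`, its method `next_path`, and the `base_dir` parameter are hypothetical names used only for illustration; `base_dir` stands in for the proposed configuration entry, which does not exist yet.

```python
import itertools
import os
import threading
from pathlib import Path


class FederatedWritePaths:
    """Hypothetical helper: builds unique output paths for one worker process."""

    def __init__(self, base_dir):
        # base_dir stands in for the proposed (assumed) configuration entry.
        # The process ID separates workers that share the same base directory.
        self._worker_dir = Path(base_dir) / str(os.getpid())
        self._counter = itertools.count()
        self._lock = threading.Lock()

    def next_path(self, user_file_name):
        # Draw the next counter value under a lock so that concurrent
        # writes inside one process never end up in the same folder.
        with self._lock:
            seq = next(self._counter)
        target_dir = self._worker_dir / str(seq)
        target_dir.mkdir(parents=True, exist_ok=True)
        # Append only the final component of the user-specified name.
        return target_dir / Path(user_file_name).name
```

With this layout, two writes of the same user-specified name from the same worker land in different numbered folders, and workers running as separate processes are kept apart by their process ID. Workers started inside the same JVM, as in the test setup, would share a process ID, so there the counter alone has to keep the paths apart.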

Baunsgaard avatar Aug 24 '22 14:08 Baunsgaard

This PR also changes write(M) (if M is federated) so that it no longer collects the matrix and writes it locally. Is that because the output is always expected to be written at the federated sites if it is federated?

Baunsgaard avatar Aug 24 '22 15:08 Baunsgaard