[SYSTEMDS-3405] Write matrices and frames at site for federated write
Performs the federated write at the sites of the workers and locally writes an MTD file containing their addresses. Frames work too, but test cases are still missing.
Yes, please!
There is a minor problem I'm not sure how best to fix. The workers need a path at their sites to write their partitions. Currently, I create a (most likely) unique file name and write into the `LOCAL_TEMP_DIR` (defined by the configuration). This usually works, but the federated Python test cases set it to `/tmp/systemds`, which is apparently not available on our git runners. @Baunsgaard, could we change the configuration for the tests?
The problem mostly affects our test cases, but I also don't have a strong preference from a user's perspective. Other locations to write to, and their downsides:
- current working directory (cwd): the cwd of our test cases is the root, so we would clutter it with partitions
- the scratch space: it is deleted when the worker is killed, which in our case also happens when the test case finishes, because we start the new workers from the same JVM. Therefore, we can't check the results.
- a new configuration directory: a pretty clean solution IMO, but it would only be used by workers, and I don't want to add it unless we agree it is necessary
The other aspects are finished and this PR is ready for review.
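For illustration, the current approach (a most likely unique file name inside the configured local temp directory) could look roughly like the Java sketch below. The class and method names are hypothetical, and using a random `UUID` for uniqueness is an assumption; the PR's actual naming scheme may differ.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

public class SitePathSketch {
    // Hypothetical helper: builds a (most likely) unique path for a partition
    // inside the configured local temp directory (e.g. the LOCAL_TEMP_DIR setting).
    static Path uniquePartitionPath(String localTempDir, String userFileName) {
        // A random UUID makes collisions between workers extremely unlikely,
        // but not impossible in theory, hence "most likely" unique.
        String unique = UUID.randomUUID().toString();
        // Keep only the last component of the user-specified name.
        String name = Paths.get(userFileName).getFileName().toString();
        return Paths.get(localTempDir, unique + "_" + name);
    }

    public static void main(String[] args) {
        System.out.println(uniquePartitionPath("/tmp/systemds", "out/X.csv"));
    }
}
```

If `localTempDir` points to a location the worker cannot write to (as with `/tmp/systemds` on the runners), the path is still built, but the subsequent write fails; the choice of directory, not the naming, is the open question.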
Hmm, good question which option is best.
I find the last option, a new configuration, tempting. The federated workers could use a configured path to create a directory, with a unique subdirectory named after their process ID. Inside it, a static counter would guarantee incrementing folder/file names, with the user-specified file name appended at the end.
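A minimal sketch of that layout, `<baseDir>/<pid>/<counter>_<fileName>`, assuming a configured base directory and Java 9+ for `ProcessHandle`. The class, method, and directory names are hypothetical and not part of the SystemDS API:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicLong;

public class FederatedWritePathSketch {
    // Hypothetical counter shared by all writes in this JVM; AtomicLong keeps
    // the increment correct if several requests write concurrently.
    private static final AtomicLong COUNTER = new AtomicLong(0);

    // Hypothetical helper: returns <baseDir>/<pid>/<counter>_<fileName>.
    // It only builds the path; the caller must create the parent directory
    // (e.g. with Files.createDirectories) before writing.
    static Path nextPartitionPath(String baseDir, String userFileName) {
        long pid = ProcessHandle.current().pid();   // Java 9+
        long n = COUNTER.getAndIncrement();          // unique per JVM
        String name = Paths.get(userFileName).getFileName().toString();
        return Paths.get(baseDir, Long.toString(pid), n + "_" + name);
    }

    public static void main(String[] args) {
        System.out.println(nextPartitionPath("/data/fed", "X.csv"));
        System.out.println(nextPartitionPath("/data/fed", "X.csv"));
    }
}
```

Because the counter is static, workers started from the same JVM (as in the tests) share both the PID and the counter, so their file names still do not collide.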
This PR also changes `write(M)` (if `M` is federated) to no longer collect the matrix and write it locally. Is that because a federated output is always expected to be written at the federated sites?