mars
[BUG] to_csv raises FileNotFoundError: [Errno 2] No such file or directory
Describe the bug
The problem is that when to_csv is executed on a distributed cluster, it may raise FileNotFoundError: [Errno 2] No such file or directory.

This happens because the DataFrameToCSV operand generates a DataFrameToCSVStat operand that prepares the data file before the agg stage of DataFrameToCSV runs. Then, in the OperandStage.agg stage, each DataFrameToCSV writes its data at its own offset in that file.
However, DataFrameToCSVStat and the DataFrameToCSV operands with stage=OperandStage.agg may be scheduled on different nodes. In that case, the agg stage cannot find the file created by DataFrameToCSVStat.
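To make the failure mode concrete, here is a minimal single-machine sketch of the two-phase "prepare file, then write at offsets" pattern described above. It is not Mars's implementation: the names render_chunk, stat_phase and agg_phase are hypothetical, and computing offsets from serialized chunk sizes is an assumption about what the Stat step does. It shows that every agg-stage writer must see the file the Stat step created; a writer on a node whose filesystem lacks that file gets exactly FileNotFoundError: [Errno 2].

```python
import csv
import io
import os
import tempfile
from itertools import accumulate


def render_chunk(rows):
    """Serialize one chunk (a list of rows) to UTF-8 CSV bytes."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().encode("utf-8")


def stat_phase(chunks, path):
    """Hypothetical stand-in for DataFrameToCSVStat: compute every chunk's
    byte offset and pre-create the output file at its final size."""
    sizes = [len(render_chunk(c)) for c in chunks]
    offsets = [0, *accumulate(sizes)][:-1]  # start offset of each chunk
    with open(path, "wb") as f:
        f.truncate(sum(sizes))  # reserve space; every byte is overwritten later
    return offsets


def agg_phase(chunk, offset, path):
    """Hypothetical stand-in for one agg-stage writer: write a chunk at its offset.

    Mode "r+b" opens an existing file without truncating it, so this raises
    FileNotFoundError when the file only exists on a different machine.
    """
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(render_chunk(chunk))


if __name__ == "__main__":
    chunks = [[["a", 1], ["b", 2]], [["c", 3]]]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "out.csv")
        offsets = stat_phase(chunks, path)  # [0, 8]
        for chunk, off in zip(chunks, offsets):
            agg_phase(chunk, off, path)
        with open(path) as f:
            print(f.read(), end="")
        # A writer whose filesystem never saw stat_phase's file fails like the bug:
        try:
            agg_phase(chunks[0], 0, os.path.join(d, "missing.csv"))
        except FileNotFoundError as e:
            print(type(e).__name__)
```

Because each writer seeks to a precomputed offset, the agg-stage writes can run in any order, but only if they all share one filesystem with the Stat step.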
To Reproduce
To help us reproduce this bug, please provide the information below:
- Your Python version: 3.7.7
- The version of Mars you use: latest master
- Versions of crucial packages, such as numpy, scipy and pandas
- Full stack of the error.
- Minimized code to reproduce the error.
Expected behavior
A clear and concise description of what you expected to happen.
Additional context
Add any other context about the problem here.
- If to_csv is invoked with a local path, we can collect the data locally with to_pandas and then call pandas to_csv instead.
- If to_csv is invoked with a remote path such as hdfs/s3, I think it requires implementing a new operand.
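The local-path workaround from the first bullet might look like the sketch below. The helper name to_csv_via_local_pandas is hypothetical, and it assumes only what the comment states: that the distributed DataFrame exposes to_pandas(), which returns a pandas object with a to_csv method.

```python
import pandas as pd  # the collected result is a pandas object


def to_csv_via_local_pandas(df, path, **to_csv_kwargs):
    """Workaround sketch for a *local* output path.

    Collects the whole distributed DataFrame into the client process with
    to_pandas() and lets pandas write the file, so a single process both
    creates and fills it and no cross-node file sharing is needed.
    """
    pdf = df.to_pandas()  # assumption: df exposes to_pandas(), as suggested above
    pdf.to_csv(path, **to_csv_kwargs)  # plain pandas DataFrame.to_csv
    return path


# Usage (assumption: mdf is a Mars DataFrame in an active session):
# to_csv_via_local_pandas(mdf, "/tmp/result.csv", index=False)
```

The trade-off is that the full result must fit in the client's memory, and it does not cover the hdfs/s3 case from the second bullet.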