magpie
make a dumb "networkfs" plugin for hadoop
Working on #239 reminded me of
https://issues.apache.org/jira/browse/MAPREDUCE-5528
and the fact that the terasort example doesn't work with "rawnetworkfs". A long time ago I wrote a "lustre" plugin for Hadoop that wasn't too different from the "file:" URI filesystem plugin in Hadoop. It was around that time that I looked into the code and realized that the "file:" URI is sometimes treated specially in Hadoop, and that was part of the reason "rawnetworkfs" doesn't work with terasort.
I wonder if https://issues.apache.org/jira/browse/SPARK-21570 could be caused by a similar issue: that internally in Spark, "file:" URIs are treated specially and there is a corner case leading to the problem.
Creating a dumb "networkfs" (or similar) plugin might resolve multiple issues. The plugin would basically be a subclass of the "file:" URI filesystem class, completely identical but using the "networkfs:" URI instead. Since the paths would no longer carry the "file:" scheme, it would potentially work around these special-case problems.
This would be simpler than the "magpienetworkfs" plugin that I wrote. That plugin tried to handle some path issues for the user. The new one would be a much simpler/dumber plugin whose only purpose is to allow the user to specify a different URI.
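
To make the subclass idea concrete, here is a minimal, self-contained sketch. It does not use Hadoop's real classes: `LocalFsSketch` and `NetworkFsSketch` are hypothetical stand-ins invented for illustration. In an actual plugin the base class would presumably be Hadoop's local filesystem class, and the new scheme would presumably be mapped to the plugin class through an `fs.networkfs.impl` configuration property. Both of those are assumptions, not something tested here.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the "file:" URI filesystem class; NOT Hadoop's API.
class LocalFsSketch {
    // URI scheme this filesystem answers to.
    String scheme() {
        return "file";
    }

    // Map a URI such as file:///tmp/x to a local path, rejecting other schemes.
    Path toLocalPath(URI uri) {
        if (!scheme().equals(uri.getScheme())) {
            throw new IllegalArgumentException(
                "Wrong FS: " + uri + ", expected scheme " + scheme());
        }
        return Paths.get(uri.getPath());
    }
}

// The "dumb" plugin: identical behavior, only the scheme differs.
class NetworkFsSketch extends LocalFsSketch {
    @Override
    String scheme() {
        return "networkfs";
    }
}
```

With this shape, `networkfs:///lustre/out` resolves to the same local path that `file:///lustre/out` would, but any code that special-cases the "file" scheme no longer matches it.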