Support reading ORC files from HDFS
Related tasks: https://github.com/tensorflow/io/issues/1372#issuecomment-1072007873
This PR uses the TensorFlow Filesystem API to access HDFS, instead of relying on libhdfspp, which is not included in the current build setup.
Note that libhdfspp is not just another wrapper around the C libhdfs. It is a separate implementation that speaks the HDFS RPC protocol directly, which is quite complex, and some of its code does not appear to be well maintained.
IMHO, we can rely on TensorFlow's modular Filesystem HDFS API instead, which is built on libhdfs and is quite stable. `libtensorflow_io_plugins.so` is loaded when `import tensorflow_io` is executed in Python, so the following C++ code
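```cpp
std::unique_ptr<tensorflow::RandomAccessFile> file_;
// Resolved by the HDFS filesystem plugin registered for the hdfs:// scheme.
tensorflow::Status status =
    tensorflow::Env::Default()->NewRandomAccessFile("hdfs:///xxx/yyy/z", &file_);
```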
succeeds and yields a usable `RandomAccessFile`. This is how we can support reading ORC files from HDFS.
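To illustrate how an ORC reader could sit on top of such a file, here is a minimal, self-contained sketch. It assumes the ORC C++ reader only needs offset-based reads through an interface shaped like `orc::InputStream` (`getLength`, `getNaturalReadSize`, `read`, `getName`). To keep it compilable without TensorFlow or liborc, `PositionalFile` and `OrcInputStream` are hypothetical stand-ins for `tensorflow::RandomAccessFile` and `orc::InputStream`, and `InMemoryFile` and `FileInputStream` are hypothetical names used only for illustration, not necessarily how this PR implements the adapter.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for tensorflow::RandomAccessFile: copies up to n bytes
// starting at offset into scratch and returns how many bytes were copied.
class PositionalFile {
 public:
  virtual ~PositionalFile() = default;
  virtual size_t ReadAt(uint64_t offset, size_t n, char* scratch) const = 0;
};

// Hypothetical mirror of the orc::InputStream interface so the sketch
// compiles without liborc.
class OrcInputStream {
 public:
  virtual ~OrcInputStream() = default;
  virtual uint64_t getLength() const = 0;
  virtual uint64_t getNaturalReadSize() const = 0;
  virtual void read(void* buf, uint64_t length, uint64_t offset) = 0;
  virtual const std::string& getName() const = 0;
};

// In-memory file used only to exercise the adapter (stands in for HDFS).
class InMemoryFile : public PositionalFile {
 public:
  explicit InMemoryFile(std::string data) : data_(std::move(data)) {}
  size_t ReadAt(uint64_t offset, size_t n, char* scratch) const override {
    if (offset >= data_.size()) return 0;  // at or past end of file
    size_t count = std::min<size_t>(n, data_.size() - offset);
    std::memcpy(scratch, data_.data() + offset, count);
    return count;
  }

 private:
  std::string data_;
};

// Adapter: serves ORC's offset-based reads from a positional-read file.
class FileInputStream : public OrcInputStream {
 public:
  FileInputStream(std::string name, std::unique_ptr<PositionalFile> file,
                  uint64_t length)
      : name_(std::move(name)), file_(std::move(file)), length_(length) {}

  uint64_t getLength() const override { return length_; }

  // Assumed read granularity (256 KiB); not taken from ORC or TensorFlow.
  uint64_t getNaturalReadSize() const override { return 256 * 1024; }

  void read(void* buf, uint64_t length, uint64_t offset) override {
    char* out = static_cast<char*>(buf);
    uint64_t done = 0;
    // Loop because a positional read may return fewer bytes than requested.
    while (done < length) {
      size_t got = file_->ReadAt(offset + done,
                                 static_cast<size_t>(length - done), out + done);
      if (got == 0) throw std::runtime_error("short read from " + name_);
      done += got;
    }
  }

  const std::string& getName() const override { return name_; }

 private:
  std::string name_;
  std::unique_ptr<PositionalFile> file_;
  uint64_t length_;
};
```

In a real adapter, the file length would likely come from `tensorflow::Env::GetFileSize`, and the read loop would check the `Status` returned by `RandomAccessFile::Read`, which can return fewer bytes than requested near the end of the file.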
Kerberos support is provided by libhdfs together with system libraries such as libgssapi-krb5-2, which must be installed in the runtime environment.
In my testing, libhdfspp did not support Kerberos.