fluent-plugin-webhdfs
httpFS - Do not create file if it does not exist
Hello,
We are running a MapR cluster, and WebHDFS is not supported by MapR, so we are trying to populate Hadoop using HttpFS.
Our webhdfs config:
@type webhdfs
host mapr-mapr-master-0
port 14000
path "/uhalogs/docker/docker-%M.log"
time_slice_format %M
flush_interval 5s
username mapr
httpfs true
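For reference, here is the same connection expressed directly with the webhdfs gem that fluent-plugin-webhdfs builds on (a minimal sketch, not taken from the issue; it assumes the gem's WebHDFS::Client constructor and its httpfs_mode switch):

```ruby
require 'webhdfs'

# Connect to HttpFS with the same host/port/user as the fluentd config above.
client = WebHDFS::Client.new('mapr-mapr-master-0', 14000, 'mapr')
client.httpfs_mode = true  # send requests in HttpFS style instead of WebHDFS

begin
  # Appending to a path that does not exist yet reproduces the failure
  # outside of fluentd.
  client.append('/uhalogs/docker/docker-58.log', "test line\n")
rescue WebHDFS::FileNotFoundError => e
  puts "404 (what a WebHDFS server would return): #{e.message}"
rescue WebHDFS::ServerError => e
  puts "500 (what HttpFS on MapR actually returns): #{e.message}"
end
```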
However, when using the fluentd plugin, logs are appended correctly to an existing file. But if the file does not exist (we use a timestamp-based filename), we get a WebHDFS::ServerError instead of the WebHDFS::FileNotFoundError that would, I guess, trigger creation of the file.
Error 500 returned by MapR:
{
  "RemoteException": {
    "message": "Append failed for file: /uhalogs/docker/testfile.log, error: No such file or directory (2)",
    "exception": "IOException",
    "javaClassName": "java.io.IOException"
  }
}
Logs from the fluent-plugin-webhdfs plugin:
2017-01-12 13:59:09 +0000 [warn]: failed to communicate hdfs cluster, path: /uhalogs/docker/docker-58.log
2017-01-12 13:59:09 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-01-12 14:00:13 +0000 error_class="WebHDFS::ServerError" error="{\"RemoteException\":{\"message\":\"Append failed for file: \\/uhalogs\\/docker\\/docker-58.log, error: No such file or directory (2)\",\"exception\":\"IOException\",\"javaClassName\":\"java.io.IOException\"}}" plugin_id="object:3fe5f920c960"
2017-01-12 13:59:09 +0000 [warn]: suppressed same stacktrace
Related code: https://github.com/fluent/fluent-plugin-webhdfs/blob/master/lib/fluent/plugin/out_webhdfs.rb#L262
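In condensed form, the fallback around the linked line looks roughly like this (a paraphrase for illustration, not a verbatim copy of out_webhdfs.rb): append is tried first, and only a 404 mapped to WebHDFS::FileNotFoundError reaches the create branch, so an HttpFS server answering 500 never gets the file created:

```ruby
# Paraphrased sketch of the plugin's append/create fallback.
def send_data(path, data)
  @client.append(path, data)
rescue WebHDFS::FileNotFoundError
  # Only reached when the server reported the missing file as a 404.
  @client.create(path, data)
end
```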
What I am not sure about, and cannot find proper specifications for HttpFS on the web to settle, is:
- Is this a bad implementation of HttpFS on the MapR side, or should we handle this exception in the fluentd plugin as well?
Thank you,
Alban
I'm also experiencing this problem, but on the Cloudera platform. We cannot use WebHDFS because it lacks the HA capabilities that HttpFS has.
Sorry for missing this issue. I'm not familiar with HttpFS, but if WebHDFS and HttpFS are incompatible in several operations, we should handle that.
"Is it a bad implementation of httpFS on MapR side"
From enarciso's comment, it seems HttpFS behaves the same way across several distributions, so I'm not sure whether this is a bug in HttpFS or not.
I think the append operation should create a new file when the file doesn't exist.
My unfortunate workaround at the moment is to constantly monitor the HttpFS logs, watch for strings like the above, and run a touchz to create the file. Thank you for looking into this @repeatedly
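A rough Ruby equivalent of that touchz workaround, for anyone scripting it against the same webhdfs gem (a sketch with a hypothetical helper name, not part of the plugin; it assumes the gem's stat and create calls):

```ruby
# Probe the path first and create a zero-byte file when it is absent, so a
# later append always hits an existing file. Rescues both exception types,
# since HttpFS may report the missing path as a 500 rather than a 404.
def ensure_file_exists(client, path)
  client.stat(path)  # GETFILESTATUS; raises when the path does not exist
rescue WebHDFS::FileNotFoundError, WebHDFS::ServerError
  client.create(path, '')  # same effect as `hdfs dfs -touchz <path>`
end
```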
WebHDFS::ServerError means that the client (fluentd) received HTTP response code 500 from the HttpFS server; a WebHDFS server returns 404 in such cases.
IMO it's a bug in the HttpFS implementation, because it is a behaviour incompatibility between WebHDFS and HttpFS.
And the HttpFS documentation states that it "is interoperable with the webhdfs REST HTTP API": https://hadoop.apache.org/docs/r2.8.0/hadoop-hdfs-httpfs/index.html
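If the plugin ever wanted to tolerate the HttpFS behaviour anyway, one possible accommodation (purely an illustration of the option discussed above, not a committed fix) would be to treat a 500 whose RemoteException message indicates a missing file like a 404 and fall back to create:

```ruby
def send_data(path, data)
  @client.append(path, data)
rescue WebHDFS::FileNotFoundError
  @client.create(path, data)
rescue WebHDFS::ServerError => e
  # HttpFS (observed on MapR and Cloudera) reports the missing file as a 500
  # whose body carries the "No such file or directory" message shown above.
  raise unless e.message.include?('No such file or directory')
  @client.create(path, data)
end
```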
Thank you @tagomoris, I've opened a case with Cloudera.