divolte-collector
Divolte not writing to HDFS
My Kafka sink is working but my HDFS sink is not. I'm using HDFS 2.0, so that might be why? I've got Divolte running in a Docker container and a Hadoop cluster running in the same docker-compose network, which I got from https://github.com/big-data-europe/docker-hadoop
Here are the relevant parts of my divolte-collector.conf (some parts stripped for brevity):
hdfs {
  enabled = true
  enabled = ${?DIVOLTE_HDFS_ENABLED}
  threads = 2
  buffer_size = 1048576
  client {
    fs.DEFAULT_FS = "hdfs://localhost:9870"
  }
}
mappings {
  my_mapping = {
    schema_file = "/opt/divolte/divolte-collector/conf/DivolteRecord.avsc"
    mapping_script_file = "/opt/divolte/divolte-collector/conf/mapping.groovy"
    sources = [browser]
    sinks = [divolte_kafka_sink, divolte_hdfs_sink]
  }
}
sinks {
  divolte_hdfs_sink = {
    type = hdfs
    file_strategy {
      sync_file_after_records = 1000
      sync_file_after_records = ${?DIVOLTE_HDFS_SINK_SYNC_NR_OF_RECORDS}
      sync_file_after_duration = 30 minutes
      sync_file_after_duration = ${?DIVOLTE_HDFS_SINK_SYNC_DURATION}
      working_dir = /tmp/working
      working_dir = ${?DIVOLTE_HDFS_SINK_WORKING_DIR}
      publish_dir = /tmp/processed
      publish_dir = ${?DIVOLTE_HDFS_SINK_PUBLISH_DIR}
    }
  }
}
For fs.DEFAULT_FS, I've tried hdfs://localhost:9870 and hdfs://namenode:9870 (namenode is the name of the HDFS namenode container running in the same Docker network).
Can you be a bit more specific about not working? Do you see any errors?
Here is the error:
[main] WARN [NativeCodeLoader]: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
divolte | Exception in thread "main" 2019-07-29 15:20:24.908Z [main] ERROR [HdfsFileManager]: Could not initialize HDFS filesystem or failed to check for existence of publish and / or working directories.
divolte | org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "hdfs"
divolte |     at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3332)
Then I added fs.file.impl = "org.apache.hadoop.fs.LocalFileSystem" and fs.hdfs.impl = "org.apache.hadoop.hdfs.DistributedFileSystem" to my hdfs configuration in Divolte, and now I'm getting a different error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(Ljava/lang/String;)Ljava/net/InetSocketAddress;
divolte |     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:99)
According to this thread (https://stackoverflow.com/questions/45460909/accessing-hdfs-in-java-throws-error) there is an issue with dependency versions in Divolte, but I'm not sure how to change that in Divolte...
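That NoSuchMethodError typically appears when the hadoop-common and hadoop-hdfs jars on the classpath carry different versions. One quick way to check is to list the Hadoop jars bundled with the Divolte install and compare their version suffixes; the sketch below demonstrates the idea against a throwaway directory with fake jar names (the real location would be the lib/ directory of the Divolte installation, which is an assumption about the image layout):

```shell
# Sketch: spot mismatched Hadoop jar versions in a classpath directory.
# Demonstrated with a temp dir and fake jars; in practice, point LIB_DIR at
# Divolte's lib/ directory inside the container (path is an assumption).
LIB_DIR=$(mktemp -d)
touch "$LIB_DIR/hadoop-common-2.7.3.jar" "$LIB_DIR/hadoop-hdfs-3.1.1.jar"

# Extract the version suffix from each hadoop-*.jar name; more than one
# distinct version in the output means the HDFS client jars disagree.
ls "$LIB_DIR" | sed -n 's/^hadoop-.*-\([0-9][0-9.]*\)\.jar$/\1/p' | sort -u
```

If that prints more than one version, aligning the jars to a single Hadoop release (or rebuilding Divolte against matching dependencies) is the usual fix.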
This pull request may help : https://github.com/divolte/divolte-collector/pull/244
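One other thing worth checking: I believe the Hadoop client property is spelled fs.defaultFS rather than fs.DEFAULT_FS, and 9870 is the Hadoop 3 namenode's web UI port, not its RPC endpoint. A sketch of the client section under those assumptions (the exact RPC port depends on the docker-hadoop setup's core-site.xml; 8020 here is an assumption):

```
hdfs {
  client {
    // Hadoop property name is fs.defaultFS; point it at the namenode's
    // RPC port (commonly 8020 or 9000), not the 9870 web UI port.
    fs.defaultFS = "hdfs://namenode:8020"
  }
}
```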