fiware-cygnus
Cygnus error with hdfsSink
Hello everyone,
I am trying to save sensor data in HDFS using Cygnus.
The Cygnus configuration is below:
cygnus-ngsi.sources = http-source
cygnus-ngsi.sinks = hdfs-sink
cygnus-ngsi.channels = hdfs-channel

cygnus-ngsi.sources.http-source.type = org.apache.flume.source.http.HTTPSource
cygnus-ngsi.sources.http-source.channels = hdfs-channel
cygnus-ngsi.sources.http-source.port = 5050
cygnus-ngsi.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.NGSIRestHandler
cygnus-ngsi.sources.http-source.handler.notification_target = /notify
cygnus-ngsi.sources.http-source.handler.default_service = default
cygnus-ngsi.sources.http-source.handler.default_service_path = /
cygnus-ngsi.sources.http-source.interceptors = ts gi
cygnus-ngsi.sources.http-source.interceptors.ts.type = timestamp
cygnus-ngsi.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.NGSIGroupingInterceptor$Builder
cygnus-ngsi.sources.http-source.interceptors.gi.grouping_rules_conf_file = /opt/apache-flume/conf/grouping_rules.conf
cygnus-ngsi.sources.http-source.interceptors.nmi.type = com.telefonica.iot.cygnus.interceptors.NGSINameMappingsInterceptor$Builder
cygnus-ngsi.sources.http-source.interceptors.nmi.name_mappings_conf_file = /opt/apache-flume/conf/name_mappings.conf

cygnus-ngsi.sinks.hdfs-sink.type = com.telefonica.iot.cygnus.sinks.NGSIHDFSSink
cygnus-ngsi.sinks.hdfs-sink.channel = hdfs-channel
#cygnus-ngsi.sinks.hdfs-sink.enable_encoding = false
#cygnus-ngsi.sinks.hdfs-sink.enable_grouping = false
#cygnus-ngsi.sinks.hdfs-sink.enable_lowercase = false
#cygnus-ngsi.sinks.hdfs-sink.enable_name_mappings = false
#cygnus-ngsi.sinks.hdfs-sink.data_model = dm-by-entity
#cygnus-ngsi.sinks.hdfs-sink.file_format = json-column
#cygnus-ngsi.sinks.hdfs-sink.backend.impl = rest
#cygnus-ngsi.sinks.hdfs-sink.backend.max_conns = 500
#cygnus-ngsi.sinks.hdfs-sink.backend.max_conns_per_route = 100
cygnus-ngsi.sinks.hdfs-sink.hdfs_host = 10.9.8.29
cygnus-ngsi.sinks.hdfs-sink.hdfs_port = 50070
cygnus-ngsi.sinks.hdfs-sink.hdfs_username = stack
cygnus-ngsi.sinks.hdfs-sink.hdfs_password = stack
#cygnus-ngsi.sinks.hdfs-sink.oauth2_token =
#cygnus-ngsi.sinks.hdfs-sink.service_as_namespace = false
#cygnus-ngsi.sinks.hdfs-sink.oauth2_token =
#cygnus-ngsi.sinks.hdfs-sink.service_as_namespace = false
#cygnus-ngsi.sinks.hdfs-sink.batch_size = 100
#cygnus-ngsi.sinks.hdfs-sink.batch_timeout = 30
#cygnus-ngsi.sinks.hdfs-sink.batch_ttl = 10
#cygnus-ngsi.sinks.hdfs-sink.batch_retry_intervals = 5000
#cygnus-ngsi.sinks.hdfs-sink.hive = false
#cygnus-ngsi.sinks.hdfs-sink.krb5_auth = false

cygnus-ngsi.channels.hdfs-channel.type = com.telefonica.iot.cygnus.channels.CygnusMemoryChannel
cygnus-ngsi.channels.hdfs-channel.capacity = 100000
cygnus-ngsi.channels.hdfs-channel.transactionCapacity = 10000
I have tried Hadoop clusters with versions 2.6.0, 2.7.7 and 3.2.0, and the same error occurs in every case:
Cygnus logs:
time=2019-05-12T07:39:15.133Z | lvl=INFO | corr=N/A | trans=N/A | srv=N/A | subsrv=N/A | comp=cygnus-ngsi | op=persistAggregation | msg=com.telefonica.iot.cygnus.sinks.NGSIHDFSSink[1067] : [hdfs-sink] There was some problem with the current endpoint, trying other one. Details: CygnusPersistenceError (IOException). Request error (hdfsServer: Name or service not known).
time=2019-05-12T07:39:15.133Z | lvl=ERROR | corr=N/A | trans=N/A | srv=N/A | subsrv=N/A | comp=cygnus-ngsi | op=processRollbackedBatches | msg=com.telefonica.iot.cygnus.sinks.NGSISink[399] : CygnusPersistenceError. No endpoint was available. Stack trace: [com.telefonica.iot.cygnus.sinks.NGSIHDFSSink.persistAggregation(NGSIHDFSSink.java:1077), com.telefonica.iot.cygnus.sinks.NGSIHDFSSink.persistBatch(NGSIHDFSSink.java:495), com.telefonica.iot.cygnus.sinks.NGSISink.processRollbackedBatches(NGSISink.java:391), com.telefonica.iot.cygnus.sinks.NGSISink.process(NGSISink.java:373), org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67), org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145), java.lang.Thread.run(Thread.java:748)]
Cygnus is able to create the path inside HDFS, but it cannot create the .txt file with the measurements.
Best regards,
Antonio
Judging by the error message you get (hdfsServer: Name or service not known),
it seems to be some kind of problem with your cluster setup and/or the connection between Cygnus and the cluster.
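Note that the name that cannot be resolved is hdfsServer, not the configured hdfs_host 10.9.8.29, so the failing lookup is apparently for a host name coming from the cluster rather than from the Cygnus configuration (an inference from the log, not a confirmed fact). A minimal way to check which names the Cygnus machine can resolve is a lookup through the system resolver; the sketch below uses Python's standard socket.getaddrinfo, and the helper name resolves is hypothetical.

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if this machine's resolver can resolve hostname."""
    try:
        # getaddrinfo goes through the system resolver (/etc/hosts, DNS, ...)
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # On glibc systems this is e.g. "[Errno -2] Name or service not known"
        return False
```

For example, running resolves("hdfsServer") on the Cygnus host should return False if that name is the problem, in which case adding it to /etc/hosts or DNS on that host would be the next thing to try.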
Maybe one of the following parameters is involved:
cygnus-ngsi.sinks.hdfs-sink.hdfs_host = 10.9.8.29
cygnus-ngsi.sinks.hdfs-sink.hdfs_port = 50070
cygnus-ngsi.sinks.hdfs-sink.hdfs_username = stack
cygnus-ngsi.sinks.hdfs-sink.hdfs_password = stack
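To see exactly which endpoint these settings point at, they can be turned into a WebHDFS URL. This is a sketch assuming the REST backend talks to WebHDFS under its documented /webhdfs/v1 prefix, with the user.name query parameter that WebHDFS accepts for simple (non-Kerberos) authentication; the helper names are hypothetical, and the parser handles only plain "key = value" lines, not the full Java properties syntax.

```python
from urllib.parse import urlencode

PREFIX = "cygnus-ngsi.sinks.hdfs-sink."


def parse_properties(text: str) -> dict:
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split only on the first '='
        props[key.strip()] = value.strip()
    return props


def webhdfs_url(props: dict, path: str = "/", op: str = "GETFILESTATUS") -> str:
    """Build the WebHDFS REST URL implied by the hdfs-sink settings."""
    host = props[PREFIX + "hdfs_host"]
    port = props[PREFIX + "hdfs_port"]
    query = urlencode({"op": op, "user.name": props[PREFIX + "hdfs_username"]})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


CONFIG = """
cygnus-ngsi.sinks.hdfs-sink.hdfs_host = 10.9.8.29
cygnus-ngsi.sinks.hdfs-sink.hdfs_port = 50070
cygnus-ngsi.sinks.hdfs-sink.hdfs_username = stack
"""

if __name__ == "__main__":
    print(webhdfs_url(parse_properties(CONFIG)))
    # http://10.9.8.29:50070/webhdfs/v1/?op=GETFILESTATUS&user.name=stack
```

The printed URL can then be opened with any HTTP client from the Cygnus host.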
Looking at the NGSIHDFSSink documentation (https://fiware-cygnus.readthedocs.io/en/master/cygnus-ngsi/flume_extensions_catalogue/ngsi_hdfs_sink/index.html), you are not setting the cygnus-ngsi.sinks.hdfs-sink.backend.impl parameter, which defaults to rest. However, this is what the documentation says about hdfs_port:
14000 if using HttpFS (rest), 50070 if using WebHDFS (rest), 8020 if using the Hadoop API (binary).
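So maybe you are using the wrong port. Note also that 50070 is the default NameNode HTTP (WebHDFS) port in Hadoop 2.x; in Hadoop 3.x that default changed to 9870, so on your 3.2.0 cluster 50070 only works if it has been configured explicitly.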
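The documented defaults can be turned into a small consistency check between backend.impl and hdfs_port. This is a hypothetical helper, not something Cygnus provides; the table holds the three values quoted from the documentation plus 9870 for WebHDFS on Hadoop 3.x, so a cluster running on non-default ports would need its own entries.

```python
# Default ports per backend.impl value, from the NGSIHDFSSink docs;
# 9870 is added as the Hadoop 3.x default NameNode HTTP (WebHDFS) port.
EXPECTED_PORTS = {
    "rest": {14000: "HttpFS", 50070: "WebHDFS (Hadoop 2.x)", 9870: "WebHDFS (Hadoop 3.x)"},
    "binary": {8020: "Hadoop API"},
}


def check_port(backend_impl: str, hdfs_port: int) -> str:
    """Return a human-readable verdict on a backend.impl / hdfs_port pair."""
    known = EXPECTED_PORTS.get(backend_impl)
    if known is None:
        return f"unknown backend.impl '{backend_impl}'"
    if hdfs_port in known:
        return f"port {hdfs_port} matches {known[hdfs_port]}"
    expected = ", ".join(str(p) for p in sorted(known))
    return f"port {hdfs_port} is not a default for '{backend_impl}' (expected one of {expected})"


if __name__ == "__main__":
    # backend.impl is commented out in the configuration, so rest applies
    print(check_port("rest", 50070))
```

With the posted configuration this reports a match for Hadoop 2.x only, which is why the port actually exposed by the 3.2.0 cluster is worth confirming separately.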
In addition, a good sanity check would be to call the WebHDFS/HttpFS API (some basic GET operation) from the machine running Cygnus, to confirm that your cluster works and is reachable from there.
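For WebHDFS, such a check can also cover the write path. According to the WebHDFS REST API documentation, creating a file is a two-step operation: the NameNode answers the first PUT with op=CREATE by a 307 redirect whose Location header points to a DataNode, and the data is then sent to that DataNode, whereas creating a directory (MKDIRS) is answered by the NameNode alone. If the DataNode in that redirect is advertised under a name such as hdfsServer that the Cygnus machine cannot resolve, that would be consistent with the path being created but not the file; this is a hypothesis, not a confirmed diagnosis. The Python sketch below uses only the standard library; HOST, PORT and USER are taken from the configuration above, TEST_PATH is a hypothetical file name, and only the first step of CREATE is sent (no file data).

```python
import http.client
import socket
from urllib.parse import urlencode, urlsplit

# Values from the Cygnus configuration; TEST_PATH is hypothetical.
HOST, PORT, USER = "10.9.8.29", 50070, "stack"
TEST_PATH = "/tmp/cygnus-sanity-check.txt"


def webhdfs(method: str, path: str, op: str) -> tuple:
    """Send one WebHDFS request; http.client never follows redirects."""
    conn = http.client.HTTPConnection(HOST, PORT, timeout=10)
    try:
        query = urlencode({"op": op, "user.name": USER})
        conn.request(method, f"/webhdfs/v1{path}?{query}")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location"), resp.read()
    finally:
        conn.close()


def redirect_target(location: str) -> tuple:
    """Extract (host, port) from a redirect URL (host comes back lowercased)."""
    parts = urlsplit(location)
    return parts.hostname, parts.port


def resolves(hostname: str) -> bool:
    """Return True if the local resolver can resolve hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def sanity_check() -> None:
    # 1) Basic GET, answered by the NameNode alone
    status, _, body = webhdfs("GET", "/", "GETFILESTATUS")
    print("GETFILESTATUS /:", status, body[:200].decode(errors="replace"))
    # 2) First step of CREATE: expect 307 with a DataNode URL in Location
    status, location, _ = webhdfs("PUT", TEST_PATH, "CREATE")
    print("CREATE step 1:", status, location)
    if location:
        dn_host, dn_port = redirect_target(location)
        print(f"DataNode {dn_host}:{dn_port} resolvable here: {resolves(dn_host)}")
```

Calling sanity_check() on the Cygnus host should show whether the GET succeeds and, if a redirect comes back, whether the DataNode name in it can be resolved from that machine.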