flume-ng-extends-source
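The plugin ran into some problems under Flume 1.6
Update: after a restart it ran successfully. Thank you to the author for developing this plugin.
Hello, I compiled and installed the plugin by following the steps in the article and then ran into some problems. My Flume version is 1.6.0.
1. Starting the Flume agent fails with an error: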
[ERROR - org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:361)] Source tail has been removed due to an error during configuration
java.lang.IllegalArgumentException: Must supply a valid regex string
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
at org.apache.flume.interceptor.RegexExtractorInterceptor$Builder.configure(RegexExtractorInterceptor.java:175)
at org.apache.flume.channel.ChannelProcessor.configureInterceptors(ChannelProcessor.java:110)
at org.apache.flume.channel.ChannelProcessor.configure(ChannelProcessor.java:80)
at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:348)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:97)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
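2. After the error, events are still written to the configured Elasticsearch sink, but the message that arrives in Elasticsearch is not the content of the file being read.
The Flume configuration is as follows: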
agent.sources.tail.type = com.github.ningg.flume.source.SpoolDirectoryTailFileSource
# On Windows, spoolDir should be formatted like: E:/program files/spoolDir
# Note: the value of spoolDir MUST NOT be surrounded by quotation marks.
agent.sources.tail.spoolDir = /root/log
agent.sources.tail.fileSuffix = .COMPLETED
agent.sources.tail.deletePolicy = never
#agent.sources.tail.ignorePattern = .tmp
agent.sources.tail.targetPattern = nginx_access.log.(\\d){4}-(\\d){2}-(\\d){2}
agent.sources.tail.targetFilename = yyyy-MM-dd
agent.sources.tail.trackerDir = .flumespooltail
agent.sources.tail.consumeOrder = oldest
agent.sources.tail.batchSize = 100
agent.sources.tail.inputCharset = UTF-8
agent.sources.tail.decodeErrorPolicy = REPLACE
agent.sources.tail.deserializer = LINE
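Elasticsearch shows the received message as /root/log/nginx_access.log., which is clearly not the content of my file.
That is a lot of questions; I hope you can bear with me and read to the end. I would be very grateful.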
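Sorry, I have been busy lately and only just saw this. From your configuration above, I take it you want to read the files named nginx_access.log.yyyy-MM-dd under the /root/log directory.
Could you describe your use case:
- Collection: which files, in which directory?
- Do the file names follow a pattern?
- And so on: anything else relevant to your use case.
Once I have a detailed description of your use case, I can judge whether the configuration is correct and track down the problem.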
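After a restart it now runs successfully. I don't know why that strange error appeared the first time. Thanks.
The scenario is: the files to read are the nginx_access.log.xxxx-xx-xx files in /root/log, whose suffix changes every day to the current date. It is running successfully now and I am testing it. Thanks.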
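For anyone who wants to check a targetPattern value before starting the agent, here is a minimal sketch. It loads the properties line from the configuration above with java.util.Properties (which turns each `\\d` into `\d`) and compiles the result as a Java regex. The class name TargetPatternCheck, the helper loadTargetPattern, and the sample file names are made up for illustration, and it assumes the source requires the whole file name to match; the plugin's actual matching logic is not shown in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Pattern;

public class TargetPatternCheck {

    // Hypothetical helper: parses one properties line the way
    // java.util.Properties does and compiles the value as a regex.
    static Pattern loadTargetPattern(String propertiesLine) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesLine));
        return Pattern.compile(props.getProperty("agent.sources.tail.targetPattern"));
    }

    public static void main(String[] args) throws IOException {
        // In Java source each backslash is doubled again, so "\\\\d" here
        // is the two-backslash form \\d that appears in the properties file.
        String line = "agent.sources.tail.targetPattern = "
                + "nginx_access.log.(\\\\d){4}-(\\\\d){2}-(\\\\d){2}";
        Pattern pattern = loadTargetPattern(line);

        // Assumption: the whole file name must match the pattern.
        String[] samples = {"nginx_access.log.2016-01-15", "nginx_access.log"};
        for (String name : samples) {
            System.out.println(name + " -> " + pattern.matcher(name).matches());
        }
    }
}
```

Note that the unescaped dots in the pattern match any character, not only a literal dot; writing `\\.` in the properties file would make them literal.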