Failed to start Grizzly HTTP server: Address already in use
Describe the bug
Indexing fails to start: the log shows repeated "FS crawler thread is still running" stack traces, and the run aborts with "Failed to start Grizzly HTTP server: Address already in use".
Job Settings
---
name: "idx"
fs:
  url: "/tmp/es"
  indexed_chars: 100%
  lang_detect: true
  continue_on_error: true
  update_rate: "1m"
  ocr:
    pdf_strategy: "no_ocr"
elasticsearch:
  nodes:
    - url: "https://elasticsearch:9200"
  username: "elastic"
  password: "changeme"
  ssl_verification: false
  index: "idx"
  index_folder: "idx_folder"
rest:
  url: "http://fscrawler:8080"
# command used to start indexing manually from the container terminal:
# bash bin/fscrawler idx --rest --restart --debug
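Since the fatal error is a bind failure on the REST port, one workaround worth trying (assuming the standard `rest.url` setting shown in the config above, and 8081 as an arbitrary example of a free port) is to bind the REST service somewhere else:

```yaml
rest:
  # any port not already taken inside the container;
  # 8081 is just an illustrative choice
  url: "http://fscrawler:8081"
```

If the container publishes the port to the host, the corresponding Docker port mapping would need the same change.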
Logs
# bash bin/fscrawler idx --rest --restart --debug
21:04:43,011 INFO [f.console] [FSCrawler 2.10-SNAPSHOT ASCII-art banner trimmed]
| You know, for Files! |
| Made from France with Love |
| Source: https://github.com/dadoonet/fscrawler/ |
| Documentation: https://fscrawler.readthedocs.io/ |
21:04:43,028 INFO [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [195.9mb/3.1gb=6.03%], RAM [7.7gb/12.6gb=60.87%], Swap [1.4gb/1.4gb=100.0%].
21:04:43,032 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings.json] already exists
21:04:43,033 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
21:04:43,034 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings.json] already exists
21:04:43,035 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings_folder.json] already exists
21:04:43,036 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_wpsearch_settings.json] already exists
21:04:43,037 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [8/_settings.json] already exists
21:04:43,038 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [8/_settings_folder.json] already exists
21:04:43,041 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [8/_wpsearch_settings.json] already exists
21:04:43,044 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Cleaning existing status for job [idx]...
21:04:43,050 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Starting job [idx]...
21:04:43,326 INFO [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
21:04:43,327 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler started in watch mode. It will run unless you stop it with CTRL+C.
21:04:43,498 WARN [f.p.e.c.f.c.ElasticsearchClient] We are not doing SSL verification. It's not recommended for production.
21:04:43,526 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at [jar:file:/usr/share/fscrawler/lib/log4j-slf4j-impl-2.19.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
21:04:44,273 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version returns 8.3.3 and 8 as the major version number
21:04:44,275 INFO [f.p.e.c.f.c.ElasticsearchClient] Elasticsearch Client connected to a node running version 8.3.3
21:04:44,286 DEBUG [f.p.e.c.f.s.FsCrawlerManagementServiceElasticsearchImpl] Elasticsearch Management Service started
21:04:44,290 WARN [f.p.e.c.f.c.ElasticsearchClient] We are not doing SSL verification. It's not recommended for production.
21:04:44,296 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version
21:04:44,406 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version returns 8.3.3 and 8 as the major version number
21:04:44,407 INFO [f.p.e.c.f.c.ElasticsearchClient] Elasticsearch Client connected to a node running version 8.3.3
21:04:44,409 DEBUG [f.p.e.c.f.s.FsCrawlerDocumentServiceElasticsearchImpl] Elasticsearch Document Service started
21:04:44,417 DEBUG [f.p.e.c.f.c.ElasticsearchClient] create index [idx]
21:04:44,445 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Error while running PUT https://elasticsearch:9200/idx: {"error":{"root_cause":[{"type":"resource_already_exists_exception","reason":"index [idx/0-MWloshQ5iHxdjUAwSC3w] already exists","index_uuid":"0-MWloshQ5iHxdjUAwSC3w","index":"idx"}],"type":"resource_already_exists_exception","reason":"index [idx/0-MWloshQ5iHxdjUAwSC3w] already exists","index_uuid":"0-MWloshQ5iHxdjUAwSC3w","index":"idx"},"status":400}
21:04:44,447 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Response for create index [idx]: HTTP 400 Bad Request
21:04:44,457 DEBUG [f.p.e.c.f.c.ElasticsearchClient] create index [idx_folder]
21:04:44,469 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Error while running PUT https://elasticsearch:9200/idx_folder: {"error":{"root_cause":[{"type":"resource_already_exists_exception","reason":"index [idx_folder/QX1FNqOtQHyBM1273MpHVg] already exists","index_uuid":"QX1FNqOtQHyBM1273MpHVg","index":"idx_folder"}],"type":"resource_already_exists_exception","reason":"index [idx_folder/QX1FNqOtQHyBM1273MpHVg] already exists","index_uuid":"QX1FNqOtQHyBM1273MpHVg","index":"idx_folder"},"status":400}
21:04:44,471 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Response for create index [idx_folder]: HTTP 400 Bad Request
21:04:44,479 DEBUG [f.p.e.c.f.FsParserAbstract] creating fs crawler thread [idx] for [/tmp/es] every [1m]
21:04:44,483 INFO [f.p.e.c.f.FsParserAbstract] FS crawler started for [idx] for [/tmp/es] every [1m]
21:04:44,488 DEBUG [f.p.e.c.f.FsParserAbstract] Fs crawler thread [idx] is now running. Run #1...
21:04:44,515 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/tmp/es, /tmp/es) = /
21:04:44,548 DEBUG [f.p.e.c.f.FsParserAbstract] indexing [/tmp/es] content
21:04:44,549 DEBUG [f.p.e.c.f.c.f.FileAbstractorFile] Listing local files from /tmp/es
21:04:44,561 DEBUG [f.p.e.c.f.c.f.FileAbstractorFile] 1 local files found
21:04:44,564 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/tmp/es, /tmp/es/8-Book Manuscript-93-1-10-20170505.pdf) = /8-Book Manuscript-93-1-10-20170505.pdf
21:04:44,567 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] directory = [false], filename = [/8-Book Manuscript-93-1-10-20170505.pdf], includes = [null], excludes = [null]
21:04:44,570 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/8-Book Manuscript-93-1-10-20170505.pdf], excludes = [null]
21:04:44,572 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] filename = [/8-Book Manuscript-93-1-10-20170505.pdf], includes = [null]
21:04:44,577 DEBUG [f.p.e.c.f.FsParserAbstract] [/8-Book Manuscript-93-1-10-20170505.pdf] can be indexed: [true]
21:04:44,580 DEBUG [f.p.e.c.f.FsParserAbstract] - file: /8-Book Manuscript-93-1-10-20170505.pdf
21:04:44,587 DEBUG [f.p.e.c.f.FsParserAbstract] fetching content from [/tmp/es],[8-Book Manuscript-93-1-10-20170505.pdf]
21:04:44,594 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(/tmp/es, /tmp/es/8-Book Manuscript-93-1-10-20170505.pdf) = /8-Book Manuscript-93-1-10-20170505.pdf
21:04:44,614 DEBUG [f.p.e.c.f.t.TikaInstance] OCR is activated so we need to configure Tesseract in case we have specific settings.
21:04:44,627 DEBUG [f.p.e.c.f.t.TikaInstance] Tesseract Language set to [eng].
21:04:44,658 DEBUG [f.p.e.c.f.t.TikaInstance] OCR is activated.
21:04:44,721 DEBUG [f.p.e.c.f.t.TikaInstance] OCR strategy for PDF documents is [no_ocr] and tesseract was found.
21:04:44,724 INFO [f.p.e.c.f.t.TikaInstance] OCR is enabled. This might slowdown the process.
21:04:44,896 WARN [o.g.j.s.w.WadlFeature] JAXBContext implementation could not be found. WADL feature is disabled.
21:04:45,192 WARN [o.g.j.i.i.Providers] A provider fr.pilato.elasticsearch.crawler.fs.rest.DocumentApi registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider fr.pilato.elasticsearch.crawler.fs.rest.DocumentApi will be ignored.
21:04:45,199 WARN [o.g.j.i.i.Providers] A provider fr.pilato.elasticsearch.crawler.fs.rest.ServerStatusApi registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider fr.pilato.elasticsearch.crawler.fs.rest.ServerStatusApi will be ignored.
21:04:45,203 WARN [o.g.j.i.i.Providers] A provider fr.pilato.elasticsearch.crawler.fs.rest.UploadApi registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider fr.pilato.elasticsearch.crawler.fs.rest.UploadApi will be ignored.
21:04:45,588 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [idx]
21:04:45,590 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is still running
java.lang.Exception: Stack trace
at java.base/java.lang.Thread.dumpStack(Thread.java:1380)
at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl.close(FsCrawlerImpl.java:162)
at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:292)
[... identical "FS crawler thread is still running" stack trace repeated every ~500ms from 21:04:46,093 through 21:04:49,111 ...]
21:04:49,287 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Going to execute new bulk composed of 1 actions
21:04:49,341 DEBUG [f.p.e.c.f.c.ElasticsearchEngine] Sending a bulk request of [1] documents to the Elasticsearch service
21:04:49,344 DEBUG [f.p.e.c.f.c.ElasticsearchClient] bulk a ndjson of 214 characters
21:04:49,467 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Executed bulk composed of 1 actions
[... identical "FS crawler thread is still running" stack trace repeated at 21:04:49,612, 21:04:50,118 and 21:04:50,623 ...]
21:04:50,696 WARN [f.p.e.c.f.FsParserAbstract] trying to add new file while closing crawler. Document [idx]/[8e4bd646fb713439cd768f5c5047e135] has been ignored
21:04:50,699 DEBUG [f.p.e.c.f.FsParserAbstract] Looking for removed files in [/tmp/es]...
21:04:50,700 DEBUG [f.p.e.c.f.FsParserAbstract] Looking for removed directories in [/tmp/es]...
21:04:50,727 DEBUG [f.p.e.c.f.FsParserAbstract] Fs crawler is going to sleep for 1m
21:04:50,729 DEBUG [f.p.e.c.f.FsParserAbstract] FS crawler thread [idx] is now marked as closed...
21:04:51,129 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
21:04:51,131 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing Elasticsearch client manager
21:04:51,132 DEBUG [f.p.e.c.f.f.b.FsCrawlerBulkProcessor] Closing BulkProcessor
21:04:51,134 DEBUG [f.p.e.c.f.f.b.FsCrawlerBulkProcessor] BulkProcessor is now closed
21:04:51,142 DEBUG [f.p.e.c.f.s.FsCrawlerManagementServiceElasticsearchImpl] Elasticsearch Management Service stopped
21:04:51,144 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing Elasticsearch client manager
21:04:51,146 DEBUG [f.p.e.c.f.f.b.FsCrawlerBulkProcessor] Closing BulkProcessor
21:04:51,149 DEBUG [f.p.e.c.f.f.b.FsCrawlerBulkProcessor] BulkProcessor is now closed
21:04:51,155 DEBUG [f.p.e.c.f.s.FsCrawlerDocumentServiceElasticsearchImpl] Elasticsearch Document Service stopped
21:04:51,158 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
21:04:51,161 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler [idx] stopped
21:04:51,163 FATAL [f.p.e.c.f.c.FsCrawlerCli] Fatal error received while running the crawler: [Failed to start Grizzly HTTP server: Address already in use]
21:04:51,165 DEBUG [f.p.e.c.f.c.FsCrawlerCli] error caught
jakarta.ws.rs.ProcessingException: Failed to start Grizzly HTTP server: Address already in use
at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpServerFactory.createHttpServer(GrizzlyHttpServerFactory.java:318) ~[jersey-container-grizzly2-http-3.1.0.jar:?]
at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpServerFactory.createHttpServer(GrizzlyHttpServerFactory.java:93) ~[jersey-container-grizzly2-http-3.1.0.jar:?]
at fr.pilato.elasticsearch.crawler.fs.rest.RestServer.start(RestServer.java:64) ~[fscrawler-rest-2.10-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:306) ~[fscrawler-cli-2.10-SNAPSHOT.jar:?]
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method) ~[?:?]
at sun.nio.ch.Net.bind(Net.java:555) ~[?:?]
at sun.nio.ch.ServerSocketChannelImpl.netBind(ServerSocketChannelImpl.java:337) ~[?:?]
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:294) ~[?:?]
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:89) ~[?:?]
at org.glassfish.grizzly.nio.transport.TCPNIOBindingHandler.bindToChannelAndAddress(TCPNIOBindingHandler.java:95) ~[grizzly-framework-4.0.0.jar:4.0.0]
at org.glassfish.grizzly.nio.transport.TCPNIOBindingHandler.bind(TCPNIOBindingHandler.java:63) ~[grizzly-framework-4.0.0.jar:4.0.0]
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:224) ~[grizzly-framework-4.0.0.jar:4.0.0]
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:207) ~[grizzly-framework-4.0.0.jar:4.0.0]
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:199) ~[grizzly-framework-4.0.0.jar:4.0.0]
at org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:740) ~[grizzly-http-server-4.0.0.jar:4.0.0]
at org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:234) ~[grizzly-http-server-4.0.0.jar:4.0.0]
at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpServerFactory.createHttpServer(GrizzlyHttpServerFactory.java:315) ~[jersey-container-grizzly2-http-3.1.0.jar:?]
... 3 more
21:04:51,185 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [idx]
21:04:51,187 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
21:04:51,190 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing Elasticsearch client manager
21:04:51,192 DEBUG [f.p.e.c.f.s.FsCrawlerManagementServiceElasticsearchImpl] Elasticsearch Management Service stopped
Versions:
- OS: Windows 10, Docker Desktop v4.14.1, FSCrawler: latest, ELK 8.3.3
Do you already have another service bound to port 8080?
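For context: the `java.net.BindException: Address already in use` in the trace means some process already holds the port Grizzly tries to bind (8080 per the `rest.url` setting). The condition is easy to reproduce in isolation; this small Python sketch binds one listener and then attempts a second bind on the same port, which fails exactly the way Grizzly's bind does:

```python
import socket

# Bind a first listener; port 0 lets the OS pick a free port.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))
first.listen()
port = first.getsockname()[1]

# A second bind on the same address/port fails with EADDRINUSE,
# the same underlying error as the BindException in the log.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
    in_use = False
except OSError:
    in_use = True
finally:
    second.close()
    first.close()

print(in_use)  # True: the port was already taken
```

Inside the container, `ss -ltnp` (or `netstat -tlnp`) would show which process holds 8080; running two FSCrawler instances with `--rest` against the same `rest.url` is a common way to hit this.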