David Pilato

Results 329 comments of David Pilato

Without Docker, it's running well with:

```sh
FS_JAVA_OPTS="-DDOC_LEVEL=debug" bin/fscrawler
```

It gives:

```
2025-03-19 16:38:42,334 [DEBUG] [532c5b61a7d65da58c4c8b9eb389a57][/issue-221-doc2.pdf] Indexing content
2025-03-19 16:38:42,348 [DEBUG] [e4da9a32f38871cb275513841ca1b87][/issue-418-中文名称.txt] Indexing content
2025-03-19 16:38:42,400 [DEBUG] [d6937c09c9bc97649309148ec8a90][/issue-1097.pdf] Indexing...
```

With Docker, I think I'm able to run it correctly with:

```sh
docker pull dadoonet/fscrawler
docker run -it --rm \
  -v ~/.fscrawler:/root/.fscrawler \
  -v /path/to/documents:/tmp/es:ro \
  -e FS_JAVA_OPTS="-DLOG_LEVEL=debug -DDOC_LEVEL=debug" \
  ...
```

I'm very sorry that I missed your issue... w00t! 2 years later...

> Can this be changed somehow? E.g. provide some custom config to fscrawler (or Tika) describing the custom...

Where exactly did you put the job settings?

You need to change this line:

`command: fscrawler doc_idx --restart --rest`

To:

`command: fscrawler documents_search --restart --rest`

Also note that you might have to change the `name` setting:

`name: "doc_idx"`...
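As a rough sketch of how the two settings line up: the job name passed on the `command` line has to match the `name` in that job's `_settings.yaml`. The fragment below is only an illustration. The service name `fscrawler` and the `./config` mount are assumptions, and the image and document mount are borrowed from the `docker run` example, so adapt them to your own compose file.

```yaml
# Hypothetical docker-compose.yml fragment (service name and ./config mount are assumptions)
services:
  fscrawler:
    image: dadoonet/fscrawler
    volumes:
      # Job settings are expected at ./config/<job name>/_settings.yaml on the host
      - ./config:/root/.fscrawler
      # Documents to crawl, mounted read-only
      - /path/to/documents:/tmp/es:ro
    # The job name here must match `name` in _settings.yaml
    command: fscrawler documents_search --restart --rest
```

With this layout, `./config/documents_search/_settings.yaml` would need to start with `name: "documents_search"`.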

Could you share the full logs and switch to trace mode?

Maybe try this:

```
command: fscrawler documents_search --trace --restart --rest
```

Could you share again:

* The `./config/documents_search/_settings.yaml` file
* The `./docker-compose.yml` file

Thanks

Try this:

```yaml
---
name: "documents_search"
fs:
  url: "/tmp/es"
  indexed_chars: 100%
  lang_detect: true
  continue_on_error: true
  ocr:
    language: "eng+fra"
    enabled: true
    pdf_strategy: "ocr_and_text"
elasticsearch:
  nodes:
  - url: "https://elasticsearch:9200"
  username: "elastic"
  password:...
```