David Pilato
Could you try with `image: dadoonet/fscrawler:2.10-SNAPSHOT` and `docker pull dadoonet/fscrawler:2.10-SNAPSHOT`? **EDIT**: Actually, no. You are using the latest build, which should be good.
This looks weird to me: `/ROOT/.FSCRAWLER`. Everything is in uppercase, whereas FSCrawler expects `/root/.fscrawler`...
Thanks! This does indeed smell like a bug in the way we are computing the `_id`. I need to check this later. Thanks for opening this and sharing the details!
w00t! Nice find! Thanks for debugging this. I'll try to find a way to fix that, unless you have an idea yourself of how to fix it ;)
```
12:51:52,786 TRACE [f.p.e.c.f.c.ElasticsearchClient] POST https://a1wapapp184.europe.prestagroup.com:9200//mdoc_m9989/_search gives {"took":0,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":2,"relation":"eq"},"max_score":0.18232156,"hits":[{"_index":"mdoc_m9989","_type":"_doc","_id":"Example.pdf","_version":1,"_score":0.18232156,"fields":{"file.filename":["Example.pdf"]}},{"_index":"mdoc_m9989","_type":"_doc","_id":"file-example_PDF_1MB.pdf","_version":1,"_score":0.18232156,"fields":{"file.filename":["file-example_PDF_1MB.pdf"]}}]}}
```

There's something I don't understand. In the index `mdoc_m9989` you apparently have 2 docs. The `_id` field for the 2 docs...
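For what it's worth, a quick way to put the `_id` and `file.filename` values from that response side by side is a small script like the one below. It is only a sketch: `summarize_hits` is a name I made up for illustration, and the body is simply the JSON copied from the TRACE log above, not something fetched from a live cluster.

```python
import json

# Response body copied verbatim from the TRACE log line above.
response_body = '{"took":0,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":2,"relation":"eq"},"max_score":0.18232156,"hits":[{"_index":"mdoc_m9989","_type":"_doc","_id":"Example.pdf","_version":1,"_score":0.18232156,"fields":{"file.filename":["Example.pdf"]}},{"_index":"mdoc_m9989","_type":"_doc","_id":"file-example_PDF_1MB.pdf","_version":1,"_score":0.18232156,"fields":{"file.filename":["file-example_PDF_1MB.pdf"]}}]}}'


def summarize_hits(body: str) -> list[tuple[str, str]]:
    """Return (_id, file.filename) pairs for each hit (hypothetical helper)."""
    hits = json.loads(body)["hits"]["hits"]
    # "fields" values are arrays in a search response, so take the first entry.
    return [(hit["_id"], hit["fields"]["file.filename"][0]) for hit in hits]


pairs = summarize_hits(response_body)
for doc_id, filename in pairs:
    # Show whether the document id is just the bare filename.
    print(f"_id={doc_id!r} file.filename={filename!r} same={doc_id == filename}")
```

The same helper should work on any `_search` response that requests `file.filename` through `fields`, assuming every hit carries that field.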
So I can confirm the bug as I explained in https://github.com/dadoonet/fscrawler/issues/2019#issuecomment-3308411611
I pushed a change with #2198 which hopefully helps to solve the issue. @sagentac if you are still using FSCrawler on Windows, could you confirm?
So what would be a good order, in your opinion? For some use cases, I have the feeling that the most recent documents are more relevant than the oldest ones....
Thanks for reporting. I need to check 👍🏼
I can reproduce the issue. Let me work on a fix...