Elasticsearch File System Crawler (FS Crawler)

Results 150 fscrawler issues

Just wanted to find out if it is possible to: i) detect strikethrough in PDF files, ii) detect paragraphs in PDF files.

We should not extract all the raw metadata when `fs.raw_metadata` is enabled, but only the non-standard raw metadata. See https://github.com/dadoonet/fscrawler/blob/master/tika/src/main/java/fr/pilato/elasticsearch/crawler/fs/tika/TikaDocParser.java#L148-L185

feature_request
component:core

- Target feature: Provide the physical path of the file that FSCrawler failed to process (e.g. index in ES). - Current Situation: Currently you are...

feature_request
component:core

I would like to run fscrawler on a Raspberry Pi 4, but it has an arm64 architecture. Although the core is written for the JVM and should be architecture independent, the produced...

new

**Is your feature request related to a problem? Please describe.** We're building a crawler cluster for a local area network. It is intended to provide a convenient search service. People in there...

feature_request

**Is your feature request related to a problem? Please describe.** Many users of this crawler run it as a scheduled task, a Docker container, or 24x7. Currently you have to resort...

feature_request
component:monitoring

Hi, I am currently trying to set up a pipeline for end-to-end document upload and delete, and I have successfully managed to upload a document using fscrawler...

While performing sizing tests to check how big a file can be ingested, we noticed that any file larger than 10 MB does not go through. Even if ingestion into...

check_for_bug

Let's make the code more generic in preparation of #263 #264. Instead of writing `{job_name}/_status.json` file, let's write: * `{job_name}/_status-fs.json` for FS standard implementation * `{job_name}/_status-ssh.json` for SSH implementation *...

update

Although Tesseract is integrated with fscrawler for OCR, it fails when the data is in tabular form. I found that ABBYY FineReader OCR handles this efficiently. Is there any provision...

feature_request