Add PDF to supported formats; summarize content and extract tags using LLM

Open jqnatividad opened this issue 1 year ago • 2 comments

The legacy Datapusher used to support PDFs, as messytables supported extracting tables from PDFs using pdftables.

That functionality has since been removed, along with Excel support.

We reenabled Excel support in DP+ using qsv.

We should re-enable PDF support as well, not to extract tables for now (though there is tabula-rs), but to summarize the content for the Description field and suggest tags.
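To make the proposal concrete, here is a rough sketch of what the summarize-and-tag step might look like. It is not DP+ code. `extract_pdf_text`, `build_prompt`, `summarize_and_tag`, and the `MAX_CHARS` budget are all hypothetical names and values, the text extraction assumes the third-party `pypdf` package, and the LLM is passed in as a plain callable so that no particular provider or API is assumed.

```python
import json
from typing import Callable

# Assumed character budget so the prompt stays within a model's context window.
MAX_CHARS = 8000


def extract_pdf_text(path: str) -> str:
    """Concatenate the text of every page (assumes the pypdf package)."""
    from pypdf import PdfReader  # lazy import: the rest runs without pypdf

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def build_prompt(text: str, max_tags: int = 5) -> str:
    """Build one prompt asking for a short description and a few tags as JSON."""
    excerpt = text[:MAX_CHARS]
    return (
        "Summarize the document below in at most three sentences and "
        f"suggest up to {max_tags} short tags. "
        'Reply with JSON only: {"description": "...", "tags": ["..."]}\n\n'
        f"{excerpt}"
    )


def summarize_and_tag(
    text: str, llm: Callable[[str], str], max_tags: int = 5
) -> dict:
    """Call the (hypothetical) llm callable and normalize its JSON reply."""
    empty = {"description": "", "tags": []}
    try:
        data = json.loads(llm(build_prompt(text, max_tags)))
    except json.JSONDecodeError:
        return empty  # model did not return valid JSON; suggest nothing
    if not isinstance(data, dict):
        return empty

    description = str(data.get("description", "")).strip()
    raw_tags = data.get("tags", [])
    if not isinstance(raw_tags, list):
        raw_tags = []

    # Lowercase, trim, drop blanks and duplicates while keeping order.
    tags = []
    for tag in raw_tags:
        cleaned = str(tag).strip().lower()
        if cleaned and cleaned not in tags:
            tags.append(cleaned)
    return {"description": description, "tags": tags[:max_tags]}
```

In this sketch the result would only be offered as a suggestion for the Description field and tags. Falling back to an empty suggestion on a malformed reply keeps a bad LLM response from failing the whole push.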

jqnatividad, May 19 '23 18:05