docassemble-ALWeaver

We should create a way for people to mark tables and rows in PDFs and Docx files

Open nonprofittechy opened this issue 4 years ago • 5 comments

perhaps adding table_name_row1, etc.

or just name_row1

nonprofittechy, Dec 15 '20 18:12

This is a giant task but would be a huge improvement to the Weaver. #299 envisions auto-detection of fields in a PDF and would solve part of this if we can reliably detect tabular data; AWS Textract is really good at this, so our only question is whether the algorithms are documented in public research :)

nonprofittechy, Oct 01 '21 20:10

Note that we should be able to parse a DOCX table and grab the first row as a header row.
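
A minimal sketch of that idea, assuming the first row of every table really is a header row. It uses only the Python standard library to read `word/document.xml` directly; the function name `docx_tables_with_headers` is hypothetical and is not part of the Weaver or FormFyxer. Text from nested tables is folded into the enclosing cell, and merged cells are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def _cell_text(cell):
    """Join the text runs of a table cell, one line per paragraph."""
    paragraphs = []
    for p in cell.iter(f"{{{W_NS}}}p"):
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return "\n".join(paragraphs).strip()


def docx_tables_with_headers(docx_file):
    """Return one dict per table: {"header": [...], "rows": [[...], ...]}.

    docx_file may be a path or a binary file-like object.
    Assumption (not verified here): row 0 of each table is its header row.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    tables = []
    for tbl in root.iter(f"{{{W_NS}}}tbl"):
        # Only direct child rows and cells, so the table's own grid is kept intact.
        rows = [
            [_cell_text(tc) for tc in tr.findall("w:tc", NS)]
            for tr in tbl.findall("w:tr", NS)
        ]
        if rows:
            tables.append({"header": rows[0], "rows": rows[1:]})
    return tables
```

The header cells could then serve as candidate column names for the fields in the remaining rows.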

nonprofittechy, Apr 11 '22 13:04

I think this is redundant with an issue that I can't find, but one option we should consider for hinting to the Weaver that two fields are related is a + symbol.

Maybe:

  • table0+column?

nonprofittechy, Sep 06 '22 17:09

If we use a label, we'd probably need the table, the row, and the column, as well as the variable name at the start so we can do the regular parsing as well, such as for child1.name.first. Might this be simpler for the developer if we put this in the Weaver UI itself? Maybe they'd request a table and be prompted to select number of cols/rows and which variables to put in which places in the table.

plocket, Sep 07 '22 13:09

It might be something we let people fix in the Weaver UI, but FormFyxer has to provide an initial label.

The example above has a label (table is just the placeholder), a row (0), and a column name.
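
To make the label/row/column reading concrete, here is a rough sketch of how such names might be parsed and grouped. The regular expression and the helper names `parse_table_field` and `group_table_fields` are assumptions for illustration only; nothing here reflects an existing Weaver or FormFyxer API, and the exact syntax is still undecided in this thread.

```python
import re
from collections import defaultdict

# Hypothetical convention: <label><row>+<column>, e.g. "table0+column".
TABLE_FIELD = re.compile(
    r"^(?P<label>[A-Za-z_]\w*?)(?P<row>\d+)\+(?P<column>.+)$"
)


def parse_table_field(name):
    """Return (label, row, column), or None if name does not use the convention."""
    match = TABLE_FIELD.match(name)
    if match is None:
        return None
    return match.group("label"), int(match.group("row")), match.group("column")


def group_table_fields(field_names):
    """Split field names into table cells and everything else.

    Returns (tables, other), where tables maps
    label -> row index -> column name -> original field name.
    """
    tables = defaultdict(lambda: defaultdict(dict))
    other = []
    for name in field_names:
        parsed = parse_table_field(name)
        if parsed is None:
            other.append(name)  # left for the regular field-name parsing
        else:
            label, row, column = parsed
            tables[label][row][column] = name
    # Convert nested defaultdicts to plain dicts for easier inspection.
    plain = {
        label: {row: dict(cols) for row, cols in rows.items()}
        for label, rows in tables.items()
    }
    return plain, other
```

Names without a `+`, such as child1.name.first, fall through to `other`, so the regular parsing could still handle them.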

nonprofittechy, Sep 07 '22 13:09