docassemble-ALWeaver
We should create a way for people to mark tables and rows in PDF and DOCX files,
perhaps by adding a suffix to the field name, such as table_name_row1,
or just name_row1.
This is a giant task but would be a huge improvement to the Weaver. #299 envisions auto-detection of fields in a PDF and would solve part of this if we can reliably detect tabular data; AWS Textract is really good at this, so our only question is whether the algorithms are documented in public research :)
Note that we should be able to parse a DOCX table and grab the first row as a header row.
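Here is a rough sketch of the header-row idea, using only the standard library so it runs on its own. The names table_headers and read_document_xml are hypothetical, not existing Weaver or FormFyxer functions. It assumes the main document part lives at word/document.xml, which is the usual location but is not guaranteed. A library like python-docx would probably be more robust in practice.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a DOCX file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def read_document_xml(path: str) -> str:
    """Read the main document part from a DOCX file (assumed to be word/document.xml)."""
    with zipfile.ZipFile(path) as docx:
        return docx.read("word/document.xml").decode("utf-8")


def table_headers(document_xml: str) -> list[list[str]]:
    """Return the text of the first row's cells for every table in the document.

    Nested tables are visited too, and a cell's text includes any text
    nested inside it.
    """
    root = ET.fromstring(document_xml)
    headers = []
    for tbl in root.iter(f"{{{W_NS}}}tbl"):
        first_row = tbl.find("w:tr", NS)
        if first_row is None:
            continue  # table without rows: nothing to treat as a header
        cells = []
        for tc in first_row.findall("w:tc", NS):
            # A cell's text is split across runs, so join every w:t inside it.
            text = "".join(t.text or "" for t in tc.iter(f"{{{W_NS}}}t"))
            cells.append(text.strip())
        headers.append(cells)
    return headers


# Minimal hand-written document body, for illustration only.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:tbl>'
    "<w:tr><w:tc><w:p><w:r><w:t>Name</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>Birthdate</w:t></w:r></w:p></w:tc></w:tr>"
    "<w:tr><w:tc><w:p><w:r><w:t>first child</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>first date</w:t></w:r></w:p></w:tc></w:tr>"
    "</w:tbl></w:body></w:document>"
)

if __name__ == "__main__":
    print(table_headers(SAMPLE))  # → [['Name', 'Birthdate']]
```

The header cells could then seed column names for the table fields. How they map onto Weaver variables is still an open question.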
I think this is redundant with an issue that I can't find, but one way to give the Weaver a hint that two fields are related would be to use a + symbol in the field name.
Maybe:
table0+column?
If we use a label, we'd probably need the table, the row, and the column, plus the variable name at the start so that the regular parsing still works, such as for child1.name.first. Would it be simpler for the developer if we put this in the Weaver UI itself? Maybe they'd request a table and be prompted to select the number of columns and rows and which variables go in which places in the table.
It might be something we let people fix in the Weaver UI, but FormFyxer has to provide an initial label.
The example above has a label (table is just the placeholder), a row (0), and a column name.
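To make the label / row / column reading of table0+column concrete, here is a minimal sketch of how such names could be parsed and grouped. Everything in it is an assumption for discussion: the regular expression, the function names parse_table_field and group_tables, and the choice to ignore names without a + are all hypothetical, and none of this is existing Weaver or FormFyxer behavior.

```python
import re
from collections import defaultdict

# Hypothetical convention: <label><row>+<column>, e.g. "table0+column".
# The label is matched lazily so trailing digits become the row number.
FIELD_RE = re.compile(r"^(?P<label>[A-Za-z_]\w*?)(?P<row>\d+)\+(?P<column>.+)$")


def parse_table_field(name):
    """Split a field name into (label, row, column), or return None if it doesn't match."""
    match = FIELD_RE.match(name)
    if match is None:
        return None
    return match["label"], int(match["row"]), match["column"]


def group_tables(names):
    """Group matching field names into {label: [row dicts ordered by row number]}.

    Each row dict maps a column name to the original field name.
    Names that don't follow the convention are skipped.
    """
    tables = defaultdict(lambda: defaultdict(dict))
    for name in names:
        parsed = parse_table_field(name)
        if parsed is None:
            continue
        label, row, column = parsed
        tables[label][row][column] = name
    return {
        label: [rows[index] for index in sorted(rows)]
        for label, rows in tables.items()
    }


if __name__ == "__main__":
    print(parse_table_field("table0+column"))  # → ('table', 0, 'column')
    print(parse_table_field("child1.name.first"))  # → None
```

Ordinary names such as child1.name.first fall through untouched, so the regular parsing would still apply to them. Whether the column part should itself be run through that parsing is left open here.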