Is there a way to parse multiple tables using "lines"?
I have two tables on a single page of the PDF and want to extract data from both of them. I can't use the "tables" plugin because it only gives me the first line of data.
As far as I understand, the "lines" block is treated as a dictionary, so parsing multiple tables using "lines" is not possible. Is there an alternative?
I can implement this, but I need the invoice2data maintainers to help me decide on the YAML syntax for it. I have two suggestions.
@m3nu: can you comment on the ideas below, please?
Use extended fields syntax
In #307 I suggested making each fields entry an associative array. That would allow very clean support for the requested feature (without breaking backward compatibility). Consider:
fields:
  foo:
    static: 'Lorem ipsum'
  items:
    plugin: lines
    settings:
      start: ...
      end: ...
      line: ...
  rates:
    plugin: lines
    settings:
      start: ...
      end: ...
      line: ...
That would require rewriting the plugins API a bit, which I can easily handle. The only problem I see is that the tables plugin wouldn't match that design, because it parses (returns) multiple fields rather than one. It means we may need two APIs for plugins (which is not a problem for me, I'm just making it clear).
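To make the difference between the two plugin shapes concrete, here is a minimal Python sketch. All names in it (`lines_like_plugin`, `tables_like_plugin`, `apply_plugins`, the `line` and `body` settings as used here, and the sample text) are hypothetical illustrations, not invoice2data's actual plugin API, and the parsing is deliberately simplified.

```python
import re


def lines_like_plugin(settings, text):
    """Field-level plugin (hypothetical): returns ONE value for ONE field."""
    rows = []
    for line in text.splitlines():
        match = re.search(settings["line"], line)
        if match:
            rows.append(match.groupdict())
    return rows


def tables_like_plugin(settings, text):
    """Template-level plugin (hypothetical): returns SEVERAL fields at once."""
    match = re.search(settings["body"], text)
    return match.groupdict() if match else {}


def apply_plugins(text, field_plugins, template_plugins):
    """Run both kinds of plugin and merge their results into one output dict."""
    output = {}
    # Field-level plugins: the caller decides which field gets the value.
    for name, (plugin, settings) in field_plugins.items():
        output[name] = plugin(settings, text)
    # Template-level plugins: the plugin itself decides which fields it sets.
    for plugin, settings in template_plugins:
        output.update(plugin(settings, text))
    return output


# Made-up sample text and patterns, for illustration only.
SAMPLE = "Widget 2\nGadget 5\nDate: 2020-01-31 Total: 99.50\n"

result = apply_plugins(
    SAMPLE,
    field_plugins={
        "items": (lines_like_plugin, {"line": r"^(?P<name>\w+) (?P<qty>\d+)$"}),
    },
    template_plugins=[
        (tables_like_plugin, {"body": r"Date: (?P<date>\S+) Total: (?P<total>\S+)"}),
    ],
)
```

The field-level call assigns its return value to the field named in the template, while the tables-like call merges whatever keys it captured, which is why the two would probably need separate entry points.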
Extend existing lines syntax
Current syntax for lines looks like this:
lines:
  start: ...
  end: ...
  line: ...
We could extend it to support the following (without breaking backward compatibility):
lines:
  - items:
      start: ...
      end: ...
      line: ...
  - rates:
      start: ...
      end: ...
      line: ...
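One way backward compatibility could be kept is to normalize both shapes into a single mapping before parsing. The sketch below is only an illustration: `normalize_lines_config` is a hypothetical helper, the inputs are plain Python objects standing in for what a YAML loader would return, the regex strings are made up, and storing the legacy block under the name `lines` is an assumption.

```python
def normalize_lines_config(lines_cfg):
    """Return {field_name: settings} for both the legacy and extended syntax.

    Hypothetical helper, not part of invoice2data.
    """
    if isinstance(lines_cfg, dict):
        # Legacy syntax: one mapping with start/end/line keys.
        # Assumption: its result is stored under the field name "lines".
        return {"lines": lines_cfg}
    if isinstance(lines_cfg, list):
        named = {}
        for entry in lines_cfg:
            # Extended syntax: every list entry is a single-key mapping
            # whose key is the target field name.
            if not isinstance(entry, dict) or len(entry) != 1:
                raise ValueError("each lines entry must be a single-key mapping")
            (name, settings), = entry.items()
            if name in named:
                raise ValueError(f"duplicate lines entry: {name}")
            named[name] = settings
        return named
    raise TypeError("lines must be a mapping or a list of mappings")


# Made-up settings, for illustration only.
legacy = {"start": "Item", "end": "Total", "line": r"(?P<name>\w+)"}
extended = [
    {"items": {"start": "Item", "end": "Subtotal", "line": r"(?P<name>\w+)"}},
    {"rates": {"start": "Rate", "end": "Total", "line": r"(?P<rate>\d+)"}},
]

legacy_fields = normalize_lines_config(legacy)
extended_fields = normalize_lines_config(extended)
```

With this shape, existing templates keep producing a single `lines` field, while new templates get one field per named entry (`items` and `rates` here).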
Instead of this, I was wondering whether the lines section could be made like the tables section, where we can give multiple entries. That would be really helpful.
The difference between the "Extend existing lines syntax" proposal above and the tables plugin syntax is that the former names every array entry (items and rates). We can't use a pure tables-like syntax instead, because:
- the lines plugin returns an array that has to be assigned to some field
- the tables plugin assigns to a few fields, depending on the body used
Unless I misunderstood you. If so, please provide some syntax example, so it's clear what you mean.
Use extended fields syntax
Defining this per field makes more sense to me personally and seems more scalable. So we would have options for each field: regex, static, lines plugin, etc. Maybe we treat everything as a plugin, including regex and static, so those can be improved independently.
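As a rough illustration of the "everything is a plugin" idea, here is a minimal Python sketch with a plugin registry. The registry, the `register` decorator, the `extract` function, and the `value` and `pattern` setting names are all hypothetical, chosen for this example rather than taken from invoice2data.

```python
import re

# Hypothetical registry mapping plugin names to callables.
PLUGINS = {}


def register(name):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(func):
        PLUGINS[name] = func
        return func
    return decorator


@register("static")
def static_plugin(settings, text):
    # Ignores the document text and returns a fixed value.
    return settings["value"]


@register("regex")
def regex_plugin(settings, text):
    # Returns the first capture group of the first match, or None.
    match = re.search(settings["pattern"], text)
    return match.group(1) if match else None


def extract(fields, text):
    """Resolve every field through the plugin named in its config."""
    return {
        name: PLUGINS[cfg["plugin"]](cfg.get("settings", {}), text)
        for name, cfg in fields.items()
    }


# Made-up field definitions and sample text, for illustration only.
FIELDS = {
    "issuer": {"plugin": "static", "settings": {"value": "Lorem ipsum"}},
    "invoice_number": {"plugin": "regex", "settings": {"pattern": r"Invoice #(\d+)"}},
}

RESULT = extract(FIELDS, "Invoice #1234\nTotal: 10.00")
```

Because every field goes through the same call signature, a lines plugin could be registered the same way, and each plugin could evolve without touching the dispatch code.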