fix(xlsx): XLSX emits std minified .text_as_html
Summary
Eliminate historical "idiosyncrasies" of the table.metadata.text_as_html HTML introduced by partition_xlsx(). Produce minified .text_as_html consistent with that formed by chunking.
Additional Context
- XLSX .text_as_html is minified (no extra whitespace or thead, tbody, tfoot elements).
- table.text is clean-concatenated-text (CCT) of table.
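
For illustration only, here is a minimal stdlib sketch of what "minified" and "clean-concatenated-text" mean in the bullets above. It is not the code in this PR or in the library: `minify_table_html()` and `table_text()` are hypothetical helpers, and treating CCT as "cell texts joined by single spaces" is an assumption.

```python
import re
from html.parser import HTMLParser


def minify_table_html(html: str) -> str:
    """Hypothetical helper: drop thead/tbody/tfoot wrappers and inter-tag whitespace."""
    # Remove the opening and closing table-section tags, keeping their rows.
    html = re.sub(r"</?(?:thead|tbody|tfoot)\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Remove whitespace that sits between two adjacent tags.
    html = re.sub(r">\s+<", "><", html)
    return html.strip()


class _CellTextCollector(HTMLParser):
    """Collects the text content of each <td>/<th> cell in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: list[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def table_text(html: str) -> str:
    """Hypothetical helper: assumed CCT, i.e. non-empty cell texts joined by one space."""
    parser = _CellTextCollector()
    parser.feed(html)
    parser.close()
    # Normalize whitespace inside each cell and skip empty cells.
    return " ".join(" ".join(cell.split()) for cell in parser.cells if cell.strip())


if __name__ == "__main__":
    pretty = """
    <table>
      <thead>
        <tr><th>Name</th><th>Qty</th></tr>
      </thead>
      <tbody>
        <tr><td>Apple</td><td>3</td></tr>
      </tbody>
    </table>
    """
    print(minify_table_html(pretty))
    print(table_text(pretty))
```

The point of the sketch is the shape of the output: a single-line table with only table, tr, th and td elements, plus a plain-text rendering with no markup.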