cragwolfe
> As for the unused content_type is this expected behavior? I noticed in similar function partition_multiple_via_api it takes content_types argument and in there this argument is used. No, it should...
I created an issue, https://github.com/hwchase17/langchain/issues/1944, which would allow passing the `User-Agent` header or any other headers that might be needed.
See this PR for how to pass a `User-Agent` header to UnstructuredURLLoader: https://github.com/hwchase17/langchain/pull/2105
This was an issue with `unstructured==0.13.1` but should be fixed as of 0.13.2. It was initially tracked here: https://github.com/Unstructured-IO/unstructured/issues/2855
If you are working with unstructured output, a `Table` element has a `metadata.text_as_html` field which you could read into a pandas dataframe (google "html to pandas dataframe").
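To make that concrete, here is a minimal sketch of pulling rows out of a `text_as_html` string. If pandas and an HTML parser such as lxml are installed, `pandas.read_html(io.StringIO(html))[0]` should give you a dataframe directly. The version below sticks to the standard library so it runs anywhere. The `TableRows` class, the `html_table_to_rows` helper, and the sample HTML are made up for illustration and are not part of unstructured.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of each <tr> in an HTML table (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html):
    """Return the table as a list of rows (hypothetical helper)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up stand-in for element.metadata.text_as_html on a Table element.
sample_html = (
    "<table>"
    "<tr><th>name</th><th>qty</th></tr>"
    "<tr><td>apple</td><td>3</td></tr>"
    "</table>"
)
rows = html_table_to_rows(sample_html)
header, body = rows[0], rows[1:]
```

If pandas is available, `pandas.DataFrame(body, columns=header)` turns the result into a dataframe. Note that every cell comes back as a string, so numeric columns need converting afterwards.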
@stdweird , thanks for the contribution! Do you have an HTML doc handy that this PR fixes, which could be added to the unit tests?
https://github.com/Unstructured-IO/unstructured/actions/runs/8850164463/job/24304063524
```
=========================== short test summary info ============================
FAILED test_unstructured/partition/pdf_image/test_pdf.py::test_partition_pdf_word_bbox_not_char - assert 18 == 17
 + where 18 = len([, , , ...])
= 1 failed, 2231 passed, 13 skipped,...
```
@heya5 , can you describe what you had in mind after this element class is added?
@LucasOliveira44 , thanks for submitting this issue. > It would be nice to have an option or feature that allows me to control the behavior of chunking when encountering Table...
Is there something that needs to be fixed by this PR? What would it enable in the future if it is not fixing anything? As it stands, imo this harms...