John
Minor/partial refactor of interfaces.py and add tests
Per discussion here (https://github.com/Unstructured-IO/unstructured/pull/1259/files#r1312235977), `add_pytesseract_bbox_to_elements` can be improved by using `pytesseract.image_to_data` and vector math to find the coordinates of elements.
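A minimal sketch of the idea, not the project's implementation: it assumes the OCR output is the dictionary that `pytesseract.image_to_data` returns with `output_type=pytesseract.Output.DICT` (parallel lists under `text`, `left`, `top`, `width`, `height`). The helper names `words_from_ocr_data` and `find_element_bbox` are hypothetical, and exact token matching is a simplification; real OCR text would likely need fuzzy matching.

```python
from typing import Dict, List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in image pixels


def words_from_ocr_data(ocr_data: Dict[str, list]) -> List[Tuple[str, BBox]]:
    """Collect (word, bbox) pairs, skipping the empty-text rows that
    image_to_data emits for page, block, paragraph, and line levels."""
    words = []
    for text, left, top, width, height in zip(
        ocr_data["text"],
        ocr_data["left"],
        ocr_data["top"],
        ocr_data["width"],
        ocr_data["height"],
    ):
        if text.strip():
            words.append((text.strip(), (left, top, left + width, top + height)))
    return words


def find_element_bbox(element_text: str, ocr_data: Dict[str, list]) -> Optional[BBox]:
    """Return the union box of the first run of OCR words that matches
    the element's whitespace-separated tokens, or None if no run matches."""
    tokens = element_text.split()
    if not tokens:
        return None
    words = words_from_ocr_data(ocr_data)
    for start in range(len(words) - len(tokens) + 1):
        window = words[start : start + len(tokens)]
        if [word for word, _ in window] == tokens:
            boxes = [box for _, box in window]
            # Union of the word boxes: smallest top-left, largest bottom-right.
            return (
                min(b[0] for b in boxes),
                min(b[1] for b in boxes),
                max(b[2] for b in boxes),
                max(b[3] for b in boxes),
            )
    return None
```

With pytesseract installed, the input would come from `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)`; the helpers themselves only depend on the dictionary layout.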
Currently `partition_json` is intended only for deserializing the unstructured JSON outputs/elements and is not included as a file format we accept for partitioning ([see here](https://unstructured-io.github.io/unstructured/bricks.html)). The goal of this issue...
The pinned version of unstructured-client was changed from `>=0.15.1` to `
Stemming from conversations [here](https://github.com/Unstructured-IO/unstructured/pull/1627#discussion_r1346498859) and [here](https://github.com/Unstructured-IO/unstructured/pull/1652), it would be worthwhile to do our own comparison of language detection packages to see which is best for detecting the language of short...
Best to review commit by commit. This PR is the first for cleaning up the partition params. Fixes in this PR: - Move non-partitioner modules to `unstructured/partition/utils/` - Note that...
The ability to choose which OCR agent is used was added [here](https://github.com/Unstructured-IO/unstructured/pull/2462/files), but it is not documented anywhere (see, for example, [a user asking whether this can be done](https://github.com/Unstructured-IO/unstructured/issues/2958)).
**Is your feature request related to a problem? Please describe.** `partition_via_api` does not accept arguments for defining the retry logic to be used by the python client. This means `partition_via_api`...
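To illustrate the kind of retry configuration being requested, here is a generic sketch of a caller-side retry wrapper with exponential backoff. It is not the client's API: `call_with_retries` and all of its parameters are hypothetical names chosen for this example.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_attempts: int = 3,
    initial_delay: float = 0.5,
    backoff: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exceptions with exponential backoff.

    The last failure is re-raised once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(delay)      # wait before the next attempt
            delay *= backoff  # grow the wait each time
    raise AssertionError("unreachable")
```

A caller could wrap the partitioning call in a zero-argument function, for example `call_with_retries(lambda: partition_via_api(filename=path), max_attempts=5)`, where `path` is a placeholder. Exposing equivalent parameters on `partition_via_api` itself would remove the need for such a wrapper.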
I installed `pyonenote` via `pip install pyonenote` and tried running `pyonenote -f example-docs/QuickNotes.one` and got the following error: ``` Traceback (most recent call last): File ".../pyonenote", line 8, in sys.exit(main())...