
feat: when using requests, may need kwargs to support requests-specific params like verify

Open qingdengyue opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe.

I have been digging into the LangChain code. UNSTRUCTURED_API_URL appears to be used through partition_multiple_via_api in unstructured.partition.api (details: https://github.com/langchain-ai/langchain/issues/21488). The user is calling a private URL whose SSL certificate is private, so requests has to skip SSL verification. Right now partition_multiple_via_api offers no way to do that.

Describe the solution you'd like

It seems to me that we might consider accepting **kwargs to support this scenario. The parameters that requests supports are listed here: https://github.com/psf/requests/blob/2d5f54779ad174035c5437b3b3c1146b0eaf60fe/src/requests/api.py#L14

Describe alternatives you've considered

Perhaps we could reuse **request_kwargs, which currently carries the request data. We could pull the requests-specific parameters out of **request_kwargs and pass them to requests. partition_multiple_via_api: https://github.com/Unstructured-IO/unstructured/blob/e4c895923d6f8d1bbfe7baa8abc47dbe833aaacc/unstructured/partition/api.py#L105
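To make the alternative concrete, here is a minimal sketch of how transport-level options could be separated from the request data. It is not how the library does it today: split_request_kwargs and TRANSPORT_KEYS are hypothetical names, and the key set is only an assumption based on keyword arguments that requests.request accepts (verify, cert, timeout, proxies).

```python
from typing import Any, Dict, Tuple

# Hypothetical allow-list: keyword arguments that requests.request accepts
# and that describe the transport rather than the partition request itself.
TRANSPORT_KEYS = frozenset({"verify", "cert", "timeout", "proxies"})


def split_request_kwargs(
    request_kwargs: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into (transport options, request data)."""
    transport = {k: v for k, v in request_kwargs.items() if k in TRANSPORT_KEYS}
    data = {k: v for k, v in request_kwargs.items() if k not in TRANSPORT_KEYS}
    return transport, data


# Illustrative input: one partition parameter plus two transport options.
transport, data = split_request_kwargs(
    {"strategy": "hi_res", "verify": False, "timeout": 30}
)
```

With a split like this, the partition function could forward the first dict as keyword arguments to requests.post and keep sending the second dict as the request data, so existing callers would see no change.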


qingdengyue avatar May 11 '24 07:05 qingdengyue

The unstructured-python-client is the preferred way to call the API now. @awalker4 - do you know if the client supports turning off SSL verify?

Separately, I've added an internal issue to update the LangChain loaders to use the API client instead of partition_via_api.

MthwRobinson avatar May 13 '24 13:05 MthwRobinson

We can pass a custom requests.Session to the unstructured-client like this. We can certainly add a flag to the Loader to set this up before the partition call.
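The linked example is not reproduced in this thread. As a rough sketch of the idea, a requests.Session with verification disabled can be built as follows. make_unverified_session is a hypothetical helper name, and the commented-out client construction assumes a version of unstructured-client that accepts a requests.Session through its client argument; the server URL is a placeholder.

```python
import requests


def make_unverified_session() -> requests.Session:
    """Return a Session that skips TLS certificate verification.

    Only suitable for trusted private endpoints with self-signed certificates.
    """
    session = requests.Session()
    session.verify = False  # requests made through this session skip cert checks
    return session


http_client = make_unverified_session()

# Assumption: the installed unstructured-client takes a requests.Session via
# `client`. The URL and key below are placeholders, not real values.
# from unstructured_client import UnstructuredClient
# client = UnstructuredClient(
#     client=http_client,
#     server_url="https://unstructured.internal.example",
#     api_key_auth="YOUR_API_KEY",
# )
```

Disabling verification removes protection against man-in-the-middle attacks; setting session.verify to the path of the private CA bundle is the safer choice when that file is available.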

awalker4 avatar May 16 '24 14:05 awalker4

Thanks! Please reference the link @awalker4 provided above if you need to use the API without SSL verification.

MthwRobinson avatar May 16 '24 15:05 MthwRobinson

If I use partition_multiple_via_api or partition_via_api from unstructured.partition.api, there is no way to pass client=http_client as in @awalker4's solution. So what should I do?

qingdengyue avatar Jun 19 '24 07:06 qingdengyue

Hi @qingdengyue - we recommend using the client library directly. If you need unstructured elements, you can use elements_from_json from the unstructured library to convert the JSON output.

MthwRobinson avatar Jun 19 '24 13:06 MthwRobinson