unstract incomplete extracted result

Hi all, we have a 7-page PDF delivery note and would like to extract the item information from it. There are 13 items, but Unstract only extracts 6 of them. A prompt asking for the total number of items returns the right count, so all the pages are being read.

The item details, however, come back incomplete. Here is my prompt:

Extract the following details from the text and format them into JSON:

Part Number: The value that appears immediately before "UPC:". Ensure it is not the value after "CPU:". (e.g., 960-001312, PC-LABEL, UCSC-C220-M6S)
Ship Qty
Order Qty
SKU
Description
Serial Numbers
Return the result in JSON format as an array of objects, each containing:

"part_number"
"order_qty"
"ship_qty"
"sku"
"description"
"serial_numbers"

[Screenshot: screen_20240920_03]

[Screenshot: screen_20240920_05]

Items after "007" are not extracted. Is there a limit on the output size?

Here is the JSON output for the above prompt: result.json

Sep 20 '24 05:09 haluwong

Yes @haluwong - the gpt-4 model has an output token limit of 4096. You need to choose a model with a higher output token limit.
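For anyone who wants to confirm that the output limit is the cause: a minimal sketch, assuming you can get at the raw model response in the OpenAI Chat Completions shape (whether Unstract exposes it is not confirmed here). In that format a choice whose `finish_reason` is `"length"` was cut off by the token limit. The helper name `is_truncated` is made up for illustration.

```python
def is_truncated(response: dict) -> bool:
    """Return True if the first choice stopped because it ran out of tokens.

    Assumes `response` is a dict shaped like an OpenAI Chat Completions
    response, i.e. response["choices"][0]["finish_reason"] exists.
    """
    first_choice = response["choices"][0]
    return first_choice.get("finish_reason") == "length"


# Hand-written example responses (not real API output).
cut_off = {"choices": [{"finish_reason": "length"}]}
complete = {"choices": [{"finish_reason": "stop"}]}
print(is_truncated(cut_off), is_truncated(complete))
```

If the check returns True, the JSON array was most likely cut short mid-output rather than the later pages being skipped.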

Sep 20 '24 05:09 VikashPratheepan

@VikashPratheepan - Curious, how do we handle data extraction where the result is larger than the LLM's output token limit? Most LLMs are getting much bigger on input size, but not so much on output.

Oct 15 '24 10:10 ashwanthkumar

@ashwanthkumar we handle this by internally splitting the context, making multiple requests, and responding with the concatenated result. However, this feature is only available in the enterprise version.
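A rough sketch of the general split / request / concatenate idea, for illustration only. This is not Unstract's implementation; `split_pages`, `extract_all`, and the `extract_items` callback are hypothetical names, and the callback stands in for whatever sends one chunk of text to the LLM with the prompt and returns a JSON array as a string.

```python
import json
from typing import Callable, Dict, List


def split_pages(pages: List[str], pages_per_chunk: int) -> List[str]:
    """Group page texts into chunks of at most `pages_per_chunk` pages."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        "\n".join(pages[i:i + pages_per_chunk])
        for i in range(0, len(pages), pages_per_chunk)
    ]


def extract_all(
    pages: List[str],
    extract_items: Callable[[str], str],  # hypothetical: chunk text -> JSON array string
    pages_per_chunk: int = 2,
) -> List[Dict]:
    """Run the extraction once per chunk and concatenate the item lists."""
    items: List[Dict] = []
    for chunk in split_pages(pages, pages_per_chunk):
        # Each request only has to return the items found in its own chunk,
        # so each response is much smaller than one covering the whole document.
        items.extend(json.loads(extract_items(chunk)))
    return items
```

One caveat with this naive version: an item whose details straddle a chunk boundary can be split or duplicated, so overlapping chunks or a de-duplication pass on `part_number` may be needed.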

Oct 21 '24 04:10 shuveb

I have the same problem. This should never happen, since ChatGPT simply asks you whether you want to continue. Why is this feature not built into the software? Without it, the tool is nearly useless for documents of any meaningful size, apart from perhaps receipts and short bank statements.

Nov 05 '24 20:11 cwikio
