incomplete extracted result
Hi all, we have a 7-page PDF delivery note and would like to extract the item information from it. There are 13 items, but Unstract only extracts 6 of them. I can use a prompt to get the total number of items, which means all the pages are being read.
But for the details, it cannot extract all the data. Here is my prompt:
Extract the following details from the text and format them into JSON:
Part Number: The value that appears immediately before "UPC:". Ensure it is not the value after "CPU:". (e.g., 960-001312, PC-LABEL, UCSC-C220-M6S)
Ship Qty
Order Qty
SKU
Description
Serial Numbers
Return the result in JSON format as an array of objects, each containing:
"part_number"
"order_qty"
"ship_qty"
"sku"
"description"
"serial_numbers"
Items after "007" cannot be extracted. Is there any limitation on the output size?
Here is the JSON output for the above prompt: result.json
Yes @haluwong, the gpt-4 model has an output token limit of 4096. You need to choose a model with a higher output token limit.
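One way to check whether a short result really comes from the output limit is to look at why the model stopped and whether the returned text still parses. The following is a minimal sketch, assuming an OpenAI-style Chat Completions response held as a plain dict; the helpers `was_truncated` and `is_complete_json` are hypothetical names for illustration and are not part of Unstract.

```python
import json


def was_truncated(response: dict) -> bool:
    """Return True if the model stopped because it hit its output token limit.

    Assumption: the payload follows the OpenAI Chat Completions shape, where
    choices[0]["finish_reason"] is "length" when the limit was reached.
    """
    return response["choices"][0].get("finish_reason") == "length"


def is_complete_json(text: str) -> bool:
    """Return True if the text parses as JSON.

    Output that was cut off in the middle of an array or object will not parse.
    """
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

Note that a model can also end its answer early on its own, in which case the JSON is valid but simply lists fewer items, so comparing the item count against the expected total is still worthwhile.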
@VikashPratheepan Curious: how do we handle data extraction whose result is larger than the LLM's output token limit? Most LLMs are growing their input size, but not so much their output size.
@ashwanthkumar we handle this by internally splitting the context, making multiple requests and responding with a concatenated result. However, this feature is only available in the enterprise version.
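For readers without the enterprise version, the general idea of splitting the context, making several requests, and concatenating the results can be approximated by hand. This is only a rough sketch of the technique, not Unstract's implementation: `split_pages`, `extract_all`, and the `extract_fn` callback (which would wrap whatever LLM call you use and return a list of item dicts for one chunk of text) are all hypothetical names.

```python
from typing import Callable, Dict, List

Item = Dict[str, object]


def split_pages(pages: List[str], pages_per_chunk: int) -> List[str]:
    """Group page texts into chunks of at most `pages_per_chunk` pages each."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        "\n".join(pages[i:i + pages_per_chunk])
        for i in range(0, len(pages), pages_per_chunk)
    ]


def extract_all(
    pages: List[str],
    extract_fn: Callable[[str], List[Item]],
    pages_per_chunk: int = 2,
) -> List[Item]:
    """Run the extraction once per chunk and concatenate the item lists.

    `extract_fn` is a placeholder for your own LLM call; each call only has
    to produce the items of one chunk, which keeps every response small.
    """
    items: List[Item] = []
    for chunk in split_pages(pages, pages_per_chunk):
        items.extend(extract_fn(chunk))
    return items
```

With a 7-page delivery note and `pages_per_chunk=2`, this makes four requests. An item that straddles a chunk boundary may be split or returned twice, so a real pipeline would likely need overlapping chunks or a deduplication step, for example keyed on part number.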
I have the same problem. This should never happen, since ChatGPT simply asks you whether you want to proceed. Why is this feature not built into the software? Without it, the tool is nearly useless for anything of useful size, apart perhaps from receipts and short bank statements.