DocumentAI : invoice parser result method to retrieve fields

Open Firebird75 opened this issue 3 years ago • 1 comments

Context: I am using the Document AI PHP library to parse an invoice and retrieve the fields detected by the invoice parser's OCR engine.

Issue: I am not able to extract the fields from the response. Either no method is available or the method isn't documented.

I have checked the documentation here: https://googleapis.github.io/google-cloud-php/#/docs/cloud-document-ai/v0.1.2/documentai/readme

But I can't find the right method to extract the fields described here: https://cloud.google.com/document-ai/docs/processors-list#processor_invoice-processor

I have used: $result = $operationResponse->getResult();

but then $result contains a huge list of things including the document itself and I can't find what I am looking for, even by doing a var_dump. Please, is there any method available or a description of the API result structure in the documentation?

Firebird75 avatar Apr 23 '22 08:04 Firebird75

Hello,

It would be good to understand whether this is just a documentation issue or a limitation in the current library. Is this library actively supported?

Thank you

Firebird75 avatar Aug 18 '22 08:08 Firebird75

Hi @Firebird75, apologies for the late reply on this issue.

I checked the Document AI samples from the second link that you provided.

In the quickstart I can see the following snippet:

$response = $client->processDocument($name, [
    'rawDocument' => $rawDocument
]);

# Print Document Text
printf('Document Text: %s', $response->getDocument()->getText());

You have likely resolved or worked around this already, but I'm leaving this here for anyone else who finds this issue.

Closing this for now, but please feel free to reopen it if there is still something you need clarity on.

Thanks

saranshdhingra avatar Feb 01 '24 06:02 saranshdhingra

Hello @saranshdhingra

Thank you for taking the time to reply to this issue.

What I am missing is a more detailed explanation of how to get the fields found by the Document AI invoice processor.

Basically, the invoice processor must have some way to retrieve fields like invoice number, total price, quantity, etc. But I don't know which method to use or how to parse the results. The structure of the result isn't documented (or I haven't found it).

Thank you very much !

Firebird75 avatar Feb 01 '24 17:02 Firebird75

Sorry @saranshdhingra, but I can't reopen the issue since you closed it; I am not allowed to do that. Would you like me to open a new issue to track this?

Firebird75 avatar Feb 01 '24 17:02 Firebird75

Hi @Firebird75, I looked into this a little, but I couldn't find custom getters for specific fields.

What does work is getting all the entities that a processor has parsed, using something like this:

$response = $client->processDocument($name, [
    'rawDocument' => $rawDocument
]);

foreach ($response->getDocument()->getEntities() as $entity) {
    $type = $entity->getType();        // field name assigned by the processor
    $text = $entity->getMentionText(); // text matched in the document
    printf("%s: %s\n", $type, $text);
}

The full list of getters for an Entity can be found here.
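
For anyone who wants the fields in a plain array, the loop above can be extended roughly as follows. This is only a sketch, not something taken from the library docs: collectEntityFields() and FakeEntity are hypothetical names made up for this example, and the type names (invoice_id, line_item, line_item/quantity) are assumed from the processor list page and may differ by processor version. The only getters it relies on are getType(), getMentionText(), getConfidence() and getProperties(), where getProperties() holds nested entities such as the parts of a line item. It assumes PHP 8.0 or later.

```php
<?php

/**
 * Flatten entities (and their nested properties) into a list of rows.
 * Accepts any iterable of objects exposing getType(), getMentionText(),
 * getConfidence() and getProperties().
 */
function collectEntityFields(iterable $entities): array
{
    $fields = [];
    foreach ($entities as $entity) {
        $fields[] = [
            'type'       => $entity->getType(),
            'text'       => $entity->getMentionText(),
            'confidence' => $entity->getConfidence(),
        ];
        // Nested entities (for example the parts of a line item) are
        // returned by getProperties(); flatten them right after the parent.
        foreach (collectEntityFields($entity->getProperties()) as $child) {
            $fields[] = $child;
        }
    }
    return $fields;
}

// Hypothetical stand-in for the library's Entity class, so that the
// sketch runs without credentials or an API call.
final class FakeEntity
{
    public function __construct(
        private string $type,
        private string $mentionText,
        private float $confidence = 1.0,
        private array $properties = []
    ) {
    }

    public function getType(): string { return $this->type; }
    public function getMentionText(): string { return $this->mentionText; }
    public function getConfidence(): float { return $this->confidence; }
    public function getProperties(): array { return $this->properties; }
}

// Sample data with assumed type names; real values come from the processor.
$entities = [
    new FakeEntity('invoice_id', 'INV-001', 0.98),
    new FakeEntity('line_item', '2 Widget 10.00', 0.90, [
        new FakeEntity('line_item/quantity', '2', 0.95),
    ]),
];

$fields = collectEntityFields($entities);

foreach ($fields as $field) {
    printf("%s = %s (confidence %.2f)\n", $field['type'], $field['text'], $field['confidence']);
}
```

With a real response, the call would be collectEntityFields($response->getDocument()->getEntities()) instead of the sample array. Keeping the confidence next to each value makes it easy to skip low-confidence fields later.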

If you have seen any reference of custom getters for each processor, please let me know.

Thanks.

saranshdhingra avatar Feb 06 '24 12:02 saranshdhingra

Hi. I'll be closing this for now.

Please feel free to reopen this if more conversation is needed.

saranshdhingra avatar Feb 27 '24 06:02 saranshdhingra