
Doc-number in application-reference overwrites doc-number in publication-reference

Open · softwaregravy opened this issue 1 month ago · 1 comment

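When parsing the bibliographical information, we just merge the keys of each reference into one dictionary, so a key from the application reference overwrites a key of the same name from the publication reference: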

    invention_title = root_tree.find(invention_title_path)
    document_data = {}    
    if publication_info != None:
        publication_reference_info = {element.tag: element.text for element in list(publication_info)}
        document_data = {**document_data,**publication_reference_info}
    if application_info !=None:
        application_reference_info = {element.tag: element.text for element in list(application_info)}
        if application_info.attrib and application_info.attrib['appl-type']:
            application_reference_info['application_type'] =  application_info.attrib['appl-type']
        document_data = {**document_data,**application_reference_info}

source
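
The overwrite comes from how dictionary unpacking resolves duplicate keys: the right-hand mapping wins. A minimal sketch of that behaviour, using made-up placeholder values (`PUB-1`, `APP-2`) rather than real parser output:

```python
# Placeholder dictionaries standing in for the two parsed references.
publication = {"doc-number": "PUB-1", "kind": "B2"}
application = {"doc-number": "APP-2"}

# Later unpacked mappings replace values for keys they share with earlier ones.
merged = {**publication, **application}
print(merged)  # {'doc-number': 'APP-2', 'kind': 'B2'}
```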

An example patent might look like this (xml4):

    <publication-reference>
      <document-id>
        <country>US</country>
        <doc-number>09784948</doc-number>
        <kind>B2</kind>
        <date>20171010</date>
      </document-id>
    </publication-reference>
    <application-reference appl-type="utility">
      <document-id>
        <country>US</country>
        <doc-number>15067369</doc-number>
        <date>20160311</date>
      </document-id>
    </application-reference>

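The resulting dictionary now lacks the patent (publication) number. Both doc-number and date hold the application's values, and kind survives only because the application reference has no kind element: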

[{'bibliographic_information': {'country': 'US', 'doc-number': '15067369', 'kind': 'B2', 'date': '20160311', 'invention_title': 'xxx'}}]
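
One possible way to avoid the collision is to prefix each key with the reference it came from. The following is only a sketch under assumptions, not the library's code: the `<root>` wrapper element, the helper names `flatten` and `parse_bibliographic`, and the `publication_` / `application_` key prefixes are all hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample from the issue, wrapped in a hypothetical <root> element so it parses.
SAMPLE = """<root>
<publication-reference><document-id>
<country>US</country><doc-number>09784948</doc-number>
<kind>B2</kind><date>20171010</date>
</document-id></publication-reference>
<application-reference appl-type="utility"><document-id>
<country>US</country><doc-number>15067369</doc-number>
<date>20160311</date>
</document-id></application-reference>
</root>"""


def flatten(document_id, prefix):
    """Map each child of a document-id element to '<prefix>_<tag>': text."""
    if document_id is None:
        return {}
    return {f"{prefix}_{child.tag}": child.text for child in document_id}


def parse_bibliographic(root):
    """Collect both references without letting their keys collide."""
    publication = root.find("publication-reference/document-id")
    application_ref = root.find("application-reference")
    application = (
        application_ref.find("document-id") if application_ref is not None else None
    )

    data = {}
    data.update(flatten(publication, "publication"))
    data.update(flatten(application, "application"))
    # Keep the appl-type attribute, as the original snippet does.
    if application_ref is not None and application_ref.get("appl-type"):
        data["application_type"] = application_ref.get("appl-type")
    return data


result = parse_bibliographic(ET.fromstring(SAMPLE))
print(result["publication_doc-number"], result["application_doc-number"])
```

With prefixed keys both document numbers and both dates are retained. The trade-off is that the key names change, so any existing consumer of the plain doc-number key would need to be updated.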

softwaregravy · May 12 '24 15:05