Error using parse_pubmed_xml

Open sublimotion opened this issue 4 years ago • 5 comments

It looks like the pubmed parser doesn't support the pubmed baseline files?

I get the error below. It also doesn't look like the test file is using a similar file format.

pubmed_dict = pp.parse_pubmed_xml('./data/pubmed20n1015.xml') # dictionary output
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-2b4cea8c6fb9> in <module>
----> 1 pubmed_dict = pp.parse_pubmed_xml('./data/pubmed20n1015.xml') # dictionary output

~/anaconda3/envs/python3/lib/python3.6/site-packages/pubmed_parser/pubmed_oa_parser.py in parse_pubmed_xml(path, include_path, nxml)
    155         journal = ""
    156 
--> 157     dict_article_meta = parse_article_meta(tree)
    158     pub_year_node = tree.find(".//pub-date/year")
    159     pub_year = pub_year_node.text if pub_year_node is not None else ""

~/anaconda3/envs/python3/lib/python3.6/site-packages/pubmed_parser/pubmed_oa_parser.py in parse_article_meta(tree)
     67     """
     68     article_meta = tree.find(".//article-meta")
---> 69     pmid_node = article_meta.find('article-id[@pub-id-type="pmid"]')
     70     pmc_node = article_meta.find('article-id[@pub-id-type="pmc"]')
     71     pub_id_node = article_meta.find('article-id[@pub-id-type="publisher-id"]')

AttributeError: 'NoneType' object has no attribute 'find'
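
The traceback shows `tree.find(".//article-meta")` returning `None`, which is what happens when the file contains no `<article-meta>` element. Below is a minimal sketch of that failure mode. It assumes the standard library's `xml.etree.ElementTree` rather than the parser's own code, and `MEDLINE_SNIPPET` is a made-up, heavily trimmed stand-in for a baseline file, not real data from `pubmed20n1015.xml`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a MEDLINE-style file.
MEDLINE_SNIPPET = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation><PMID>10704411</PMID></MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

root = ET.fromstring(MEDLINE_SNIPPET)

# This layout has no <article-meta> element, so find() returns None.
article_meta = root.find(".//article-meta")
print(article_meta)  # None

# Calling .find() on that None raises the same kind of error as the traceback.
error_message = ""
try:
    article_meta.find('article-id[@pub-id-type="pmid"]')
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

If this reading is right, the error says the file is not in the layout `parse_pubmed_xml` expects; it does not necessarily mean the file is corrupt.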

sublimotion avatar Oct 22 '20 19:10 sublimotion

Same issue here.

GanjinZero avatar Oct 29 '20 11:10 GanjinZero

I think the format of the XML files has changed since these scripts were written. The XML structure and attributes in the current PubMed data are completely different from what this code expects. I had to write an XML parser from scratch for my project.

kmnis avatar Nov 10 '20 05:11 kmnis

@kmnis @GanjinZero @sublimotion thanks so much for the report! I haven't checked the script for a while, but it would be great to check whether it is still in sync with the current MEDLINE baseline structure.

titipata avatar Nov 21 '20 06:11 titipata

I ran into the same issue as @sublimotion and @GanjinZero when running parse_pubmed_xml on a file downloaded last week, pubmed21n1298.xml.

dict_out = pp.parse_pubmed_xml('data/pubmed21n1298.xml') # errors

I can confirm parse_medline_xml() parses without errors and returns useful output. I believe the current script is in sync with the current MEDLINE baseline structure. I hope this helps @titipata.

dict_out = pp.parse_medline_xml('data/pubmed21n1298.xml')
pprint(dict_out[0])

OUTPUT:
{'abstract': 'BACKGROUND\n'
             'Drugs of abuse have a common property in mammals, which is their '
             'ability to facilitate the release of the neurotransmitter and '
             'neuromodulator dopamine in specific brain regions involved in '
             'reward and motivation. This increase in synaptic dopamine levels '
             'is believed to act as a positive reinforcer and to mediate some '
             'of the acute responses to drugs. The mechanisms by which '
             'dopamine regulates acute drug responses and addiction remain '
             'unknown.\n'
             '\n'
             '\n'
             'RESULTS\n'
             'We present evidence that dopamine plays a role in the responses '
             'of Drosophila to cocaine, nicotine or ethanol. We used a '
             'startle-induced negative geotaxis assay and a locomotor tracking '
             'system to measure the effect of psychostimulants on fly '
             'behavior. Using these assays, we show that acute responses to '
             'cocaine and nicotine are blunted by pharmacologically induced '
             'reductions in dopamine levels. Cocaine and nicotine showed a '
             'high degree of synergy in their effects, which is consistent '
             'with an action through convergent pathways. In addition, we '
             'found that dopamine is involved in the acute '
             'locomotor-activating effect, but not the sedating effect, of '
             'ethanol.\n'
             '\n'
             '\n'
             'CONCLUSIONS\n'
             'We show that in Drosophila, as in mammals, dopaminergic pathways '
             'play a role in modulating specific behavioral responses to '
             'cocaine, nicotine or ethanol. We therefore suggest that '
             'Drosophila can be used as a genetically tractable model system '
             'in which to study the mechanisms underlying behavioral responses '
             'to multiple drugs of abuse.',
 'affiliations': 'Department of Anesthesia, University of California San '
                 'Francisco, California 94143-0452, USA.',
 'authors': 'RJ Bainton;LT Tsai;CM Singh;MS Moore;WS Neckameyer;U Heberlein',
 'chemical_list': 'D000431:Ethanol; D009538:Nicotine; D003042:Cocaine; '
                  'D004298:Dopamine',
 'country': 'England',
 'delete': False,
 'doi': '10.1016/s0960-9822(00)00336-5',
 'issn_linking': '0960-9822',
 'journal': 'Current biology : CB',
 'keywords': '',
 'medline_ta': 'Curr Biol',
 'mesh_terms': 'D000818:Animals; D001522:Behavior, Animal; D003042:Cocaine; '
               'D004298:Dopamine; D004330:Drosophila; D000431:Ethanol; '
               'D008297:Male; D009538:Nicotine',
 'nlm_unique_id': '9107782',
 'other_id': '',
 'pmc': '',
 'pmid': '10704411',
 'pubdate': '2000',
 'publication_types': 'D016428:Journal Article; D013486:Research Support, U.S. '
                      "Gov't, Non-P.H.S.; D013487:Research Support, U.S. "
                      "Gov't, P.H.S.",
 'references': '',
 'title': 'Dopamine modulates acute responses to cocaine, nicotine and ethanol '
          'in Drosophila.'}
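
One way to avoid calling the wrong function is to look at the root element before parsing. This is only a sketch under a few assumptions: MEDLINE/PubMed files use `<PubmedArticleSet>` as the root, PubMed Open Access articles (the format handled in `pubmed_oa_parser.py`) use `<article>`, and `guess_parser_name` is a hypothetical helper that is not part of pubmed_parser. It reads the XML from a string for brevity; a real baseline file is large and would be better inspected incrementally.

```python
import xml.etree.ElementTree as ET


def guess_parser_name(xml_text: str) -> str:
    """Return the name of the pubmed_parser function that likely fits xml_text.

    Hypothetical helper for illustration; not part of pubmed_parser.
    """
    root_tag = ET.fromstring(xml_text).tag
    if root_tag == "PubmedArticleSet":
        # MEDLINE-style file, e.g. pubmed21n1298.xml
        return "parse_medline_xml"
    if root_tag == "article":
        # PubMed Open Access article with <front><article-meta>
        return "parse_pubmed_xml"
    raise ValueError(f"Unrecognized root element: {root_tag!r}")


print(guess_parser_name("<PubmedArticleSet><PubmedArticle/></PubmedArticleSet>"))
# parse_medline_xml
print(guess_parser_name("<article><front><article-meta/></front></article>"))
# parse_pubmed_xml
```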

raypereda-gr avatar Jun 14 '21 18:06 raypereda-gr

Thanks @raypereda-gr! Would it be possible to open a PR that adds the same file, with the new structure of the MEDLINE database? I can also look into it further and change the test file accordingly.

titipata avatar Jun 14 '21 21:06 titipata