ReCiter
Update the way ReCiter handles books
Scope
Approximately 0.1% of records in PubMed are for books, although this share has increased in the past year.
Data model
Books have a different data model. Key differences include:
Description | Book XML Attribute | Journal Article XML Attribute |
---|---|---|
Publication Type | <PublicationType>Book [Chapter]</PublicationType> | <PublicationType>Journal Article</PublicationType> |
Source Title | <BookTitle> | <JournalTitle> |
Identifier (ISBN/ISSN) | <ISBN> | <ISSN> |
Publisher | <Publisher> | N/A |
Publication Place | <PlaceOfPublication> | N/A |
Authors | <AuthorList><Author>...</Author></AuthorList> | same |
Editors (for books) | <EditorList><Editor>...</Editor></EditorList> | N/A |
Pagination | <PageRange> (especially for chapters) | <MedlinePgn> |
Publication Frequency | N/A | Could be inferred from <JournalIssue><PubFrequency> |
DOI | <ELocationID EIdType="doi">...</ELocationID> | same |
Abstract | <AbstractText>...</AbstractText> (sometimes omitted) | same |
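To make the difference concrete, here is a minimal Python sketch of how a record could be classified as a book and how the matching source-title element could be chosen. It assumes simplified, hypothetical records that use the element names from the table; real PubMed XML nests these elements more deeply, and the names `is_book`, `source_title`, and `BOOK_TYPES` are illustrative only, not part of ReCiter.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Publication types treated as books, following the table's "Book [Chapter]".
BOOK_TYPES = {"Book", "Book Chapter"}

# Simplified, hypothetical records; real PubMed XML nests these elements deeper.
BOOK_XML = """<Record>
  <PublicationType>Book Chapter</PublicationType>
  <BookTitle>Example Handbook</BookTitle>
</Record>"""

JOURNAL_XML = """<Record>
  <PublicationType>Journal Article</PublicationType>
  <JournalTitle>Example Journal</JournalTitle>
</Record>"""


def is_book(record: ET.Element) -> bool:
    """Return True if any PublicationType in the record is a book type."""
    types = {(el.text or "").strip() for el in record.iter("PublicationType")}
    return bool(types & BOOK_TYPES)


def source_title(record: ET.Element) -> Optional[str]:
    """Read BookTitle for books and JournalTitle for everything else."""
    tag = "BookTitle" if is_book(record) else "JournalTitle"
    el = record.find(f".//{tag}")
    return el.text.strip() if el is not None and el.text else None


book = ET.fromstring(BOOK_XML)
journal = ET.fromstring(JOURNAL_XML)
print(is_book(book), source_title(book))
print(is_book(journal), source_title(journal))
```

A parser that only looks for the journal-side elements would get `None` for the book record here, which is the kind of gap the rest of this issue is about.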
Effect
The inconsistent data model leads to incorrect output. For example, for personIdentifier = tme2002 and PMID = 34818336 (see also API), the wrong authors are listed. What is probably occurring is that the author list is shifting by one.
Another example: mtoth and 21204454: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=21204454&retmode=xml
Options
- Update ReCiter PubMed Retrieval Tool not to return books. We could do this like so:
  `cole c[au] NOT (booksdocs[Filter])`
- Update data model across the projects to handle books:
  - ReCiter PubMed Retrieval Tool
  - ReCiter
  - ReCiterDB
  - ReCiter Publication Manager
- Exclude books from ReCiter Feature Generator and Article Retrieval output.
  - Approach 1: Exclude cases where PublicationType = Book [Chapter], or
  - Approach 2: Require JournalTitle attribute
  - Include a flag in application.properties to exclude books
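
The query filter and the two exclusion approaches above could look roughly like the following Python sketch. It is not ReCiter code: the `exclude_books` parameter stands in for a hypothetical application.properties flag, the article dictionaries and their keys (`publicationTypes`, `journalTitle`) are assumed for illustration, and the URL is only built, never requested.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
BOOK_FILTER = "NOT (booksdocs[Filter])"
BOOK_TYPES = {"Book", "Book Chapter"}


def build_term(author_term: str, exclude_books: bool = True) -> str:
    """Append the book filter to a PubMed search term when the flag is set."""
    return f"{author_term} {BOOK_FILTER}" if exclude_books else author_term


def build_esearch_url(term: str) -> str:
    """Build (but do not send) an ESearch request URL for the given term."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "retmode": "xml"})


def passes_approach_1(article: dict) -> bool:
    """Approach 1: drop articles whose publication types include a book type."""
    return not (set(article.get("publicationTypes", [])) & BOOK_TYPES)


def passes_approach_2(article: dict) -> bool:
    """Approach 2: keep only articles that carry a non-empty journal title."""
    return bool(article.get("journalTitle"))


def filter_output(articles: list, exclude_books: bool = True) -> list:
    """Apply both checks when the (hypothetical) exclude-books flag is on."""
    if not exclude_books:
        return list(articles)
    return [a for a in articles if passes_approach_1(a) and passes_approach_2(a)]


# Hypothetical article records for illustration only.
articles = [
    {"pmid": 1, "publicationTypes": ["Journal Article"], "journalTitle": "Example Journal"},
    {"pmid": 2, "publicationTypes": ["Book Chapter"], "journalTitle": None},
]
print(build_esearch_url(build_term("cole c[au]")))
print([a["pmid"] for a in filter_output(articles)])
```

Filtering at query time avoids retrieving book records at all, while filtering the output keeps them out of downstream results without changing retrieval; the flag lets either behavior be switched off if books are supported later.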
I'm not sure this is still an issue.