
ScienceDirect: Object Retrieval

Open nils-herrmann opened this issue 1 year ago • 3 comments

It seems that the only consistent way of identifying objects is by their EID, which has the following structure: <file_eid>-<object_ref>.<object_suffix>. An example is 1-s2.0-S0893608024005562-si15.svg.
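Since the filename is exactly the <object_ref>.<object_suffix> part of the EID, the file EID plus a filename identifies an object. Below is a minimal sketch of going back and forth between the two. It assumes the object reference itself contains no "-", which holds for the example but is not verified in general. `split_object_eid` and `join_object_eid` are hypothetical helpers, not part of pybliometrics.

```python
def split_object_eid(eid: str) -> tuple[str, str, str]:
    """Split '<file_eid>-<object_ref>.<object_suffix>' into its three parts.

    Assumes (hypothetically) that the object reference follows the last '-'
    and the suffix follows the last '.', as in the example EID above.
    """
    file_eid, sep, object_part = eid.rpartition("-")
    if not sep:
        raise ValueError(f"no '-' separator in {eid!r}")
    object_ref, dot, object_suffix = object_part.rpartition(".")
    if not dot:
        raise ValueError(f"no file suffix in {eid!r}")
    return file_eid, object_ref, object_suffix


def join_object_eid(file_eid: str, filename: str) -> str:
    """Rebuild an object EID from the file EID and a filename like 'si15.svg'."""
    return f"{file_eid}-{filename}"


parts = split_object_eid("1-s2.0-S0893608024005562-si15.svg")
print(parts)  # ('1-s2.0-S0893608024005562', 'si15', 'svg')
print(join_object_eid(parts[0], "si15.svg"))  # 1-s2.0-S0893608024005562-si15.svg
```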

Therefore, the most reliable strategy is to retrieve objects by passing the document identifier and the object's filename:

ObjectRetrieval('10.1016/j.neunet.2024.106632', filename='gr3.jpg')

To get the filenames, users can use the ObjectMetadata class:

o_md = ObjectMetadata('10.1016/j.neunet.2024.106632')
filenames = [f['filename'] for f in o_md.results]

nils-herrmann, Oct 17 '24 09:10

How would users know the filename beforehand?

Michael-E-Rose, Oct 18 '24 09:10

There is a naming convention: all objects are numbered, with a prefix and suffix that depend on their type (figure, math formula, PDF):

  • Standard figures: gr<nr>.jpg
  • Formulas: si<nr>.svg
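Because the names follow these patterns, a list of filenames (for instance the `filenames` list built from `ObjectMetadata` results) can be sorted into figures and formulas. This is a minimal sketch. The regular expressions encode only the two patterns listed above, and anything else, including PDFs, lands under "other". `group_by_type` is a hypothetical helper, not a pybliometrics function.

```python
import re

# Patterns from the naming convention above; other object types
# (e.g. PDFs) are not covered and end up under "other".
OBJECT_PATTERNS = {
    "figure": re.compile(r"gr(\d+)\.jpg"),
    "formula": re.compile(r"si(\d+)\.svg"),
}


def group_by_type(filenames: list[str]) -> dict[str, list[str]]:
    """Group object filenames by type, ordered by their running number."""
    groups: dict[str, list[tuple[int, str]]] = {kind: [] for kind in OBJECT_PATTERNS}
    groups["other"] = []
    for name in filenames:
        for kind, pattern in OBJECT_PATTERNS.items():
            match = pattern.fullmatch(name)
            if match:
                groups[kind].append((int(match.group(1)), name))
                break
        else:
            # No known pattern matched; keep the name, ordered alphabetically.
            groups["other"].append((0, name))
    return {kind: [name for _, name in sorted(items)] for kind, items in groups.items()}


print(group_by_type(["gr10.jpg", "si2.svg", "gr2.jpg", "si15.svg"]))
# {'figure': ['gr2.jpg', 'gr10.jpg'], 'formula': ['si2.svg', 'si15.svg'], 'other': []}
```

Sorting on the parsed number keeps gr10.jpg after gr2.jpg, which a plain string sort would not.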

There are two ways to find the filenames manually:

  1. Use the ObjectMetadata class and get the filenames of all objects:
o_md = ObjectMetadata('10.1016/j.neunet.2024.106632')
filenames = [f['filename'] for f in o_md.results]
  2. Check the paper online and inspect the object's download link, for example: https://ars.els-cdn.com/content/image/1-s2.0-S1566253524004342-gr2_lrg.jpg
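The second option can be partly automated, because the last path segment of such a link ends in the object reference. This is a minimal sketch based on two assumptions inferred from the single example link. First, links look like <file_eid>-<object_ref>_lrg.<suffix>. Second, dropping the `_lrg` marker (apparently the large rendition) yields the filename from the naming convention. `filename_from_link` is a hypothetical helper.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_link(url: str) -> str:
    """Derive an object filename such as 'gr2.jpg' from an ars.els-cdn.com link.

    Assumes (hypothetically) the last path segment is
    '<file_eid>-<object_ref>[_lrg].<suffix>'; the '_lrg' marker is dropped.
    """
    last_segment = posixpath.basename(urlparse(url).path)
    object_part = last_segment.rpartition("-")[2]  # e.g. 'gr2_lrg.jpg'
    stem, dot, suffix = object_part.rpartition(".")
    if not dot:
        raise ValueError(f"no file suffix in {url!r}")
    stem = stem.removesuffix("_lrg")  # requires Python 3.9+
    return f"{stem}.{suffix}"


print(filename_from_link(
    "https://ars.els-cdn.com/content/image/1-s2.0-S1566253524004342-gr2_lrg.jpg"
))  # gr2.jpg
```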

nils-herrmann, Oct 18 '24 09:10

Alright, then let's make the class work with the filename. I will include your hints in the documentation.

Michael-E-Rose, Oct 24 '24 10:10