data.jsdelivr.com

Wrong or insufficient result on API get /v1/lookup/hash/{hash}

Open Bladelol opened this issue 9 months ago • 12 comments

Description

Description: When calling the API endpoint /v1/lookup/hash/{hash} with the file hash of jQuery 3.7.1, I get back a wrong result, or at least not the npm result.

Steps to Reproduce: To make sure we have the right file hash, first call another API endpoint for jQuery 3.7.1: https://data.jsdelivr.com/v1/packages/npm/jquery@3.7.1

Result (partial):

{ "type": "file", "name": "jquery.min.js", "hash": "/JqT3SQfawRcv/BIHPThkBvs0OEvtFFmqPF/lYI/Cxo=", "size": 87533 }

Base64 -> Hex: hash=fc9a93dd241f6b045cbff0481cf4e1901becd0e12fb45166a8f17f95823f0b1a
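
The Base64 -> Hex step above can be reproduced with a few lines of Python. This is only a sketch of the conversion; the function name is made up for illustration, and it assumes the "hash" field is a standard (not URL-safe) base64 string, as the value shown here appears to be.

```python
import base64


def base64_digest_to_hex(digest_b64: str) -> str:
    """Decode a base64-encoded digest and return it as lowercase hex."""
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently skipping them.
    raw = base64.b64decode(digest_b64, validate=True)
    return raw.hex()


if __name__ == "__main__":
    # Value copied from the "hash" field of the response above.
    print(base64_digest_to_hex("/JqT3SQfawRcv/BIHPThkBvs0OEvtFFmqPF/lYI/Cxo="))
```

The decoded value is 32 bytes long, which is consistent with a SHA-256 digest.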

Use this hash in the other endpoint: https://data.jsdelivr.com/v1/lookup/hash/fc9a93dd241f6b045cbff0481cf4e1901becd0e12fb45166a8f17f95823f0b1a

Result:

{ "type": "gh", "name": "madmaxchow/VLOOK", "version": "master", "file": "/docs/js/jquery.js" }

So the result points to an identical file in the docs directory of an unrelated GitHub repository instead of the npm source.

Expected behavior: Search via the API with a file hash, e.g. https://data.jsdelivr.com/v1/lookup/hash/fc9a93dd241f6b045cbff0481cf4e1901becd0e12fb45166a8f17f95823f0b1a, and get back the npm source or at least all occurrences.

Actual Behavior: Search via the API with a file hash, e.g. https://data.jsdelivr.com/v1/lookup/hash/fc9a93dd241f6b045cbff0481cf4e1901becd0e12fb45166a8f17f95823f0b1a, and get back the file's location in an arbitrary GitHub repository.

Affected jsDelivr links

https://data.jsdelivr.com/v1/lookup/hash/fc9a93dd241f6b045cbff0481cf4e1901becd0e12fb45166a8f17f95823f0b1a
https://data.jsdelivr.com/v1/packages/npm/jquery@3.7.1

Response headers

accept-ranges: bytes
access-control-allow-origin: *
access-control-expose-headers: *
age: 300684
alt-svc: h3=":443";ma=86400,h3-29=":443";ma=86400,h3-27=":443";ma=86400
cache-control: public, max-age=31536000, stale-while-revalidate=86400, stale-if-error=86400
cf-cache-status: DYNAMIC
cf-ray: 92713a82bf0971c5-FRA
content-encoding: br
content-length: 86
content-type: application/json; charset=utf-8
cross-origin-resource-policy: cross-origin
date: Tue, 08 Apr 2025 14:13:13 GMT
etag: W/"63-TDFDeXPyw9SXIRq67li0Y32jaIk"
rndr-id: 1d8d7af3-2eb2-4c1e
server: cloudflare
timing-allow-origin: *
vary: Accept-Encoding, Accept-Encoding
via: 1.1 varnish
x-cache: HIT
x-render-origin-server: Render
x-response-time: 8ms
x-robots-tag: noindex
x-served-by: cache-fra-eddf8230141-FRA
x-timer: S1744121593.995989,VS0,VE4

Information

  • Device OS: Ubuntu 22
  • Commandline
  • Your location: Germany

Requisites

  • [x] I performed a cursory search of the issue tracker to avoid opening a duplicate issue.
  • [x] I checked the documentation to understand that the issue I am reporting is not normal behavior.
  • [x] I understand that not filling out this template correctly will lead to the issue being closed.

Additional content

No response

Bladelol, Apr 09 '25 17:04

Hey, we'll consider possible improvements here, but the current behavior matches the documentation:

Allows a reverse lookup of a file at the CDN by its hash. Works only for files which were accessed at least once. If there are multiple files with the same hash, only the one which was accessed first via the CDN is returned.

Can you please better describe your use case?

MartinKolarik, Apr 09 '25 18:04

Hey, I am happy to describe my use case here. I receive artifacts from various sources that were created without a package.json or similar manifest, and I need their exact dependencies in order to create an SBOM. Without a search by file hash, this process is incredibly time-consuming and difficult, and above all it cannot be automated.

Bladelol, Apr 10 '25 08:04

We could maybe add an option to list all packages that have the file instead of returning just one. But then you'd need to somehow select the "right" one, which might still be hard (there are 20+ matches in this case).

MartinKolarik, Apr 10 '25 12:04

Tell me if I am wrong, but if I only look at npm, the result should be unique, right?

Bladelol, Apr 10 '25 12:04

No, there can be several matching npm packages as well.

MartinKolarik, Apr 10 '25 12:04

Okay, then it could be hard to figure out the right package, but it should be possible somehow. Could you provide me with an example? Maybe I could figure out whether we would be able to use a full list in our use case.

Bladelol, Apr 10 '25 12:04

Here are sample results for fc9a93dd241f6b045cbff0481cf4e1901becd0e12fb45166a8f17f95823f0b1a:

gh,madmaxchow/VLOOK,master,/docs/js/jquery.js
gh,mixice/uigg,master,/js/jquery.min.js
gh,appotry/hexo,master,/libs/jquery/jquery.min.js
gh,jquery/jquery-dist,main,/dist/jquery.min.js
npm,jquery,3.7.1,/dist/jquery.min.js
gh,jquery/jquery,3.7.1,/dist/jquery.min.js
gh,jquery/jquery-dist,3.7.1,/dist/jquery.min.js
gh,fxxk3rrth4ng/utils,master,/jquery.js
gh,yi-yunseok/Yi-Yunseok.github.io,master,/assets/js/vendors/jquery-3.7.1.min.js
gh,jquery/jquery,f79d5f1a337528940ab7029d4f8bbba72326f269,/dist/jquery.min.js
npm,beefup,1.4.11,/dist/js/jquery.min.js
gh,willsofts/will-asset,1.0.0,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.1,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.2,/jquery/jquery-3.7.1.min.js
npm,jquery,3,/dist/jquery.min.js
gh,willsofts/will-asset,1.0.3,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.4,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.5,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.6,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.7,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.8,/jquery/jquery-3.7.1.min.js
npm,beefup,1.4.12,/dist/js/jquery.min.js
gh,willsofts/will-asset,1.0.9,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.10,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.11,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.12,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.13,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.14,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.16,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.17,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.18,/jquery/jquery-3.7.1.min.js
npm,@liaojie1314/blog-static,1.0.0,/js/jquery3.7.1/jquery.min.js
npm,beefup,1.4.13,/dist/js/jquery.min.js
npm,webfast,0.1.25,/content/js/jquery-3.7.1.min.js
npm,webfast,0.1.24,/content/js/jquery-3.7.1.min.js
npm,webfast,0.1.28,/content/js/jquery-3.7.1.min.js
npm,webfast,0.1.21,/content/js/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.19,/jquery/jquery-3.7.1.min.js
gh,AxKuu/jquery,3.7.1,/jquery-3.7.1.min.js
gh,davidjbradshaw/iframe-resizer,4.3.11,/example/jquery-3.7.1.min.js
npm,bsseond,1.0.0,/jquery-3.7.1.min.js
gh,jd82k/Joe,1.2.1,/assets/libs/jquery/jquery.min.js
gh,bonafide-ngo/jquery,3.7.1b0,/dist/jquery.min.js
npm,beefup,1.4.14,/dist/js/jquery.min.js
gh,multitalentedman/responsive-blog-design,275cb44ca8ab3246e9f3690b81796f7531085f12,/shared/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.21,/jquery/jquery-3.7.1.min.js
npm,liutsnpm,1.0.1,/Joe-assets/assets/libs/jquery/jquery.min.js
npm,liutsnpm,1.0.2,/Joe-assets/assets/libs/jquery/jquery.min.js
gh,huxubo/CDN,0.0.1,/libs/jquery/3.7.1/jquery.min.js
gh,willsofts/will-asset,1.0.23,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.24,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.26,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.27,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.28,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.29,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.31,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.33,/jquery/jquery-3.7.1.min.js
gh,appotry/hexo,10.9,/libs/jquery/jquery.min.js
gh,appotry/hexo,10.13,/libs/jquery/jquery.min.js
gh,appotry/hexo,10.14,/libs/jquery/jquery.min.js
gh,usdos-cgfs/audit-tool,1.0.1,/lib/jquery-3.7.1.min.js
npm,abeamer,1.7.2,/client/lib/js/vendor/jquery-3.7.1.min.js
gh,usdos-cgfs/audit-tool,1.0.2,/lib/jquery-3.7.1.min.js
gh,UniversitaDellaCalabria/unicms-template-italia,1.3.1,/src/unicms_template_italia/static/js/jquery.3.7.1.min.js
gh,appotry/hexo,10.18,/libs/jquery/jquery.min.js
gh,appotry/hexo,10.20,/libs/jquery/jquery.min.js
gh,appotry/hexo,10.21,/libs/jquery/jquery.min.js
gh,appotry/hexo,10.22,/libs/jquery/jquery.min.js
gh,appotry/hexo,10.23,/libs/jquery/jquery.min.js
gh,appotry/hexo,10.25,/libs/jquery/jquery.min.js
gh,James-JohnsonBE/demo,1.0.2,/lib/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.37,/jquery/jquery-3.7.1.min.js
gh,eyeofchaos/eocjsNewsticker,0.7.3,/jquery-3.7.1.min.js
gh,UniversitaDellaCalabria/unicms-template-italia,1.3.2,/src/unicms_template_italia/static/js/jquery.3.7.1.min.js
gh,UniversitaDellaCalabria/unicms-template-italia,1.3.3,/src/unicms_template_italia/static/js/jquery.3.7.1.min.js
gh,willsofts/will-asset,1.0.38,/jquery/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.39,/jquery/jquery-3.7.1.min.js
gh,django/django,5.1.4,/django/contrib/admin/static/admin/js/vendor/jquery/jquery.min.js
npm,beefup,1.4.15,/dist/js/jquery.min.js
npm,beefup,1.5.0,/dist/js/jquery.min.js
npm,qexo-static,3.0.4,/qexo/jquery/jquery.min.js
npm,qexo-static,3.0.5,/qexo/jquery/jquery.min.js
npm,qexo-static,3.0.2,/qexo/jquery/jquery.min.js
npm,qexo-static,3.0.3,/qexo/jquery/jquery.min.js
npm,qexo-static,3.0.1,/qexo/jquery/jquery.min.js
gh,django/django,5.0,/django/contrib/admin/static/admin/js/vendor/jquery/jquery.min.js
npm,jsxgraph,1.10.1,/distrib/docs/static/jquery.min.js
gh,janssenproject/jans,76e0414143c7d3df6285566983fbd06f967fb715,/jans-casa/plugins/bioid/extras/agama/web/js/jquery-3.7.1.min.js
gh,janssenproject/jans,43f18a79de5b5d673456d0f66720771692760d03,/jans-casa/plugins/bioid/extras/agama/web/js/jquery-3.7.1.min.js
gh,willsofts/will-asset,1.0.40,/jquery/jquery-3.7.1.min.js
gh,django/django,5.1.6,/django/contrib/admin/static/admin/js/vendor/jquery/jquery.min.js
gh,lkzhao/Hero,1.6.4,/docs/docsets/Hero.docset/Contents/Resources/Documents/js/jquery.min.js
gh,lkzhao/Hero,1.6.4,/docs/js/jquery.min.js
npm,tornado-cdn,2.0.3,/jquery.min.js
npm,jquery.flipster,1.1.6,/demo/jquery.min.js
gh,emasgp/js,c21f6b197c3a4b4e281c22da043819479ffe53e0,/jquery-3.7.1min.js
gh,geonetwork/core-geonetwork,ae96904af72a3fef40fb8deeded6f56d2d28b746,/web-ui/src/main/resources/catalog/lib/jquery-3.7.1.min.js
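
For readers who want to work with a list like this, the rows can be parsed and collapsed to distinct packages. The following is only a sketch, not part of the jsDelivr API: the field order (type, name, version, file) is inferred from the rows above, and the names `Match`, `parse_matches` and `distinct_packages` are made up for illustration.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Match(NamedTuple):
    source_type: str  # "npm" or "gh" in the rows above
    name: str
    version: str
    file: str


def parse_matches(text: str) -> List[Match]:
    """Parse comma-separated rows of the form type,name,version,file."""
    matches = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first three commas only, so a comma inside the
        # file path (the last field) would be preserved.
        source_type, name, version, file = line.split(",", 3)
        matches.append(Match(source_type, name, version, file))
    return matches


def distinct_packages(matches: List[Match]) -> "Counter[Tuple[str, str]]":
    """Count how many matching versions each (type, name) pair has."""
    return Counter((m.source_type, m.name) for m in matches)


# A few rows copied from the list above.
SAMPLE = """\
gh,madmaxchow/VLOOK,master,/docs/js/jquery.js
npm,jquery,3.7.1,/dist/jquery.min.js
gh,jquery/jquery,3.7.1,/dist/jquery.min.js
npm,beefup,1.4.11,/dist/js/jquery.min.js
npm,beefup,1.4.12,/dist/js/jquery.min.js
"""

if __name__ == "__main__":
    for (source_type, name), count in distinct_packages(parse_matches(SAMPLE)).items():
        print(source_type, name, count)
```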

MartinKolarik, Apr 10 '25 12:04

You are totally right, differentiating between some of them will be pretty hard :(

Bladelol, Apr 10 '25 12:04

Okay, I think I found a solution for my use case, if you could provide an endpoint with the full list. I can filter for npm entries. After that, I just have to look up the release timestamp of each exact version via npm and take the oldest one.
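
The selection described here could be sketched roughly as follows. This assumes a full list of (type, name, version, file) rows were available, which the API does not currently provide. `published_at` is a hypothetical callback that returns the publish time of a package version (it could, for example, read the `time` map in the npm registry's package metadata) or None if unknown. Rows such as "npm,jquery,3" appear to be version aliases rather than exact releases, so the sketch skips anything that does not look like an exact version.

```python
import re
from datetime import datetime
from typing import Callable, Iterable, Optional, Tuple

Row = Tuple[str, str, str, str]  # (type, name, version, file)

# Accept only versions that start with major.minor.patch.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+")


def oldest_npm_match(
    rows: Iterable[Row],
    published_at: Callable[[str, str], Optional[datetime]],
) -> Optional[Tuple[str, str]]:
    """Return (name, version) of the earliest-published npm row, or None."""
    best = None
    best_time = None
    for source_type, name, version, _file in rows:
        # Keep only npm entries with an exact version.
        if source_type != "npm" or not EXACT_VERSION.match(version):
            continue
        when = published_at(name, version)
        if when is None:
            continue  # publish time unknown, cannot be compared
        if best_time is None or when < best_time:
            best, best_time = (name, version), when
    return best
```

Taking the oldest release is a heuristic: it assumes the original package published the file before anyone re-bundled it.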

Bladelol, Apr 11 '25 09:04

@MartinKolarik should I close this bug issue (or better, creator-cant-read issue xD) and open a feature request?

Bladelol, Apr 16 '25 08:04

No need to create a new issue, I'll take a look at this when I get some time.

MartinKolarik, Apr 16 '25 13:04

We could maybe add an option to list all packages that have the file instead of returning just one.

@MartinKolarik Personally I would find that useful/interesting; more so than just getting a single match for whatever the first source that happened to be accessed was.

It would also be potentially useful to be able to filter those down with a type param (similar to what other API endpoints have that let me specify npm / gh / etc)

I'm not sure how much this would complicate things, but maybe that could also be sorted by some kind of 'popularity' measure like download stats/etc.

For the example use case, I could probably guess that the intended version was the main npm version, but for other libraries I might not be able to identify the 'main canonical source' for that file as easily. That is where 'popularity' could help narrow it down (e.g. if most of the results have hundreds or thousands of downloads and the main one has millions, I could make a solid guess).
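
As a rough sketch of that guess: given per-package hit counts (which the lookup endpoint does not return today, so the input is an assumption), pick a package only when it clearly dominates the runner-up. The function name and the default ratio of 100 are arbitrary choices for illustration.

```python
from typing import Dict, Optional


def dominant_candidate(hits: Dict[str, int], ratio: float = 100.0) -> Optional[str]:
    """Return the package whose hit count is at least `ratio` times the
    runner-up's, or None if no candidate stands out that clearly."""
    if not hits:
        return None
    ranked = sorted(hits.items(), key=lambda item: item[1], reverse=True)
    top_name, top_hits = ranked[0]
    if len(ranked) == 1:
        return top_name  # only one candidate, nothing to compare against
    runner_up_hits = ranked[1][1]
    # max(..., 1) avoids treating a 0-hit runner-up as "infinitely" dominated.
    if top_hits >= ratio * max(runner_up_hits, 1):
        return top_name
    return None
```

Returning None when the counts are close keeps the ambiguous cases visible instead of guessing.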

A more complicated idea (that I'm not even sure if it would be viable), but that maybe could help figure out the 'canonical' version better, might be to:

  • look up the hash and get a list of matching projects that have that file
  • for each of those projects, check the package.json / similar to see if this file is included in the main exports for that package
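
The second step above might look something like the sketch below. It takes an already-parsed package.json as a dict and checks whether the matched file is referenced by one of the common entry-point fields. The set of fields checked is an assumption about what counts as a "main export", and the sketch does not handle wildcard patterns in `exports` or `main` values that omit the file extension.

```python
from typing import Any, Iterator

# Fields that commonly point at a package's entry files.
ENTRY_FIELDS = ("main", "module", "browser", "jsdelivr", "unpkg", "exports")


def _string_values(value: Any) -> Iterator[str]:
    """Yield every string found in a possibly nested dict/list structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _string_values(item)
    elif isinstance(value, list):
        for item in value:
            yield from _string_values(item)


def _normalize(path: str) -> str:
    """Make './dist/x.js', '/dist/x.js' and 'dist/x.js' comparable."""
    while path.startswith("./"):
        path = path[2:]
    return path.lstrip("/")


def is_entry_point(package_json: dict, file_path: str) -> bool:
    """True if file_path is referenced by one of the ENTRY_FIELDS."""
    target = _normalize(file_path)
    for field in ENTRY_FIELDS:
        for candidate in _string_values(package_json.get(field)):
            if _normalize(candidate) == target:
                return True
    return False
```

A package that merely vendors the file (for example under a docs or demo directory) would typically not list it in these fields, which is what would make the check useful for ranking candidates.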

It looks like this is the file that handles the lookup:

https://github.com/jsdelivr/data.jsdelivr.com/blob/46389da6bcbf5ae5c153e03ba7e00cbfd2e5299b/src/routes/v1/LookupRequest.js#L6-L8

Which then calls into this to do the actual lookup:

https://github.com/jsdelivr/data.jsdelivr.com/blob/46389da6bcbf5ae5c153e03ba7e00cbfd2e5299b/src/models/File.js#L47-L59

So maybe to sort by downloads it could join against something like (from a quick/naive skim):

  • https://github.com/jsdelivr/data.jsdelivr.com/blob/master/src/models/PackageHits.js
  • https://github.com/jsdelivr/data.jsdelivr.com/blob/master/src/models/PackageVersionHits.js
  • https://github.com/jsdelivr/data.jsdelivr.com/blob/master/src/models/FileHits.js
  • etc

0xdevalias, Jun 18 '25 05:06