pip cataloger should support repository url

Open sambhav opened this issue 3 years ago • 7 comments

What would you like to be added:

When pip packages are installed from a non-default index (anything other than PyPI), we should store the pip repository URL in the SBOM.

Why is this needed: useful to know the origin of a package

Additional context:

sambhav avatar Dec 14 '21 01:12 sambhav

This is a great idea. Do you happen to know if this information is stored for each installed package? That is, when looking at a site-packages directory with several installations, is it possible to determine locally which package was pulled from which pip index?

wagoodman avatar Dec 14 '21 03:12 wagoodman

@wagoodman - sadly it looks like this information is not available. cc: @pradyunsg if you have any more details.

sambhav avatar Dec 14 '21 08:12 sambhav

This information is not stored in the metadata by pip.
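As a rough illustration of what is recorded, the sketch below reads the origin-related files from an installed distribution's `.dist-info` directory using the standard-library `importlib.metadata`. The helper names `recorded_origin` and `summarize_environment` are made up for this example. Note that `direct_url.json` (PEP 610) is only written for installs from a direct URL, local path, or VCS reference, so a package resolved through an index normally shows `None` there.

```python
import json
from importlib import metadata


def recorded_origin(dist):
    """Collect the origin-related facts found in a distribution's metadata.

    `dist` is an importlib.metadata.Distribution (or anything with the same
    `metadata`, `version`, and `read_text` interface).
    """
    installer = dist.read_text("INSTALLER")          # e.g. "pip"
    direct_url = dist.read_text("direct_url.json")   # PEP 610, often absent
    return {
        "name": dist.metadata["Name"],
        "version": dist.version,
        "installer": installer.strip() if installer else None,
        "direct_url": json.loads(direct_url) if direct_url else None,
    }


def summarize_environment():
    """Hypothetical helper: apply recorded_origin to every installed package."""
    return [recorded_origin(d) for d in metadata.distributions()]
```

None of these fields names the index that served the package, which is the gap this issue is about.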

The only way to get this is by controlling the pip install call and checking what index URL it is using (likely by using pip config and PIP_INDEX_URL).
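A minimal sketch of that check, assuming the caller controls the environment in which `pip install` runs. It reads `PIP_INDEX_URL` and `PIP_EXTRA_INDEX_URL` (pip accepts space-separated values in environment variables), falls back to `pip config get global.index-url`, and finally to the default PyPI index. The function names are hypothetical, and looking only at the `global` config section is a simplification.

```python
import os
import subprocess
import sys

DEFAULT_INDEX = "https://pypi.org/simple"


def index_urls_from_env(env=None):
    """Return (index_url, extra_index_urls) as set via pip's env variables."""
    env = os.environ if env is None else env
    index_url = env.get("PIP_INDEX_URL") or None
    extras = env.get("PIP_EXTRA_INDEX_URL", "").split()
    return index_url, extras


def index_url_from_pip_config():
    """Ask pip for global.index-url; None if it is unset or pip cannot run."""
    try:
        result = subprocess.run(
            [sys.executable, "-m", "pip", "config", "get", "global.index-url"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return None
    value = result.stdout.strip()
    return value if result.returncode == 0 and value else None


def candidate_index_urls(env=None, config_lookup=index_url_from_pip_config):
    """List the indexes pip could have consulted, primary index first."""
    index_url, extras = index_urls_from_env(env)
    primary = index_url or config_lookup() or DEFAULT_INDEX
    return [primary, *extras]
```

This only yields the set of candidate indexes for an install. It cannot say which of them actually served a given package, and it ignores command-line flags and find-links locations.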

pradyunsg avatar Dec 14 '21 09:12 pradyunsg

@pradyunsg that might be tricky though, right? pip might have installed it from one of the extra index URLs or via find-links, some of which may be project-specific configuration rather than a global pip option. Would it be worth opening an issue for pip to store this metadata? The rationale is that different index servers might store different copies of the same package/version, and we might want to identify the origin in the output SBOM/vulnerability analysis.

sambhav avatar Dec 14 '21 11:12 sambhav

You might want to check with pip-audit folks, who are generating SBOMs. If that doesn't go anywhere, filing an issue against pip seems reasonable to me!

pradyunsg avatar Dec 14 '21 11:12 pradyunsg

Created https://github.com/pypa/pip/issues/10736

sambhav avatar Dec 20 '21 18:12 sambhav

IIUC, we should consider this issue "blocked" until the data is available for Syft to observe in the scan target. If I'm wrong here, just let me know! 😄

luhring avatar May 21 '22 15:05 luhring