Identify language-specific packages (name/version)
When a runtime with a package manager is used, we should try to identify which packages are being used. For example, for Python, record the package name and version for site packages.
This could be done for:
- Python site packages
- Ruby gems
- R (maybe)
- Java (not interpreted, no run-time package manager, but JARs include version information)
For Python: pip freeze can identify installed packages. However, it runs in the "target" interpreter, not ReproZip's, so we should probably read from the filesystem instead.
pip uses distlib to do this, which cites a variety of PEPs:
- PEP 241: replaced by PEP 314. Metadata 1.0 format for PKG-INFO file in sdists, and .dist-info/METADATA files
- PEP 314: Metadata 1.1 format
- PEP 345: Metadata 1.2 format
- PEP 566: Metadata 2.1 format
- PEP 376: On-disk layout of packages and metadata (.dist-info, *.egg-info)
- PEP 386: replaced by PEP 440. Version number and version requirements format, irrelevant
- PEP 426: meant to replace PEP 345, but withdrawn in favor of PEP 566
- PEP 440: Version number and version requirements format, irrelevant
So there don't really seem to be competing standards or formats. Reading .dist-info/METADATA or .egg-info/PKG-INFO (PEP 376) should give all the information we want (in PEP 566 format), though really the version number is in the folder name already.
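As a rough sketch of the filesystem approach, assuming a standard site-packages layout: the metadata files use RFC 822-style headers, so the standard library's email parser can read them without involving the target interpreter. The function name `list_installed` is made up for illustration and is not part of ReproZip or pip.

```python
import email.parser
from pathlib import Path


def list_installed(site_packages):
    """Yield (name, version) for each distribution found in a directory.

    Reads .dist-info/METADATA and .egg-info/PKG-INFO directly, without
    importing anything from the target interpreter.
    """
    parser = email.parser.Parser()
    for entry in sorted(Path(site_packages).iterdir()):
        if entry.name.endswith('.dist-info'):
            meta = entry / 'METADATA'
        elif entry.name.endswith('.egg-info'):
            # .egg-info is either a directory containing PKG-INFO,
            # or a single file that is itself in PKG-INFO format
            meta = entry / 'PKG-INFO' if entry.is_dir() else entry
        else:
            continue
        if not meta.is_file():
            continue
        with meta.open(encoding='utf-8', errors='replace') as fp:
            # Only the headers are needed; skip the long description body
            headers = parser.parse(fp, headersonly=True)
        name, version = headers.get('Name'), headers.get('Version')
        if name and version:
            yield name, version
```

Calling `list(list_installed('/home/vagrant/venv/lib/python3.8/site-packages'))` would then give pairs such as `('urllib3', '1.26.4')` for whatever is installed there. Parsing the folder name would be cheaper, but the headers avoid guessing where the name ends and the version begins.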
I wanted to highlight a possible pitfall in this task. Some packages have a single release for both Py2 and Py3, in which certain features are unavailable to Py2 users. Depending on how you plan to implement it, the dependency tracker might run into an issue with this (as with Sumatra). More here.
Looking forward to seeing this functionality implemented within ReproZip.
We should make sure to record the Python version as well then, thanks.
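One way to record the Python version without running the target interpreter might be to infer it from the package path, assuming the usual `lib/pythonX.Y/site-packages` layout (or Debian's `dist-packages`). The helper name `python_version_from_path` is hypothetical, and other layouts would need extra handling.

```python
import re

# Matches the conventional CPython layout on Linux, e.g.
# /home/vagrant/venv/lib/python3.8/site-packages/...
_SITE_PACKAGES = re.compile(
    r'/lib/python(\d+\.\d+)/(?:site|dist)-packages(?:/|$)'
)


def python_version_from_path(path):
    """Return 'X.Y' if the path lies under a site-packages directory, else None."""
    match = _SITE_PACKAGES.search(path)
    return match.group(1) if match else None
```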
This needs a change in the config file format.
Currently it's a flat list of packages, implicitly meant for whatever that distribution's default package manager is:
packages:
  - name: "libc6"
    version: "2.31-0ubuntu9.3"
    size: 13563904
    packfiles: true
    meta: {"section": "libs"}
    files:
      # Total files used: 3.80 MB
      # Installed package size: 12.94 MB
      - "/lib/i386-linux-gnu/ld-2.31.so" # 176.40 KB
      - "/lib/ld-linux.so.2" # Link to /lib/i386-linux-gnu/ld-2.31.so
      - "/lib/x86_64-linux-gnu/ld-2.31.so" # 186.99 KB
  - name: "libexpat1"
    version: "2.2.9-1build1"
    size: 410624
    packfiles: true
    meta: {"section": "libs"}
    files:
      # Total files used: 178.28 KB
      # Installed package size: 401.00 KB
      - "/lib/x86_64-linux-gnu/libexpat.so.1" # Link to /lib/x86_64-linux-gnu/libexpat.so.1.6.11
      - "/lib/x86_64-linux-gnu/libexpat.so.1.6.11" # 178.28 KB
We can either add fields to each package stating which package manager & environment it's for, or make it a nested list environment->package:
packages:
  - package_manager: dpkg
    environment: /
    packages:
      - name: "libc6"
        version: "2.31-0ubuntu9.3"
        size: 13563904
        packfiles: true
        meta: {"section": "libs"}
        files:
          # Total files used: 3.80 MB
          # Installed package size: 12.94 MB
          - "/lib/i386-linux-gnu/ld-2.31.so" # 176.40 KB
          - "/lib/ld-linux.so.2" # Link to /lib/i386-linux-gnu/ld-2.31.so
          - "/lib/x86_64-linux-gnu/ld-2.31.so" # 186.99 KB
  - package_manager: python
    environment: /home/vagrant/venv
    python: "3.8"
    packages:
      - name: "urllib3"
        version: "1.26.4"
        size: 12345
        packfiles: true
        files:
          # Total files used: 678 KB
          # Installed package size: 1.5 MB
          - /home/vagrant/venv/lib/python3.8/site-packages/urllib3/response.py # 28 KB
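To compare the two layouts, here is a small sketch that converts the first option (extra fields on each package) into the nested form. It works on already-parsed dictionaries rather than on the YAML file itself. Using `package_manager` and `environment` as per-package keys is an assumption, since only the nested form is spelled out above, and `group_by_environment` is a made-up name. The defaults of `dpkg` and `/` for packages without those keys are also just placeholders for "the distribution's default package manager".

```python
def group_by_environment(flat_packages):
    """Group packages that carry their own manager/environment fields.

    Returns a list of {'package_manager', 'environment', 'packages'}
    entries, in order of first appearance.
    """
    groups = {}
    for pkg in flat_packages:
        pkg = dict(pkg)  # shallow copy, so the input is left untouched
        # Assumed defaults for entries written in the current flat format
        key = (pkg.pop('package_manager', 'dpkg'), pkg.pop('environment', '/'))
        groups.setdefault(key, []).append(pkg)
    return [
        {'package_manager': manager, 'environment': env, 'packages': pkgs}
        for (manager, env), pkgs in groups.items()
    ]
```

The conversion is mechanical in this direction, which suggests the two options carry the same information. The nested form mainly avoids repeating the environment on every package and gives a natural place for environment-level fields such as `python`, which this sketch does not handle.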