Enable overriding undocumented telemetry identifier with PIP_TELEMETRY_USER_AGENT_ID
Fixes https://github.com/pypa/pip/issues/13038.
The commit https://github.com/pypa/pip/commit/f787788a65cf7a8af8b7ef9dc13c0681d94fff5f by @alex performed a PATH traversal and subsequent process execution (the output of rustc --version) in order to add some more information to the User-Agent request header that pip provides to its remote HTTP requests.
f787788a65cf7a8af8b7ef9dc13c0681d94fff5f makes reference to information valuable to package maintainers, which appears to be propagated to PyPI's BigQuery output: https://github.com/pypa/pip/pull/9987#issuecomment-1076569602 (thanks so much to Alex for explaining this to me -- I'm not sure why the commit message didn't have a link to his PR discussion).
I had previously proposed removing the rustc --version call, but that has been removed and this PR now just enables overriding the telemetry identifier, as in title.
A discussion of standardized telemetry data began to form in https://github.com/pypa/pip/issues/5499, but failed to graduate into a full PEP. Given the very high quality of other Python packaging standards, I would like to propose that this one also be made into a full-on standard, probably somewhere around the Simple Repository API page.
This PR does two things:
- Introduces
PIP_TELEMETRY_USER_AGENT_ID, an environment variable which completely overrides the value used to form pip'sUser-Agentheader. - Proposes at length (see docstring) how to expand telemetry functionality in a respectful and thoughtful approach, through Python packaging standards.
It is not accurate that commit was pushed directly to master, it was submitted as a PR and reviewed https://github.com/pypa/pip/pull/9987 (I'm not a pip committer and couldn't have pushed to master if I wanted)
I'm extremely sorry. I will delete that immediately. I pinged you because I found it very confusing given the other work I've seen from you.
I will review those PR comments now. I'm incredibly sorry once again and thanks so much for your very patient and thoughtful response.
Ok, I have deleted the revert and retained the changes of #9987. Thanks so much @alex for linking me there. This now just proposes a separate environment variable to override the User-Agent setting. cc @uranusjr @pradyunsg
I can't reproduce the test failure and I am confident my changes continue to produce valid JSON for User-Agent. Unsure what's going on.
@cosmicexplorer FYI I fixed a typo in your post githb.com -> github.com, I hope you don't mind, the former URL was wanting to install an extension in my browser. And I've added that it fixes https://github.com/pypa/pip/issues/13038.
The commit https://github.com/pypa/pip/commit/f787788a65cf7a8af8b7ef9dc13c0681d94fff5f by @alex performed a PATH traversal and subsequent process execution (the output of rustc --version) in order to add some more information to the User-Agent request header that pip provides to its remote HTTP requests.
https://github.com/pypa/pip/commit/f787788a65cf7a8af8b7ef9dc13c0681d94fff5f makes reference to information valuable to package maintainers, which appears to be propagated to PyPI's BigQuery output: https://github.com/pypa/pip/pull/9987#issuecomment-1076569602 (thanks so much to Alex for explaining this to me -- I'm not sure why the commit message didn't have a link to his PR discussion).
I had previously proposed removing the rustc --version call, but that has been removed and this PR now just enables overriding the telemetry identifier, as in title.
I don't really understand why the rustc call is being singled out as a security issue, there are many places where pip is calling subprocess.run, subprocess.check_call, and subprocess.check_output, both directly and via vendored code including, presumably, to work out the glib c and musl c versions.
While I see there being other motivations for wanting to override the telemetry part of the user agent string, I'm not sure I quite follow this reasoning. I wonder if it makes sense to appropriate https://github.com/pypa/pip/issues/13038 or make a new issue where we can discuss the reasons and design for overriding the telemetry, rather than the technical issues of the PR.
Overall, I am currently neither for or against this feature, but I will leave a quick review on an immediate technical issue I see with your code.