CEP idea: Standardized user agent strings

Open jaimergp opened this issue 1 month ago • 5 comments

It's really tricky to get accurate usage or download data for specific package versions, platforms, etc. that can inform decision making (e.g. dropping osx-64).

Different conda clients use different user agent strings for their repodata and package downloads, so even if we could query those, we could only do so for conda requests. Other tools like mamba, pixi, or rattler-build do not provide as much information. We would also need a mechanism for specific contexts to extend the user agent with custom values (e.g. conda-forge might want to flag their internal CI jobs so they don't add noise to the real user data).

For example, in conda, the lack of a standard way to do so results in runtime patches like this.

I propose two things:

References:

  • https://github.com/anaconda/anaconda-package-data/issues/64
  • https://github.com/conda/infrastructure/issues/1018
  • https://discuss.python.org/t/pre-pep-user-agent-schema-for-http-requests-against-remote-package-indices/104006

jaimergp avatar Nov 11 '25 12:11 jaimergp

I believe pip has a mechanism to add some telemetry information, e.g. whether it runs in CI or not.

wolfv avatar Nov 11 '25 12:11 wolfv

I checked the pip code, this is what they do:

https://github.com/pypa/pip/blob/7e49dca9277bf4e325b85cfb9ebe70401f194fb6/src/pip/_internal/network/session.py#L109
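
For readers who don't want to click through, here is a minimal sketch of that kind of environment-variable heuristic. It is not a copy of pip's code: the function name and the variable list are assumptions for illustration and may not match pip's exact list.

```python
import os

# Assumed, illustrative list of environment variables that CI systems
# commonly set. This is NOT guaranteed to match pip's actual list.
CI_ENV_VARS = ("CI", "BUILD_ID", "BUILD_BUILDID")


def looks_like_ci(environ=None):
    """Return True if any known CI marker variable is present."""
    env = os.environ if environ is None else environ
    return any(name in env for name in CI_ENV_VARS)
```

A check like this can only say "some CI is running", which is the limitation raised in the next comment.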

baszalmstra avatar Nov 11 '25 12:11 baszalmstra

Yep, this bit particularly for CI detection heuristics. But that also would cover legitimate usage of CI in e.g. testing pipelines of other projects.

What we want to say is "this is a conda-forge build job" so it is passed to the build tool via e.g. --user-agent-data conda-forge/ci.
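
A rough sketch of what handling such a value could look like, assuming the tool validates each extra entry as an RFC 1945 product token (`token ["/" version]`) before appending it. The `extend_user_agent` helper and the `conda/25.9.1` base string are hypothetical, and `--user-agent-data` is only the flag floated above, not an existing option.

```python
import re

# RFC 1945 "token": one or more characters that are neither control
# characters nor tspecials (so no spaces, slashes, semicolons, etc.).
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# product = token ["/" product-version]
_PRODUCT = re.compile(rf"{_TOKEN}(/{_TOKEN})?")


def extend_user_agent(base: str, *extra: str) -> str:
    """Append validated product tokens to an existing user agent string.

    Hypothetical helper: rejects anything that is not a single
    well-formed product token instead of trying to repair it.
    """
    for product in extra:
        if not _PRODUCT.fullmatch(product):
            raise ValueError(f"not a valid product token: {product!r}")
    return " ".join([base, *extra])


# Illustrative base string; the version number is made up.
user_agent = extend_user_agent("conda/25.9.1", "conda-forge/ci")
```

Rejecting malformed values up front keeps the resulting string parseable by whoever aggregates the logs later.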

jaimergp avatar Nov 11 '25 13:11 jaimergp

FWIW as long as we stay close to https://datatracker.ietf.org/doc/html/rfc1945#page-46, I don't mind. That said, it's totally normal to also define other request headers if we want to stuff more information into requests. This is what https://github.com/anaconda/conda-anaconda-telemetry does to avoid overloading the User-Agent header. Remember that headers have a maximum length: past a certain size, some servers ignore them, cut them off, or even respond with 500 errors. IIRC @travishathaway did some digging into this when we built the HTTP version of conda-anaconda-telemetry.
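
To illustrate the size concern, a minimal client-side sketch that drops any extra header whose full line would exceed a byte budget, rather than sending something a server might truncate or reject. The 4096-byte budget, the `fit_headers` name, and the example header names are all assumptions; real limits vary by server and proxy.

```python
# Assumed per-header budget in bytes. Real limits differ between servers
# and proxies, so this number is a placeholder, not a recommendation.
MAX_HEADER_BYTES = 4096


def fit_headers(headers: dict[str, str], limit: int = MAX_HEADER_BYTES) -> dict[str, str]:
    """Keep only headers whose 'Name: value' line fits within the budget."""
    kept = {}
    for name, value in headers.items():
        line = f"{name}: {value}".encode("latin-1", errors="replace")
        if len(line) <= limit:
            kept[name] = value
    return kept


# Hypothetical header names, used only to show the behavior.
safe = fit_headers(
    {"X-Example-Context": "conda-forge/ci", "X-Example-Packages": "p" * 10_000}
)
```

Here `safe` keeps only `X-Example-Context`, since the second header is far over the assumed budget.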

jezdez avatar Nov 11 '25 13:11 jezdez

Hm, true, I like that option too. And conda already has a headers plugin hook, apparently. So we "only" need to standardize some of those decisions there (e.g. the ; separator for fields seen in anaconda-telemetry).
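
As a sketch of what standardizing the separator could mean in practice, assuming each header carries a flat list of values joined with `;`. The helper names are hypothetical and the exact format used by anaconda-telemetry may differ. Wiring this into conda's headers plugin hook is not shown.

```python
FIELD_SEPARATOR = ";"


def encode_fields(values: list[str]) -> str:
    """Join values into one header value, refusing ambiguous input."""
    for value in values:
        if FIELD_SEPARATOR in value:
            raise ValueError(f"value contains the separator: {value!r}")
    return FIELD_SEPARATOR.join(values)


def decode_fields(header_value: str) -> list[str]:
    """Split a header value back into its fields, skipping empty parts."""
    return [part for part in header_value.split(FIELD_SEPARATOR) if part]


encoded = encode_fields(["linux-64", "osx-arm64"])
```

Agreeing on one separator and on what happens when a value contains it is most of the standardization work; the encoding itself stays trivial.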

jaimergp avatar Nov 11 '25 13:11 jaimergp