
Telemetry - Plugin Version

Status: Open. pnadolny13 opened this issue 2 years ago (6 comments).

It would be nice to detect when a version bump in a plugin is causing failures for many users, which could warrant a warning message to the community.

Because we're very careful to hash pip_urls, it's hard to determine the version on the warehouse side. We will use reverse hashing of known values to match known packages, but for pip URLs this is more difficult: we'd need a lookup of every version of every package, and since many Singer plugins are pinned to commit hashes, that wouldn't be worth it. Some plugins also list multiple pip dependencies.

We talked about possibly parsing those out into an array. How could we solve this problem in a way where we get versions but also ensure no PII is captured?

cc @aaronsteers @tayloramurphy

pnadolny13, Jun 08 '22 15:06

Parsing into an array should work in theory.

Then, for each item in the array, we could check for a version-constraint signal like = or @ and hash the left side (the package name, without the version suffix). If the right side of the split looks like a semver version, return it in cleartext; otherwise, return the right side as a hash.
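A minimal sketch of the parsing idea above, in Python since Meltano is a Python project. It assumes pip_url entries are whitespace-separated, uses SHA-256 as a stand-in for whatever hash the telemetry pipeline actually applies, and the helper names (`parse_pip_url`, `VERSION_LIKE`) are hypothetical. The version check is intentionally strict so that commit hashes and branch names are always hashed rather than sent in cleartext.

```python
from __future__ import annotations

import hashlib
import re

# Deliberately conservative "looks like a release version" check:
# digits separated by at least one dot, an optional leading "v" and an
# optional a/b/rc pre-release suffix (e.g. "1.2.3", "v0.4", "2.0.0rc1").
# Commit hashes and branch names never match, so they get hashed.
VERSION_LIKE = re.compile(r"^v?\d+(\.\d+)+((a|b|rc)\d+)?$")


def sha256(value: str) -> str:
    """Hash a string so it can be matched against known values but not read."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def parse_pip_url(pip_url: str) -> list[tuple[str, str | None]]:
    """Split a pip_url into (hashed_name, version_or_hash) pairs.

    Assumes entries are whitespace-separated and contain no spaces
    themselves (PEP 508 "name @ url" references would need extra handling).
    """
    entries = []
    for token in pip_url.split():
        if "==" in token:
            left, right = token.split("==", 1)
        elif "@" in token:
            # Split on the last "@" so a "user:token@host" prefix stays in
            # the hashed left side and the trailing git ref gets checked.
            left, right = token.rsplit("@", 1)
        else:
            # Bare package name or URL: no version signal to extract.
            entries.append((sha256(token), None))
            continue
        version = right if VERSION_LIKE.match(right) else sha256(right)
        entries.append((sha256(left), version))
    return entries
```

For example, `parse_pip_url("tap-foo==1.2.3 git+https://github.com/org/tap-bar.git@4f3a9c1")` keeps `1.2.3` readable but hashes the commit `4f3a9c1`. Splitting on the last `@` means a URL with embedded credentials still yields its trailing tag, while the credentials stay inside the hashed left side.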

aaronsteers, Jun 08 '22 15:06

@aaronsteers how would this work when pip_url is literally just the name of the package? Are we setting the installed version in the lockfile when users do an install? It seems like we'd want to get it from there if possible as well.

tayloramurphy, Jun 14 '22 17:06

@aaronsteers ping on the above question. @WillDaSilva you will likely have some insight as well.

tayloramurphy, Aug 03 '22 21:08

@aaronsteers how would this work when pip_url is literally just the name of the package?

@tayloramurphy yeah, that's possible, and it would not help us know which version of the package the user installed.

Are we setting the installed version in the lockfile when users do an install? Seems like we'd want it from that if possible as well.

No, we're not. A lock file should point to a specific version, but I think it should be set before users do an install rather than after. So either meltano add/lock resolves the pip URL to the right set of exact versions (latest version on PyPI, latest git tag/commit, etc.), or the Hub does so before Meltano even pulls the definition (see https://github.com/meltano/hub/issues/219).

edgarrmondragon, Aug 03 '22 23:08

@edgarrmondragon, @pnadolny13 and @tayloramurphy - I don't know enough about how private Python repos are targeted, and specifically whether there's any risk of a token being part of the reference or the URL.

Do you know?

One path to resolving this is to figure out which library naming patterns we don't want in cleartext, and then allow the others.
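One way to sketch the denylist idea above, assuming the same whitespace-separated pip_url. The patterns here are illustrative guesses, not a vetted list: any token containing a path separator, `@`, or `:`, or starting with `-`, `.`, or `~`, is treated as potentially private and hashed (SHA-256, again an assumption); everything else is reported as-is. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import re

# Hypothetical denylist: token shapes that could carry private details
# such as index hosts, access tokens, or usernames inside local paths.
SENSITIVE_PATTERNS = [
    re.compile(r"[/\\]"),   # URLs (git+https://...) and filesystem paths
    re.compile(r"[@:]"),    # credentials like user:token@host, VCS refs
    re.compile(r"^[-.~]"),  # pip options (-e, --index-url), relative paths
]


def is_safe_for_cleartext(token: str) -> bool:
    """Allow a token in cleartext only if no sensitive pattern matches it."""
    return not any(pattern.search(token) for pattern in SENSITIVE_PATTERNS)


def redact_pip_url(pip_url: str) -> list[str]:
    """Keep plain requirement strings; replace everything else with a SHA-256 digest."""
    return [
        token
        if is_safe_for_cleartext(token)
        else hashlib.sha256(token.encode("utf-8")).hexdigest()
        for token in pip_url.split()
    ]
```

The weakness of a denylist shows up quickly: with `--index-url https://... acme-internal-tap`, the flag and the URL are hashed, but the private package name `acme-internal-tap` still passes in cleartext. Inverting this into an allowlist (for example, only names already known from the Hub) would be the more conservative choice.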

aaronsteers, Aug 04 '22 17:08

@aaronsteers @edgarrmondragon @pnadolny13 @tayloramurphy Reading the pip_url is risky, since it can contain all kinds of private information and can be anything one would pass to `pip install` on the command line. Perhaps instead of messing with it, we can augment the environment context to include a list of:

[
    [<hashed package name>, <cleartext version from `importlib.metadata.version`>],
    [<hashed package name>, <cleartext version from `importlib.metadata.version`>],
    ...
]

for all installed third-party packages within the currently active Python environment. Then we can compare the hashed package names against a list of known hashes.
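A minimal sketch of that environment-context idea, using only the standard library. `importlib.metadata.distributions()` enumerates what is installed in the active environment; names are PEP 503-normalized before hashing so that `Tap_GitHub` and `tap-github` produce the same hash. The function names and the SHA-256 choice are assumptions, and `match_known` is a hypothetical stand-in for whatever lookup the warehouse side would run against a list of known plugin names.

```python
from __future__ import annotations

import hashlib
import re
from importlib.metadata import distributions


def normalize(name: str) -> str:
    """Normalize a distribution name per PEP 503 so hashes compare reliably."""
    return re.sub(r"[-_.]+", "-", name).lower()


def sha256(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def installed_package_versions() -> list[list[str]]:
    """[[hashed package name, cleartext version], ...] for the active env."""
    pairs = []
    for dist in distributions():
        meta = dist.metadata
        if meta is None:  # defensive: broken installs may lack metadata
            continue
        name, version = meta.get("Name"), meta.get("Version")
        if not name or not version:
            continue
        pairs.append([sha256(normalize(name)), version])
    return sorted(pairs)


def match_known(pairs: list[list[str]], known_names: list[str]) -> dict[str, str]:
    """Warehouse side: map hashes back to package names we already know.

    Hashes that match no known name stay opaque and are dropped here.
    """
    lookup = {sha256(normalize(n)): n for n in known_names}
    return {lookup[h]: v for h, v in pairs if h in lookup}
```

Package names that aren't on the known list stay opaque hashes, though their versions are still sent in cleartext.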

The problem with this approach is that it relies on the plugins being in the same Python environment, which I believe is currently always the case, but may not be in the future.

WillDaSilva, Aug 04 '22 18:08