Implement support to collect Usage dynamically
What?
Implement support to collect Usage dynamically
Why?
Previously, Usage collection happened in one place, in a pull fashion: the usage report code needed access to the given data so it could pull the info from it and put it into the report.
This reverses the pattern and adds (if available) the cloud test run id to the usage report.
Future work can pull a bunch of the other parts of it out. For example:
- used modules can now be reported from the modules
- outputs can also report their usage
- same for executors
Currently all the above are still done in the usage report code, but that is not necessary.
This will also allow additional usage reporting without the need to propagate the data through getters to the usage report; instead, it can just be pushed from the place where it is used.
This potentially allows reporting usage of features we are interested in removing, in a more generic and easier way.
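To make the reversal concrete, here is a minimal sketch of what such a push-style collector could look like; the `Usage` type, method names and keys below are illustrative assumptions, not necessarily the exact API of this PR:

```go
package usage

import "sync"

// Usage is a hypothetical push-style collector: components report their own
// usage at the point of use, instead of the report code pulling it via getters.
type Usage struct {
	mx sync.Mutex
	m  map[string]any
}

func New() *Usage { return &Usage{m: make(map[string]any)} }

// Strings appends v to the string slice stored under key k.
// (How to handle key conflicts is discussed further down in the thread.)
func (u *Usage) Strings(k, v string) error {
	u.mx.Lock()
	defer u.mx.Unlock()
	existing, _ := u.m[k].([]string)
	u.m[k] = append(existing, v)
	return nil
}

// Example call sites - a module or an output reports itself when it is used:
//
//	_ = u.Strings("modules", "k6/http")
//	_ = u.Strings("cloud/test_run_id", testRunID)
```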
Checklist
- [ ] I have performed a self-review of my code.
- [ ] I have added tests for my changes.
- [ ] I have run linter locally (`make lint`) and all checks pass.
- [ ] I have run tests locally (`make tests`) and all tests pass.
- [ ] I have commented on my code, particularly in hard-to-understand areas.
Related PR(s)/Issue(s)
A comment on why we can't reuse the already available OpenTelemetry dependency. Of course, we need to abstract it, but isn't it possible to have the implementation mostly based on OpenTelemetry primitives?
Maybe it is possible, I looked into the APIs now .... but for example stuff such as modules or outputs will require ... some changes or (ab)using the API in strange ways, especially if we want to keep the same end format we send.
In particular, how are you going to get a slice of strings as a value from either a metric, a trace or a log (which arguably is the best candidate and also the one that isn't stable)? In theory we can have module/k6 with value 1 and module/k6/http with value 1, and that just means that they were used. That definitely will work, but IMO it doesn't really benefit us.
Arguably, having traces, logs and/or metrics that will not be useful to users but just for usage will either have to:
- contaminate user data
- have 2 different streams - one for usage, one for users. Which IMO will be confusing.
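For illustration only, this is roughly what the metrics-based workaround described above could look like with the standard OpenTelemetry Go API (the meter, instrument and attribute names are made up); it works, but it forces what is conceptually a list of strings through counters:

```go
package usage

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// reportModuleUsage bumps a counter to record that a module was used - either
// one instrument per module name, or (as here) a single instrument with the
// module name as an attribute.
func reportModuleUsage(ctx context.Context, importPath string) error {
	meter := otel.Meter("k6.usage")
	counter, err := meter.Int64Counter("usage.module.used")
	if err != nil {
		return err
	}
	counter.Add(ctx, 1, metric.WithAttributes(attribute.String("module", importPath)))
	return nil
}
```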
Any new additional package should be added as internal. If we plan to have it as a public API, we might graduate it afterwards, but I would start with it only being internally available.
I am in theory not against this, but part of the whole idea here was for this to also be usable in extensions - especially with the possibility of some of what we consider core functionality moving to extensions or to separate repos, or even the ones that are already in separate repos (browser, for example).
The idea behind this change is that instead of k6 core and the report itself knowing and caring about all the things we want to report on usage and then pulling them, things will get pushed to the report and it will then use them.
I am not strictly against putting this in internal, but I think we will probably want it moved very soon after, so not certain it will help much :shrug:
Hey @mstoykov, thanks for the proposal! I haven't reviewed the PR in-depth yet, cause I guess there's some polishing still remaining. But here are my 2cts on the proposal:
- Generally speaking, I like this reversed approach. I remember myself (among others) doing a similar kind of reversal in the Grafana codebase, and I remember that being very positive for decoupling the whole usage data collection from the rest of the services. Subtleties apart, I think this is the way to go.
That said,
Maybe it is possible, I looked into the APIs now .... but for example stuff such as modules or outputs will require ... some changes or (ab)using the API in strange ways, especially if we want to keep the same end format we send.
In particular, how are you going to get a slice of strings as a value from either a metric, a trace or a log (which arguably is the best candidate and also the one that isn't stable)? In theory we can have module/k6 with value 1 and module/k6/http with value 1, and that just means that they were used. That definitely will work, but IMO it doesn't really benefit us.
Arguably, having traces, logs and/or metrics that will not be useful to users but just for usage will either have to:
- contaminate user data
- have 2 different streams - one for usage, one for users. Which IMO will be confusing.
I agree, I think we should clearly differentiate what's k6 observability (logs, metrics and traces), meant to be used by whoever operates k6 - either us (cloud) or users (on their computers/infra) - from usage data collection, only used by us, to back our decisions on data.
Despite the partial overlap, I think long-term it will be better to keep both separate, to make our lives easier.
I am in theory not against this, but part of the whole idea here was for this to also be usable in extensions - especially with the possibility of some of what we consider core functionality moving to extensions or to separate repos, or even the ones that are already in separate repos (browser, for example).
The idea behind this change is that instead of k6 core and the report itself knowing and caring about all the things we want to report on usage and then pulling them, things will get pushed to the report and it will then use them.
I am not strictly against putting this in internal, but I think we will probably want it moved very soon after, so not certain it will help much 🤷
I can see both of your arguments - my only concern here is the common tradeoff between breaking changes on an exposed API vs endless iterations towards a perfect API, so I'd suggest trying to ship a decently good, exposed API for usage data collection ASAP and iterating over the next couple of releases before v1.0.
Perhaps we should try to use it from some of the extensions, so we can try to identify potential breaking changes and/or unsatisfied needs from the API early on.
Seems like there is agreement on the general idea, so let's try to make this change happen with as few future regrets as possible.
Currently I identify 3 API questions that can't be fixed with "we will add it later":
String and Strings
The current API has both Strings and String in order to support making the whole report only through the methods and keeping the same layout and types. Removing String is possible if we decide that.
- The top-level string values that currently exist will not be set through this API, but just added to the map before returning
- cloud/test_run_id (or whatever it ends up being called) is a slice of strings - usually with one value. In theory you can do -o cloud -o cloud twice on k6 and ... it will actually create two test runs.
Or if we are okay with changing the layout:
We can remove it and have the top-level string fields also be slices.
or
We can have the backend support things being either slices or a single string. For example, modules or outputs might have only a single value, and without Strings it will be a single value, but if there are multiple, both Strings and String will end up with it being a slice.
I personally feel like 1 seems like the best option - it has the least amount of trouble and removes one of the methods here, while allowing us to bring it back if we want. Three of the fields are actually constants we have access to either way, and the last one is from the executionState, which might not be the worst thing to keep around.
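A small sketch of what option 1 could mean in practice, assuming a hypothetical finalization step that injects the top-level string values directly (function and key names are illustrative):

```go
package usage

// finalizeReport sketches option 1: the few top-level string fields are
// constants (or come from the executionState), so they are added to the map
// right before returning instead of going through a String method.
func finalizeReport(collected map[string]any, k6Version string) map[string]any {
	report := make(map[string]any, len(collected)+1)
	for k, v := range collected {
		report[k] = v
	}
	// The remaining constant fields and the executionState-derived one would
	// be added the same way.
	report["k6_version"] = k6Version
	return report
}
```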
Count currently takes int64
This does allow someone to potentially decrease a counter ... which seems strange on second thought. It might be better to rename it to Uint64 and change the type.
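For clarity, the two shapes being discussed, side by side (hypothetical signatures):

```go
package usage

// counterAPI contrasts the current and the proposed counting method; only one
// of them would actually exist.
type counterAPI interface {
	// Current: a signed delta allows accidentally decreasing a counter.
	Count(k string, v int64) error
	// Proposed rename: an unsigned delta can only increase the value.
	Uint64(k string, v uint64) error
}
```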
The map that is returned currently supports only 1 level of grouping with /
Arguably we can remove the grouping; again, it will mean changes for the output, this time due to the executors key.
I kind of like the one level though, and it is part of the golang telemetry.
I didn't really want to make it work for multiple levels, but that might be a good idea as well.
Arguably this could be done in the backend service, but it might add too much processing there, so I prefer to keep it in the actual code here.
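A sketch of what the single level of / grouping could look like when the final report is built, assuming flat keys such as executors/constant-vus are folded into nested maps (the helper name is made up):

```go
package usage

import "strings"

// groupByFirstSlash folds flat keys with one "/" level into nested maps,
// e.g. {"executors/constant-vus": 1} -> {"executors": {"constant-vus": 1}}.
// Keys without a "/" are kept as-is; conflicts are a separate topic below.
func groupByFirstSlash(flat map[string]any) map[string]any {
	grouped := make(map[string]any, len(flat))
	for k, v := range flat {
		prefix, rest, found := strings.Cut(k, "/")
		if !found {
			grouped[k] = v
			continue
		}
		sub, ok := grouped[prefix].(map[string]any)
		if !ok {
			sub = make(map[string]any)
			grouped[prefix] = sub
		}
		sub[rest] = v
	}
	return grouped
}
```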
Error/log something else on conflicts
I kind of prefer to just return an error and kick this down to the user of the API - I expect this will practically never happen.
Otherwise we need to give a logger to the Usage.
Panicking seems like a bad option here IMO.
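A sketch of the return-an-error option, reduced to just the conflict check (the helper name is made up; it would sit behind methods like Strings):

```go
package usage

import "fmt"

// appendString returns an error when the key already holds something that is
// not a slice of strings, leaving it to the caller of the API to decide what
// to do - no logger needed, no panic.
func appendString(m map[string]any, k, v string) error {
	switch existing := m[k].(type) {
	case nil:
		m[k] = []string{v}
	case []string:
		m[k] = append(existing, v)
	default:
		return fmt.Errorf("usage key %q already holds a %T, not a string slice", k, existing)
	}
	return nil
}
```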
Perhaps we should try to use it from some of the extensions, so we can try to identify potential breaking changes and/or unsatisfied needs from the API early on.
If anyone has a particular extension with usage we want to track, that might be a great idea.
Here's my take:
Currently I identify 3 API questions that can't be fixed with "we will add it later":
String and Strings
...
I personally feel like 1 seems like the best option - it has the least amount of trouble and removes one of the methods here, while allowing us to bring it back if we want. Three of the fields are actually constants we have access to either way, and the last one is from the executionState, which might not be the worst thing to keep around.
I agree, in fact, we could always introduce something to store single strings later, right?
Cause, internally the map will keep values typed as any, and from the consumer's point of view it is more or less the same, I guess.
Count currently takes int64
...
Also agree 👍🏻 Moreover, I'm also wondering if it would make sense to have a similar method but just to Set (instead of "counting" or "adding"). Particularly for values that don't change over the execution, or that are reported at the end, I guess it's more semantically correct.
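A possible shape for that, purely to illustrate the suggested semantics (hypothetical method set):

```go
package usage

// reporterAPI contrasts accumulating with one-shot reporting.
type reporterAPI interface {
	// Uint64 accumulates a value over the run, e.g. something counted as it happens.
	Uint64(k string, v uint64) error
	// Set records a value once, e.g. something that does not change during the
	// execution or is only known at the end.
	Set(k string, v any) error
}
```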
The map that is returned currently supports only 1 level of grouping with /
...
No strong opinions here. Ideally I'd prefer to avoid breaking changes on the resulting report, of course, but I guess that in the worst case we can have some values added to the report (map) without them explicitly being exposed through the API 🤔
Error/log something else on conflicts
...
I have contradictory feelings here, because partly I feel like the errors that may happen here should theoretically never happen, and if they do, they should be caught as soon as possible, so I'd suggest panicking. But it's also true, and no less important, that if this becomes a public API, there could be conflicts arising organically, and those could even be discoverable only at runtime (depending on which extensions are enabled, for instance).
So, I guess I'd probably suggest returning errors from the API, and perhaps panicking in our code when calling it, if there's any error.
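What that combination might look like at the call sites, assuming the hypothetical push API from the earlier sketches:

```go
package usage

import (
	"fmt"
	"log"
)

// collector is the hypothetical push API from the sketches above.
type collector interface {
	Strings(k, v string) error
}

// In core k6, where a conflict would indicate a programming error, the error
// returned by the API could simply be escalated.
func recordModuleInCore(u collector, importPath string) {
	if err := u.Strings("modules", importPath); err != nil {
		panic(fmt.Sprintf("unexpected usage conflict: %v", err))
	}
}

// An extension, on the other hand, could just log the error and move on.
func recordModuleInExtension(u collector, logger *log.Logger, importPath string) {
	if err := u.Strings("modules", importPath); err != nil {
		logger.Printf("ignoring usage conflict: %v", err)
	}
}
```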
Perhaps we should try to use it from some of the extensions, so we can try to identify potential breaking changes and/or unsatisfied needs from the API early on.
If anyone has a particular extension with usage we want to track, that might be a great idea.
I'd be happy to add it to https://github.com/joanlopez/xk6-aws, for instance to analyze which services are being used more.
However, I feel like nobody is using that extension yet (I haven't exposed it much except for listing it on the docs site, and it's "competing" with our official jslib), so it's not very useful. But perhaps we could ask the browser folks?
Finally, one extra thought, partly related to the errors topic: should we reserve some keys, so that, for instance, nobody can re-write the `k6_version` attribute? For example, by forbidding it at the API level and setting it "manually" in k6 at report creation time.
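A sketch of the reserved-keys idea, assuming a small guard that the push methods would call; only `k6_version` is taken from the comment above, everything else is hypothetical:

```go
package usage

import "fmt"

// reservedKeys would be set "manually" by k6 at report creation time and
// could not be (re)written through the public API.
var reservedKeys = map[string]bool{
	"k6_version": true,
}

// checkReserved would be called by every push method before writing a key.
func checkReserved(k string) error {
	if reservedKeys[k] {
		return fmt.Errorf("usage key %q is reserved", k)
	}
	return nil
}
```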
Threading usage through k6 turned out to not be so easy ... due to tests.
I am not certain I like the particular way it ended up being, but arguably I prefer this to having to export the module resolver of a runner to the cmd package ... :grimacing:
The above can be removed in a later PR; I really don't think I should keep adding changes to this one. I might even be for taking back https://github.com/grafana/k6/pull/3917/commits/0b49b3f00d22bd65950a28a24b19450247248d12, although it is probably pretty important that we do that.
I will try to make a refactoring PR so that we don't actually need to do all these changes the next time we add a field like this. The predominant reason was that rerunning tests with an archive in the js package is repeated everywhere ...