
Collecting netlab usage data

Open ipspace opened this issue 1 year ago • 10 comments

It would be great to know how people use netlab; currently, we can only guess as we get little feedback and zero hard data.

The proposal to implement the usage data collection and eventual upload is in docs/roadmap/usage.md. Feedback or PRs against that file are most welcome.

ipspace avatar Nov 03 '24 10:11 ipspace

Some very draft ideas:

Receiving and storing data can be achieved using Cloudflare Workers + KV or D1 storage, or with AWS Lambda + DynamoDB (plus putting some limits on it).

To demonstrate that we do not collect sensitive data, we could also show the collected data and some reporting?

(Edit: if we need more resources for collecting and storing data, we could apply for this: https://blog.cloudflare.com/expanding-our-support-for-oss-projects-with-project-alexandria)

ssasso avatar Nov 03 '24 10:11 ssasso

Ivan's proposed collection mechanism stores the data in a plain-text YAML dictionary, so any user can see exactly what is collected, and the upload is user-triggered, so I guess this covers the issue.

I would personally also be interested in which host OSes netlab is used on.

To demonstrate that we do not collect sensitive data, we could also show the collected data and some reporting?

DanPartelly avatar Nov 03 '24 10:11 DanPartelly

I would personally also be interested in which host OSes netlab is used on.

Agree. Would you please document how we could collect that in a way that would work on most Linux distros while still providing reasonably easy-to-interpret results?

For example, uname -a produces output from which someone might be able to deduce the Ubuntu release, but that's way beyond my capabilities. Anyway, according to this https://gist.github.com/natefoo/814c5bf936922dad97ff, the whole thing is a bit of a mess.
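
One common way around the uname -a guesswork is the /etc/os-release file, which most current Linux distributions ship. A minimal Python sketch, assuming that file is present; the function names parse_os_release and distro_summary are made up for illustration and are not part of netlab:

```python
from pathlib import Path


def parse_os_release(text: str) -> dict[str, str]:
    """Parse KEY=value lines in os-release format into a dictionary."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes
        info[key.strip()] = value.strip().strip("\"'")
    return info


def distro_summary(info: dict[str, str]) -> str:
    """Reduce the parsed data to a short 'id version' string."""
    distro = info.get("ID", "unknown")
    version = info.get("VERSION_ID", "")
    return f"{distro} {version}".strip()


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        print(distro_summary(parse_os_release(path.read_text())))
    else:
        print("unknown")
```

Reporting only ID and VERSION_ID keeps the result easy to aggregate. On Python 3.10 and later, platform.freedesktop_os_release() reads the same file and could replace the hand-written parser.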

ipspace avatar Nov 03 '24 11:11 ipspace

Receiving and storing data can be achieved using Cloudflare Workers + KV or D1 storage, or with AWS Lambda + DynamoDB (plus putting some limits on it).

These days I would definitely go with CF workers + KV/D1/R2

To demonstrate that we do not collect sensitive data, we could also show the collected data and some reporting?

"The user could inspect the usage data with netlab usage show" ;) https://github.com/ipspace/netlab/blob/dev/docs/roadmap/usage.md?plain=1#L19

ipspace avatar Nov 03 '24 11:11 ipspace

Ok for the inspection of collected data, but seeing some "reporting stats" could be interesting imho

ssasso avatar Nov 03 '24 12:11 ssasso

Ok for the inspection of collected data, but seeing some "reporting stats" could be interesting imho

That's why I was thinking a GitHub repo might be a nice option - it puts the (anonymized) reported stats in a public place that people can go look at if they want to - not hidden in some backend database

jbemmel avatar Nov 03 '24 12:11 jbemmel

I like this, conceptually. There is nothing like letting the user watch the data.

That's why I was thinking a GitHub repo might be a nice option - it puts the (anonymized) reported stats in a public place that people can go look at if they want to - not hidden in some backend database

DanPartelly avatar Nov 03 '24 15:11 DanPartelly

Sure, I'll look into it, and yes, you are right, this can be a can of worms. I had to fight it recently with CMake; its Linux detection sucks, so I had to override the variables.

Agree. Would you please document how we could collect that in a way that would work on most Linux distros while still providing reasonably easy-to-interpret results?

DanPartelly avatar Nov 03 '24 15:11 DanPartelly

I like this, conceptually. There is nothing like letting the user watch the data.

That's why I was thinking a GitHub repo might be a nice option - it puts the (anonymized) reported stats in a public place that people can go look at if they want to - not hidden in some backend database

Maybe we could even talk to GitHub and make this into an officially supported feature. Usage data for open source projects voluntarily provided by GitHub users would be a great addition - I think many projects would use that

jbemmel avatar Nov 03 '24 15:11 jbemmel

Perhaps the best way to determine the OS name without descending into madness is to use a systemd component, hostnamectl. It returns the correct distro name in its output. It will of course only work on systems using systemd, but in 2024 all mainstream distros use it. Where it will fail is on musl libc-based distros, which still use alternative init systems by necessity (Alpine, Void Linux, Chimera), and on specialty distributions (embedded ... whatever).
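
A rough Python sketch of that idea, assuming the plain hostnamectl output contains an "Operating System:" line; the helper names parse_hostnamectl and host_os_name are hypothetical and not part of netlab:

```python
import shutil
import subprocess
from typing import Optional


def parse_hostnamectl(output: str) -> Optional[str]:
    """Return the value of the 'Operating System:' line, or None if absent."""
    for line in output.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == "Operating System":
            return value.strip() or None
    return None


def host_os_name() -> Optional[str]:
    """Run hostnamectl if it is installed; return None on any failure."""
    if shutil.which("hostnamectl") is None:
        return None  # no systemd tooling, e.g. Alpine or Void Linux
    try:
        result = subprocess.run(
            ["hostnamectl"], capture_output=True, text=True, timeout=5, check=True
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return parse_hostnamectl(result.stdout)
```

Returning None instead of raising lets the caller record the host OS as unknown on systems without systemd.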

Agree. Would you please document how we could collect that in a way that would work on most Linux distros while still providing reasonably easy-to-interpret results?

DanPartelly avatar Nov 04 '24 15:11 DanPartelly

CLI implemented in #2202

ipspace avatar Apr 28 '25 15:04 ipspace