fishtest icon indicating copy to clipboard operation
fishtest copied to clipboard

--report-hostname

Open vondele opened this issue 3 years ago • 31 comments

There is the request to show hostnames for fishtest workers. I think this is reasonable, and useful for debugging. Users with multiple machines might find it difficult to check which machine corresponds to a given UUID

Maybe the reason not to do this is some concern on privacy? One approach would be to make this an opt-in feature with a cmdline switch, alternatively the UUID could link to a page which describes this information making this visible only to the user and the approvers.

vondele avatar Jan 27 '22 16:01 vondele

I need some time to study the changes of last couple of months (worker_info, schema check, runs collection etc), at the moment I don't know how to implement the feature in the right way.

ppigazzini avatar Jan 27 '22 16:01 ppigazzini

yeah, socket.gethostname().split(".")[0] is the easy bit, the database and GUI work is more advanced.

vondele avatar Jan 27 '22 17:01 vondele

This should be a good example https://github.com/glinscott/fishtest/pull/1256 but I need to understand the schema check in fishtest/server/util.py

ppigazzini avatar Jan 27 '22 17:01 ppigazzini

Sending the hostname is trivial but...

On the worker side: the hostname should be added as a field to worker_info. I propose to have --set_hostname <string> for people that want to report something else than the true hostname (for privacy reasons).

On the server side: in api.py a field 'hostname' : str should be added to the schema.

The real question is what to do with this information.

I have a proposal: since people will look for problems in the event log, we can change the username field to <username>/<hostname> if <username> is logged in or if the logged in user is an admin.

Of course some people might object to the need to log in ... then we need a worker option --hide_hostname <true/false> (default: true)

EDIT: --set_hostname is probably a bad idea, given that people copy config files between machines. Unless it is not saved in the config file... But this goes against the established practice for the worker.

vdbergh avatar Jan 28 '22 07:01 vdbergh

I think we need to have a good display of the worker. Right now we have on the event list an the test/task list ChessDBCN-16cores-e6b3c7d0. We could indeed show this as username@hostname-cores-UUI if is logged in or if the logged in user is an admin (approver?). I think having the requirement of being logged in is fair.

I think making this opt-in (--show_hostname option to the worker) is in that case sufficient.

vondele avatar Jan 28 '22 08:01 vondele

IMO we shouldn't send a hostname. The field should be a totally optional user provided identifier that is shown next ~~or replace~~ the usual identifier in the GUI.

ppigazzini avatar Jan 28 '22 08:01 ppigazzini

There are three options.

  1. Only send the hostname if the user wants it and show it only if the user is logged in/admin (the most restrictive option).

  2. Only send the hostname if the user wants it and make it visible for everyone (since it is opt in).

  3. Always send the hostname and show it only if the user is logged in/admin.

My preference would be 3.

Anyway it seems we definitely need an enhanced worker_name() in util.py.

vdbergh avatar Jan 28 '22 08:01 vdbergh

The hostname can be meaningless: workers running in different directories from the same host, workers started in colab, workers started in containers ecc.

The user could not control the hostname: workers running from or uni or corp.

ppigazzini avatar Jan 28 '22 08:01 ppigazzini

@ppigazzini The fact that the hostname is sometimes meaningless seems harmless. https://en.wikipedia.org/wiki/Nirvana_fallacy

For me the only real issue is the privacy aspect. Do we trust the fishtest admins enough (now and in the future) to allow recording hostnames in the database, without the user explicitly opting in? Note that there is a precedent for the remote ip. But one can argue that this is anyway known to the server because of the ip protocol.

vdbergh avatar Jan 28 '22 09:01 vdbergh

hostname is wrong from a logical pov, we need and want a workername. We do have workers fleets running from unis and corps, so the workername should be user provided.

ppigazzini avatar Jan 28 '22 09:01 ppigazzini

Well reporting hostname was explicitly requested by a user (see the OP)...

I think a user provided worker name does not work since users copy config files between machines (I know I do).

vdbergh avatar Jan 28 '22 09:01 vdbergh

About privacy in EU the data collected should be compliant to the GDPR, eg we should delete the sensible data upon the request of the user.

That user simply wants to be able to identifier a bad worker running or from a dozens of X86 servers of from 1000 Raspberry Pi.

and next year I plan to build a new server farm. (either 10x powerful servers -or- 1000x Raspberries Pi... undecided yet)

ppigazzini avatar Jan 28 '22 09:01 ppigazzini

About privacy in EU the data collected should be compliant to the GDPR, eg we should delete the sensible data upon the request of the user.

That user simply wants to be able to identifier a bad worker running or from a dozens of X86 servers of from 1000 Raspberry Pi.

and next year I plan to build a new server farm. (either 10x powerful servers -or- 1000x Raspberries Pi... undecided yet)

I hadn't thought about the GDPR. But I guess that would apply with and without opt in from the user... So it is not related to the current discussion.

But thinking again I would now maybe prefer option 1. I.e. opt in by the user so that we are reasonably sure that the computer has a sensible hostname, and shown on the server only if the user is logged in or is an admin.

vdbergh avatar Jan 28 '22 10:01 vdbergh

The opt-in could be --worker_name="foobar", and the user could decide to use --worker_name=`hostname` if so desired (but any other string that is convenient for him like).

Option 1 seems good to me (but maybe less needed if this is opt-in and a string specified by the user).

vondele avatar Jan 28 '22 11:01 vondele

We should limit the string length to avoid --worker_name=`fortune | cowsay` UTF8 can be a problem?

ppigazzini avatar Jan 28 '22 11:01 ppigazzini

yes, agree with the length. We could probably follow what we can expect for hostname

Each element of the hostname must be from 1 to 63 characters long and the entire hostname, including the dots, can be at most 253 characters long. Valid characters for hostnames are ASCII(7) letters from a to z, the digits from 0 to 9, and the hyphen (-)

so 63 characters and ascii would be a reasonable restriction.

vondele avatar Jan 28 '22 11:01 vondele

I still think that --worker_name="foobar" is not a good idea. A typical user will maybe enter it once, then copy the config file to the other servers under their control and forget to clear the --worker_name. Then all instances will be uselessly reported as @ foobar.

I think to avoid db polution it is much better to restrict to hostname reporting (when the user opts in).

EDIT: Unless the argument of --worker_name is not saved to the config file.

vdbergh avatar Jan 28 '22 12:01 vdbergh

Dropped the ping :)

ppigazzini avatar Jan 28 '22 12:01 ppigazzini

I missed this:

I have a proposal: since people will look for problems in the event log, we can change the username field to <username>/<hostname> if <username> is logged in or if the logged in user is an admin.

I think that the worker_name will be used:

  • to debug a worker problem in Events log
  • to check in the home page (or elsewhere) if all the allocated workers are running or running with the right config (eg concurrency)

ppigazzini avatar Jan 28 '22 12:01 ppigazzini

@ppigazzini Yes but that is only true if the argument of --worker_name is something sensible. Which I doubt if it is left to the users (e.g. the argument about copying config files I have been making).

vdbergh avatar Jan 28 '22 12:01 vdbergh

IMO that features is useful for an user that runs a fleet of workers, a normal user shouldn't use it at all. I don't know if should be easy to add the filtered out workers in the user page like we are doing for the tests.

ppigazzini avatar Jan 28 '22 12:01 ppigazzini

But IMHO a user that runs a fleet of workers would just as well be served by --report_hostname I think (which was the original request).

vdbergh avatar Jan 28 '22 12:01 vdbergh

I don't know if should be easy to add the filtered out workers in the user page like we are doing for the tests.

I am sorry. I do not understand the question.

As I see it we should equip worker_name() in util.py with an optional argument, use_hostname (default False) to be able to create a workername of the form [email protected].

The optional argument use_hostname should be set when the user is logged in or when the logged in user is an admin.

vdbergh avatar Jan 28 '22 13:01 vdbergh

Ok for the event_log it is not so easy since the worker_name is created when the event happens, not when the log is watched.

Fixable, but needs some thought.

vdbergh avatar Jan 28 '22 13:01 vdbergh

A technical question. How do we know which user is logged in?

vdbergh avatar Jan 28 '22 13:01 vdbergh

Ok that's easy. request.authenticated_userid is a boolean that says if the user is logged in.

EDIT. Not a boolean. Just the name of the logged in user, if any.

vdbergh avatar Jan 28 '22 13:01 vdbergh

Mockup

image

ppigazzini avatar Jan 28 '22 13:01 ppigazzini

I thought about this. But the problem is that the machines display is not permanent. So by itself it is not sufficient to connect a uuid with a hostname.

Reporting the workername as [email protected] on the other hand solves this problem.

The two things are not mutually exclusive.

vdbergh avatar Jan 28 '22 13:01 vdbergh

Now I understand the ping to foobar user :)

ppigazzini avatar Jan 28 '22 13:01 ppigazzini

  • Only send the hostname if the user wants it and make it visible for everyone (since it is opt in).

I prefer this option. (opt-in via cmd) to send hostname, real or custom (fake), and make it visible for everyone. This is easier to implement.

Technologov avatar Feb 21 '22 03:02 Technologov