fishtest
fishtest copied to clipboard
--report-hostname
There is the request to show hostnames for fishtest workers. I think this is reasonable, and useful for debugging. Users with multiple machines might find it difficult to check which machine corresponds to a given UUID
Maybe the reason not to do this is some concern on privacy? One approach would be to make this an opt-in feature with a cmdline switch, alternatively the UUID could link to a page which describes this information making this visible only to the user and the approvers.
I need some time to study the changes of last couple of months (worker_info, schema check, runs collection etc), at the moment I don't know how to implement the feature in the right way.
yeah, socket.gethostname().split(".")[0]
is the easy bit, the database and GUI work is more advanced.
This should be a good example https://github.com/glinscott/fishtest/pull/1256
but I need to understand the schema check in fishtest/server/util.py
Sending the hostname is trivial but...
On the worker side: the hostname should be added as a field to worker_info
. I propose to have --set_hostname <string>
for people that want to report something else than the true hostname (for privacy reasons).
On the server side: in api.py
a field 'hostname' : str
should be added to the schema.
The real question is what to do with this information.
I have a proposal: since people will look for problems in the event log, we can change the username field to <username>/<hostname>
if <username>
is logged in or if the logged in user is an admin.
Of course some people might object to the need to log in ... then we need a worker option --hide_hostname <true/false>
(default: true
)
EDIT: --set_hostname
is probably a bad idea, given that people copy config files between machines. Unless it is not saved in the config file... But this goes against the established practice for the worker.
I think we need to have a good display of the worker. Right now we have on the event list an the test/task list ChessDBCN-16cores-e6b3c7d0. We could indeed show this as username@hostname-cores-UUI if
I think making this opt-in (--show_hostname option to the worker) is in that case sufficient.
IMO we shouldn't send a hostname. The field should be a totally optional user provided identifier that is shown next ~~or replace~~ the usual identifier in the GUI.
There are three options.
-
Only send the hostname if the user wants it and show it only if the user is logged in/admin (the most restrictive option).
-
Only send the hostname if the user wants it and make it visible for everyone (since it is opt in).
-
Always send the hostname and show it only if the user is logged in/admin.
My preference would be 3.
Anyway it seems we definitely need an enhanced worker_name()
in util.py.
The hostname can be meaningless: workers running in different directories from the same host, workers started in colab, workers started in containers ecc.
The user could not control the hostname: workers running from or uni or corp.
@ppigazzini The fact that the hostname is sometimes meaningless seems harmless. https://en.wikipedia.org/wiki/Nirvana_fallacy
For me the only real issue is the privacy aspect. Do we trust the fishtest admins enough (now and in the future) to allow recording hostnames in the database, without the user explicitly opting in? Note that there is a precedent for the remote ip. But one can argue that this is anyway known to the server because of the ip protocol.
hostname
is wrong from a logical pov, we need and want a workername
.
We do have workers fleets running from unis and corps, so the workername
should be user provided.
Well reporting hostname was explicitly requested by a user (see the OP)...
I think a user provided worker name does not work since users copy config files between machines (I know I do).
About privacy in EU the data collected should be compliant to the GDPR, eg we should delete the sensible data upon the request of the user.
That user simply wants to be able to identifier a bad worker running or from a dozens of X86 servers of from 1000 Raspberry Pi.
and next year I plan to build a new server farm. (either 10x powerful servers -or- 1000x Raspberries Pi... undecided yet)
About privacy in EU the data collected should be compliant to the GDPR, eg we should delete the sensible data upon the request of the user.
That user simply wants to be able to identifier a bad worker running or from a dozens of X86 servers of from 1000 Raspberry Pi.
and next year I plan to build a new server farm. (either 10x powerful servers -or- 1000x Raspberries Pi... undecided yet)
I hadn't thought about the GDPR. But I guess that would apply with and without opt in from the user... So it is not related to the current discussion.
But thinking again I would now maybe prefer option 1. I.e. opt in by the user so that we are reasonably sure that the computer has a sensible hostname, and shown on the server only if the user is logged in or is an admin.
The opt-in could be --worker_name="foobar"
, and the user could decide to use --worker_name=`hostname`
if so desired (but any other string that is convenient for him like).
Option 1 seems good to me (but maybe less needed if this is opt-in and a string specified by the user).
We should limit the string length to avoid --worker_name=`fortune | cowsay`
UTF8 can be a problem?
yes, agree with the length. We could probably follow what we can expect for hostname
Each element of the hostname must be from 1 to 63 characters long and the entire hostname, including the dots, can be at most 253 characters long. Valid characters for hostnames are ASCII(7) letters from a to z, the digits from 0 to 9, and the hyphen (-)
so 63 characters and ascii would be a reasonable restriction.
I still think that --worker_name="foobar"
is not a good idea. A typical user will maybe enter it once, then copy the config file to the other servers under their control and forget to clear the --worker_name
. Then all instances will be uselessly reported as @ foobar.
I think to avoid db polution it is much better to restrict to hostname reporting (when the user opts in).
EDIT: Unless the argument of --worker_name
is not saved to the config file.
Dropped the ping :)
I missed this:
I have a proposal: since people will look for problems in the event log, we can change the username field to
<username>/<hostname>
if<username>
is logged in or if the logged in user is an admin.
I think that the worker_name
will be used:
- to debug a worker problem in Events log
- to check in the home page (or elsewhere) if all the allocated workers are running or running with the right config (eg concurrency)
@ppigazzini Yes but that is only true if the argument of --worker_name
is something sensible. Which I doubt if it is left to the users (e.g. the argument about copying config files I have been making).
IMO that features is useful for an user that runs a fleet of workers, a normal user shouldn't use it at all. I don't know if should be easy to add the filtered out workers in the user page like we are doing for the tests.
But IMHO a user that runs a fleet of workers would just as well be served by --report_hostname
I think (which was the original request).
I don't know if should be easy to add the filtered out workers in the user page like we are doing for the tests.
I am sorry. I do not understand the question.
As I see it we should equip worker_name()
in util.py
with an optional argument, use_hostname
(default False
) to be able to create a workername of the form [email protected]
.
The optional argument use_hostname
should be set when the user is logged in or when the logged in user is an admin.
Ok for the event_log
it is not so easy since the worker_name
is created when the event happens, not when the log is watched.
Fixable, but needs some thought.
A technical question. How do we know which user is logged in?
Ok that's easy. request.authenticated_userid
is a boolean that says if the user is logged in.
EDIT. Not a boolean. Just the name of the logged in user, if any.
Mockup
I thought about this. But the problem is that the machines display is not permanent. So by itself it is not sufficient to connect a uuid
with a hostname
.
Reporting the workername as [email protected] on the other hand solves this problem.
The two things are not mutually exclusive.
Now I understand the ping to foobar user :)
- Only send the hostname if the user wants it and make it visible for everyone (since it is opt in).
I prefer this option. (opt-in via cmd) to send hostname, real or custom (fake), and make it visible for everyone. This is easier to implement.