atlas Multi-tenant Tensorboard server

Open mohammedri opened this issue 4 years ago • 4 comments

Currently, if you go into a project and click on send to Tensorboard, it creates a server that runs Tensorboard for that specific job. However, this is not compatible with a multi-user, multi-tenant Atlas hosted on a cluster. Since there is only one instance of the Tensorboard service, all users will clash.

Mar 13 '20 23:03 mohammedri

#123 should be completed first, as this will inherently depend on how many users there are.

Apr 28 '20 20:04 amackillop

My initial thoughts on accomplishing this:

We should only need to scale the tb_server container with the number of users.
The Tensorboard API should just forward the request to the correct tb_server container (based on the user) instead of creating the links.
The logic for actually creating the symlinks should live within an API running in the same container as the server.
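
To make the symlink step concrete, here is a minimal sketch of what the link-creation logic inside a tb_server container might look like. This is not the existing Atlas code: the function name `link_job_into_logdir` and the directory layout (one log directory per user, one symlink per job) are assumptions made for illustration only.

```python
import tempfile
from pathlib import Path


def link_job_into_logdir(logdir: Path, job_dir: Path) -> Path:
    """Create or refresh a symlink to job_dir inside a user's log directory.

    Hypothetical helper: the layout (one logdir per user, one link per job)
    is an assumption, not the current Atlas behaviour.
    """
    logdir.mkdir(parents=True, exist_ok=True)
    link = logdir / job_dir.name
    # Replace a stale link so repeated requests for the same job are idempotent.
    if link.is_symlink():
        link.unlink()
    link.symlink_to(job_dir, target_is_directory=True)
    return link


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    job = root / "jobs" / "job-1"
    job.mkdir(parents=True)
    print(link_job_into_logdir(root / "logdirs" / "alice", job))
```

Because each container would only ever write into its own log directory, two users sending jobs at the same time would no longer touch the same links. If a real directory (not a symlink) already exists at the link path, `symlink_to` raises `FileExistsError` rather than overwriting it.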

Alternative:

Merge the two containers so that the server and API both run in the same container.
Scale this merged container with the number of users.
The REST API (send_to_tensorboard endpoint) can decide which container to forward to based on the logged-in user.
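
The forwarding decision in either variant could be sketched roughly as follows. Everything here is hypothetical: the `TensorboardRouter` class, the user IDs, and the container URLs are illustrative names, and how containers actually get registered (for example by an orchestrator when a user is created) is left open.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TensorboardRouter:
    """Maps a logged-in user to the base URL of that user's container.

    Hypothetical sketch: names and URLs are assumptions, not Atlas code.
    """

    backends: Dict[str, str] = field(default_factory=dict)

    def register(self, user_id: str, base_url: str) -> None:
        # Store the URL without a trailing slash so joins stay predictable.
        self.backends[user_id] = base_url.rstrip("/")

    def forward_url(self, user_id: str, path: str) -> str:
        """Return the URL a request from user_id should be forwarded to."""
        try:
            base = self.backends[user_id]
        except KeyError:
            raise LookupError(
                f"no Tensorboard container registered for user {user_id!r}"
            ) from None
        return f"{base}/{path.lstrip('/')}"


if __name__ == "__main__":
    router = TensorboardRouter()
    router.register("alice", "http://tb-alice:6006/")
    print(router.forward_url("alice", "/create_sym_links"))
```

Keeping the lookup separate from the HTTP layer means the send_to_tensorboard endpoint only has to resolve the logged-in user and proxy the request to the returned URL. A user without a registered container fails loudly instead of silently falling back to a shared instance.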

Apr 28 '20 20:04 amackillop

@ekhl See above, I just got that in ahead of your question lol. I can look into multi-tenancy in the underlying tb_server itself.

Apr 28 '20 20:04 amackillop

For reference, an old issue that planned to productionize Tensorboard, mentioning multi-tenancy: https://github.com/tensorflow/tensorboard/issues/92. Unfortunately, the issue was closed because the planned features were "too ambitious and potentially overlap with the work other folks are doing".

May 02 '20 03:05 ekhl
