Multi-tenant Tensorboard server
Currently, if you go into a project and click "Send to Tensorboard", it creates a server that runs Tensorboard for that specific job. However, this is not compatible with a multi-user, multi-tenant Atlas hosted on a cluster: since there is only one instance of the Tensorboard service, all users will clash.
#123 should be completed first, as this will inherently depend on how many users there are.
My initial thoughts on accomplishing this:
- We should only need to scale the tb server container with the number of users.
- The tensorboard api should just forward the request to the correct tb_server container (based on user) instead of creating the links.
- The logic for actually creating the symlinks should live within an API running in the same container as the server.
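
To make the last point concrete, here is a rough sketch of what the symlink logic could look like if it lived in a small API inside each tb_server container. This is only an illustration, not the current implementation: the function name `link_job_logs`, the `archive_root` layout (one directory of event files per job ID), and the `logdir` argument are all assumptions.

```python
from pathlib import Path


def link_job_logs(job_ids, archive_root, logdir):
    """Symlink each job's log directory into this container's Tensorboard logdir.

    Hypothetical helper: assumes logs are stored as <archive_root>/<job_id>/.
    Returns the job IDs that were actually linked.
    """
    archive_root = Path(archive_root)
    logdir = Path(logdir)
    logdir.mkdir(parents=True, exist_ok=True)

    linked = []
    for job_id in job_ids:
        source = archive_root / job_id
        if not source.is_dir():
            continue  # no logs for this job, nothing to link
        link = logdir / job_id
        if link.is_symlink():
            link.unlink()  # replace a stale link from an earlier request
        elif link.exists():
            continue  # a real file or directory is in the way; leave it alone
        link.symlink_to(source, target_is_directory=True)
        linked.append(job_id)
    return linked
```

Because each container would only ever write links into its own logdir, one user's "Send to Tensorboard" request could not overwrite another user's view.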
Alternative:
- Merge the two containers so that the server and api are both running in the same container
- Scale this merged container with users.
- The rest api (send_to_tensorboard endpoint) can decide which container to forward to based on the logged in user.
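
For either approach, the "decide which container to forward to" step could be sketched roughly as below. Everything here is an assumption for illustration: the `TensorboardRouter` class, the `tb-server-<n>` host naming, and the in-memory assignment table are made up, and a real version would need to get the container list from whatever does the scaling. The only non-hypothetical detail is 6006, Tensorboard's default port.

```python
class TensorboardRouter:
    """Assigns each user a dedicated Tensorboard container (hypothetical sketch)."""

    def __init__(self, base_name="tb-server", port=6006):
        self._base_name = base_name  # assumed container/host name prefix
        self._port = port            # Tensorboard's default port
        self._assignments = {}       # username -> container host name

    def container_for(self, username):
        # Hand out the next unused container the first time a user is seen,
        # then keep returning the same one so their links stay in one place.
        if username not in self._assignments:
            index = len(self._assignments)
            self._assignments[username] = f"{self._base_name}-{index}"
        return self._assignments[username]

    def forward_url(self, username, path="/"):
        # URL the send_to_tensorboard endpoint would forward the request to.
        return f"http://{self.container_for(username)}:{self._port}{path}"
```

With this shape, the REST API stays stateless apart from the user-to-container table, which could later move to a shared store if the API itself is replicated.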
@ekhl See above, I just got that in ahead of your question lol. I can look into multi tenancy in the underlying tb_server itself.
For reference, here is an old issue that planned to productionize Tensorboard and mentions multi-tenancy: https://github.com/tensorflow/tensorboard/issues/92. Unfortunately, the issue was closed because the planned features were "too ambitious and potentially overlap with the work other folks are doing".