
Multi-tenant Tensorboard server

Open mohammedri opened this issue 4 years ago • 4 comments

Currently, if you go into a project and click on send to Tensorboard, it creates a server that runs Tensorboard for that specific job. However, this is not compatible with a multi-user, multi-tenant Atlas hosted on a cluster: since there is only one instance of the Tensorboard service, all users will clash.

mohammedri avatar Mar 13 '20 23:03 mohammedri

#123 should be completed first, since this will inherently depend on the number of users.

amackillop avatar Apr 28 '20 20:04 amackillop

My initial thoughts on accomplishing this:

  • We should only need to scale the tb_server container with the number of users.
  • The tensorboard API should just forward the request to the correct tb_server container (based on user) instead of creating the links.
  • The logic for actually creating the symlinks should live within an API running in the same container as the server.
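
A rough sketch of what the symlink logic inside the tb_server container could look like, assuming each user gets their own subdirectory under the Tensorboard log root. The function name, the directory layout, and the parameters are all hypothetical and not taken from the Atlas code base:

```python
from pathlib import Path


def link_job_logs(logdir_root: Path, user: str, job_id: str, job_log_dir: Path) -> Path:
    """Create or refresh the symlink <logdir_root>/<user>/<job_id> -> job_log_dir.

    Hypothetical layout: one subdirectory per user, so that each user's
    Tensorboard instance only sees the jobs that user sent to it.
    """
    user_dir = logdir_root / user
    user_dir.mkdir(parents=True, exist_ok=True)

    link = user_dir / job_id
    # Replace a stale link so that sending the same job twice is harmless.
    if link.is_symlink():
        link.unlink()
    link.symlink_to(job_log_dir, target_is_directory=True)
    return link
```

With a layout like this, each tb_server container would only need to point Tensorboard at its own user's directory.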

Alternative:

  • Merge the two containers so that the server and API are both running in the same container.
  • Scale this merged container with users.
  • The REST API (send_to_tensorboard endpoint) can decide which container to forward to based on the logged-in user.
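
To make the forwarding idea concrete, here is a minimal sketch of a per-user lookup that the send_to_tensorboard endpoint could consult. The class, its methods, and the example host name are assumptions for illustration only; how containers actually get registered would depend on the cluster setup:

```python
from typing import Dict


class TensorboardRouter:
    """Maps a logged-in user to the base URL of that user's container (hypothetical)."""

    def __init__(self) -> None:
        self._servers: Dict[str, str] = {}

    def register(self, user: str, base_url: str) -> None:
        # Called whenever a container is started for a user.
        self._servers[user] = base_url.rstrip("/")

    def target_for(self, user: str, path: str = "/") -> str:
        # Raises KeyError if no container has been registered for this user.
        base_url = self._servers[user]
        return base_url + "/" + path.lstrip("/")
```

For example, after `router.register("alice", "http://tb-alice:6006")`, a call to `router.target_for("alice", "/data/runs")` gives `http://tb-alice:6006/data/runs`, which the REST API could then forward the request to.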

amackillop avatar Apr 28 '20 20:04 amackillop

@ekhl See above, I just got that in ahead of your question lol. I can look into multi-tenancy in the underlying tb_server itself.

amackillop avatar Apr 28 '20 20:04 amackillop

For reference, an old issue that planned to productionize Tensorboard, mentioning multi-tenancy: https://github.com/tensorflow/tensorboard/issues/92. Unfortunately, the issue was closed because the planned features were "too ambitious and potentially overlap with the work other folks are doing".

ekhl avatar May 02 '20 03:05 ekhl