`aim up` is not user-friendly when training on remote machine

Open patrickvonplaten opened this issue 2 years ago • 6 comments

🚀 Feature

Hey aimhubio team!

First of all - thanks for the super nice open-source ML tracking library! It's great that every component of it is open-sourced.

I opened this issue to ask for a more user-friendly way of showing the results UI when training on a remote machine.

Motivation

Most ML training is done on remote machines which means that the quickstart tutorial breaks once one does

aim up

because the displayed website is not available. To make it available, one has to set up a remote tunnel, which is not something most ML engineers know how to do.

Pitch

Make it easier to display runs or add better docs. Let's say I spin up a GPU on GCP and then log in with:

ssh <name>@<address>

Now I run a script that saves a .aim folder and then I wish to analyse the results with aim up. Instead of seeing a website, one gets "This site can’t be reached" without any hint as to what's going on.

To solve the problem, one needs to set up SSH local port forwarding (an SSH tunnel) from the local machine:

ssh -L 16007:127.0.0.1:16007 <name>@<address>

and then use aim up as follows:

aim up --host 127.0.0.1 --port 16007
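
As a rough illustration only, the two steps above can be wrapped in a small Python helper that builds the tunnel command and checks whether the forwarded port answers locally. The names `tunnel_command` and `port_is_open` are hypothetical and not part of Aim; `-N` is a standard OpenSSH option (do not run a remote command) added here as an assumption so the session only forwards the port.

```python
from __future__ import annotations

import socket


def tunnel_command(user: str, address: str, port: int = 16007) -> list[str]:
    """Build the ssh argv that forwards local `port` to the same port on the remote.

    Hypothetical helper, not part of Aim.
    """
    return ["ssh", "-N", "-L", f"{port}:127.0.0.1:{port}", f"{user}@{address}"]


def port_is_open(host: str = "127.0.0.1", port: int = 16007, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False


if __name__ == "__main__":
    # Run on the local machine: show the tunnel command, then probe the UI port.
    print(" ".join(tunnel_command("<name>", "<address>")))
    print("UI reachable locally:", port_is_open())
```

If `port_is_open()` returns False on the local machine while `aim up` is running on the remote, the tunnel is most likely missing or pointing at a different port.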

This is hard to find out and might drive people away from using the UI. It's great that one doesn't need to log in to see the UI (that's a major reason why I like to use aimstack instead of WandB), but the UX could be a bit nicer.

Better UI:

Instead of just showing:

┌------------------------------------------------------------------------┐
                Aim UI collects anonymous usage analytics.
                        Read how to opt-out here:
    https://aimstack.readthedocs.io/en/latest/community/telemetry.html
└------------------------------------------------------------------------┘
Running Aim UI on repo `<Repo#1375512297415772289 path=/home/patrick_huggingface_co/evaluate_whisper/.aim read_only=None>`
Open http://127.0.0.1:16007
Press Ctrl+C to exit

there could be at least a couple of lines explaining what to do to display the runs when working on a remote machine.
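
A rough sketch of what such a hint could look like (this is not Aim's actual implementation): OpenSSH sets the `SSH_CONNECTION` environment variable in sessions started over SSH, so a CLI could use it to guess that it is running remotely and print the port-forwarding command. The function name `remote_hint` and the wording of the message are assumptions made for illustration.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional


def remote_hint(port: int = 16007, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a port-forwarding hint if the process appears to run in an SSH session.

    Hypothetical helper, not part of Aim.
    """
    env = os.environ if env is None else env
    if not env.get("SSH_CONNECTION"):
        return None  # Probably a local session; no hint needed.
    return (
        "It looks like you are connected over SSH. To open the UI in your local browser,\n"
        "run this on your local machine and then open the URL above:\n"
        f"  ssh -L {port}:127.0.0.1:{port} <name>@<address>"
    )


if __name__ == "__main__":
    hint = remote_hint()
    if hint:
        print(hint)
```

The check is only a heuristic: it says nothing about VPNs, containers, or notebooks, so a real implementation would likely print the hint as a suggestion rather than a diagnosis.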

Related issues:

  • https://github.com/aimhubio/aim/issues/2083
  • https://github.com/aimhubio/aim/issues/213
  • https://github.com/aimhubio/aim/issues/253

Also cc @julien-c

patrickvonplaten avatar Oct 09 '22 19:10 patrickvonplaten

Hey @patrickvonplaten, thanks for the kind words! 🙌

We are currently focused on improving e2e experience and definitely this is a critical point. Thanks for sharing it in such detail. We will prioritize and resolve it asap. I will share the progress here.

gorarakelyan avatar Oct 09 '22 20:10 gorarakelyan

Depending on the remote, you may be able to set --host=0.0.0.0 (i.e. listen on all network interfaces, so the UI accepts incoming connections from other machines).


On the remote server:

aim up --host=0.0.0.0 --port=16007

Then, from your client browser navigate to either:

http://remote_address:16007/
http://username@remote_address:16007/

YodaEmbedding avatar Oct 11 '22 09:10 YodaEmbedding

Cool - I'll try this :-)

patrickvonplaten avatar Oct 11 '22 18:10 patrickvonplaten

@YodaEmbedding that is much easier to set up. However, I think there might be security concerns, as the port will be open and anyone who can reach the machine will be able to access and see training insights/results, unless of course it is run inside a VPN or something like that. Thoughts?
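
To make the trade-off concrete, here is a small sketch that uses Python's standard `ipaddress` module to classify what a given `--host` value exposes. The function name `bind_scope` and its labels are hypothetical, and only IP literals are handled (not hostnames such as `localhost`).

```python
import ipaddress


def bind_scope(host: str) -> str:
    """Classify a bind address by how widely the server would be reachable.

    Hypothetical helper, not part of Aim. Raises ValueError for non-IP strings.
    """
    addr = ipaddress.ip_address(host)
    if addr.is_unspecified:
        # 0.0.0.0 (or ::) means every network interface on the machine.
        return "all-interfaces"
    if addr.is_loopback:
        # 127.0.0.1 (or ::1) is reachable only from the machine itself,
        # which is why an SSH tunnel is needed to view it remotely.
        return "loopback-only"
    return "single-interface"


if __name__ == "__main__":
    for host in ("127.0.0.1", "0.0.0.0"):
        print(host, "->", bind_scope(host))
```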

gorarakelyan avatar Oct 26 '22 12:10 gorarakelyan

In my case, the host machine is on a local network within our organization. This is good because I can share a link with other members of my team. But this means other teams can also access it, which may not be desirable in certain organizations.

One way around this problem might be (optional) browser-cached user authentication and login page, similar to ClearML.

YodaEmbedding avatar Oct 30 '22 22:10 YodaEmbedding

For whoever stumbles across this issue: you could also connect with VS Code over SSH and use its port forwarding for the server's port.

vwxyzjn avatar Oct 24 '23 21:10 vwxyzjn