vscode-dvc

Show experiments queue info

Open dberenbaum opened this issue 2 years ago • 43 comments

dvc queue status shows info about the experiments queue that wasn't available before:

$ dvc queue status
Task     Name    Created    Status
065b468          02:07 PM   Running
e91aa40          02:07 PM   Queued
7a04381          12:36 PM   Failed
dcdc661          02:08 PM   Success

Worker status: 1 active, 0 idle

VS Code should be able to:

  • Show the queue separately from completed experiments - it can be helpful to think of the queued tasks separately from completed experiments (see https://github.com/iterative/dvc.org/issues/3658#issuecomment-1168608090)
  • Show more granular info about queued tasks - status, creation time, logs (see dvc queue logs)
  • Show queue status - how many workers are running/idle
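To make the wish list above concrete, here is a minimal sketch of how a client could turn the text output shown earlier into structured records. It assumes the row layout from the sample in this issue (task id, optional name, a `HH:MM AM/PM` creation time, status) and the `Worker status:` line exactly as printed. The `QueueTask` type and `parse_queue_status` function are hypothetical names, and the real output may differ (for example, older tasks may print a date), so treat this as an illustration rather than a stable contract.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed row layout, copied from the sample output in this issue:
# <task id> [name] <HH:MM AM/PM> <status>
ROW = re.compile(
    r"^(?P<task>\S+)\s+(?:(?P<name>\S+)\s+)?"
    r"(?P<created>\d{1,2}:\d{2} [AP]M)\s+(?P<status>\w+)$"
)
WORKERS = re.compile(r"^Worker status: (?P<active>\d+) active, (?P<idle>\d+) idle$")


@dataclass
class QueueTask:  # hypothetical record type, not part of DVC
    task: str
    name: Optional[str]
    created: str
    status: str


def parse_queue_status(text: str) -> Tuple[List[QueueTask], int, int]:
    """Return (tasks, active_workers, idle_workers) parsed from the text."""
    tasks: List[QueueTask] = []
    active = idle = 0
    for line in text.splitlines():
        line = line.strip()
        row = ROW.match(line)
        if row:
            # The header line never matches because it has no time field.
            tasks.append(QueueTask(**row.groupdict()))
            continue
        workers = WORKERS.match(line)
        if workers:
            active, idle = int(workers["active"]), int(workers["idle"])
    return tasks, active, idle


SAMPLE = """\
Task     Name    Created    Status
065b468          02:07 PM   Running
e91aa40          02:07 PM   Queued
7a04381          12:36 PM   Failed
dcdc661          02:08 PM   Success

Worker status: 1 active, 0 idle
"""

tasks, active, idle = parse_queue_status(SAMPLE)
```

Parsing human-readable output is brittle; a machine-readable format would be preferable if the CLI offers one.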

dberenbaum avatar Jul 06 '22 18:07 dberenbaum

My initial thought was to show the contents of dvc queue status in a simple table. We could put the worker(s) info above the header with badges like we do in the current experiments table:

image

However, after considering it further, I think that this is a better candidate for a simple tree view. We could move the existing queued items out from the experiments tree into a new queue (status) tree. I.e. these records:

image

The tree can have various options associated with each item. E.g. users will have the option to remove records from the tree. We can also hide records by status.

From working with the feature a bit, I think a good addition for VS Code would be the ability to open the logs for a specific run by clicking on the row. We could pipe the output into one of our fake terminals:

Screen Shot 2022-07-13 at 4 32 18 pm (2)

The user could even follow the logs for multiple experiments if they chose to.

If we are looking to split out queued experiments from the current experiments table webview then we will need to have a longer conversation as we need to talk about how we will filter, sort, etc and how we could rework the current UI.

mattseddon avatar Jul 13 '22 06:07 mattseddon

That all makes sense to me @mattseddon. Interested to hear what @karajan1001 thinks.

dberenbaum avatar Jul 13 '22 20:07 dberenbaum

Folks, for that amount of information that we have now in the description I don't feel the need for a separate tree and/or webview. At least let's not make it a priority. I still think we should be fine for now with having queued experiments (why is it called a task, btw?) in the main table. (running, failed, etc - we have to show them in the table).

shcheklein avatar Jul 13 '22 22:07 shcheklein

@shcheklein re why it is a task: https://github.com/iterative/dvc.org/pull/3715#discussion_r918387280

I do not think we will be removing items from the experiments table. A new tree is not expensive to implement and it will contain different information about the new queue (output of dvc queue status). It should give us a convenient way to manage queue tasks without having to wedge anything else into the experiments table.

The alternative would be to put all queued, succeeded, running, failed queue tasks into the experiment table and give the user the option to manage tasks/filter these out but I think the idea on the DVC side is to further expose the mechanics of the queue and separate the two concepts (queue tasks vs experiments). If we included succeeded tasks we would have two records for an experiment created from the queue, one successful queue task (which contains artefacts like logs) and the experiments record (what we currently show).

@dberenbaum please correct the above if I got any of it wrong.

mattseddon avatar Jul 14 '22 00:07 mattseddon

@mattseddon thanks for the link :) I put a comment there. My 2cs - I don't like the idea of dealing with two terms, "task" and "experiments". We should have one user-facing term. The second one ("task"), if we keep it, should be something advanced, something auxiliary.

The alternative would be to put all queued, succeeded, running, failed queue tasks into the experiment table and give the user the option to manage tasks/filter these ...

Yes. But not "task", "experiments". Yes, I can queue an experiment, it can fail or I can delete it after I queued it, etc. I want to see parameters and all inputs, etc. Again, it's very hard for me for some reason to justify the existence of any other terms on the same level as experiments.

If we included succeeded tasks we would have two records for an experiment created from the queue, one successful queue task (which contains artefacts like logs) and the experiments record (what we currently show).

yep. we should not have this behavior, of course. It's fine if tasks are used as an internal building block. In the table they should be mapped to my queued, running, failed, done experiments.
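A rough sketch of that mapping, under stated assumptions: the statuses are the four printed by dvc queue status in this issue, the user-facing state names are made up for illustration, and each experiment record is assumed to carry a hypothetical `task` key that links it to the queue task it came from (no such field is confirmed to exist in the exp show data).

```python
from typing import Dict, List

# Queue statuses as printed in this issue, mapped to hypothetical
# user-facing experiment states. "task" never reaches the user.
STATES: Dict[str, str] = {
    "Queued": "queued",
    "Running": "running",
    "Failed": "failed",
    "Success": "done",
}


def table_rows(tasks: List[dict], experiments: List[dict]) -> List[dict]:
    """Merge queue tasks into the experiments table, one row per experiment."""
    # `task` on an experiment is an assumed link back to its queue task.
    linked = {exp["task"] for exp in experiments if exp.get("task")}
    rows = [{"name": exp["name"], "state": "done"} for exp in experiments]
    for task in tasks:
        if task["status"] == "Success" and task["id"] in linked:
            continue  # already represented by its experiment record
        rows.append({"name": task["id"], "state": STATES[task["status"]]})
    return rows


tasks = [
    {"id": "065b468", "status": "Running"},
    {"id": "e91aa40", "status": "Queued"},
    {"id": "7a04381", "status": "Failed"},
    {"id": "dcdc661", "status": "Success"},
]
experiments = [{"name": "exp-1", "task": "dcdc661"}]  # "exp-1" is a placeholder name
rows = table_rows(tasks, experiments)
```

With this shape the successful task and its experiment collapse into a single "done" row, which avoids the duplicate-record problem described above.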

shcheklein avatar Jul 14 '22 01:07 shcheklein

My only follow up question would be how do we display that the queue is currently being processed and the number of workers? I think a status bar item would work well.

mattseddon avatar Jul 14 '22 01:07 mattseddon

@mattseddon q - can we see it from the table itself, like I see that 4 items are spinning for example in the table? would it be enough to start?

shcheklein avatar Jul 14 '22 01:07 shcheklein

I just checked and right now the queue can be processing tasks but in the extension UI it will appear as if nothing is happening/running. Even the status bar spinner is stopped. We need some extra information (most likely from dvc queue status) to bridge the gap.

dvc queue start -j 1:

https://user-images.githubusercontent.com/37993418/178883260-928dbe2b-c5f2-42cc-9827-46e1421243b5.mov

mattseddon avatar Jul 14 '22 02:07 mattseddon

From my side, tasks and experiments are in almost one-to-one correspondence, so we can hide the task concept and only expose experiments to the UI. The only exception here is checkpoints: I think it's better to bunch checkpoint experiment results together as a single experiment.

OK for me to use the tree to show the queue tasks, but where would a tree show CreateTime or some other property?

karajan1001 avatar Jul 14 '22 10:07 karajan1001

Sounds good. It makes sense to combine as much as possible in VS Code (not sure it's so easy in CLI), but agree with @karajan1001 there are some corner cases we might need to consider.

dberenbaum avatar Jul 14 '22 18:07 dberenbaum

From my side, tasks and experiments are in almost one-to-one correspondence, so we can hide the task concept and only expose experiments to the UI. The only exception here is checkpoints: I think it's better to bunch checkpoint experiment results together as a single experiment.

Sounds good. It makes sense to combine as much as possible in VS Code (not sure it's so easy in CLI), but agree with @karajan1001 there are some corner cases we might need to consider.

👍🏻

There are a few different discussions ongoing outside of this ticket that relate to this topic:

  1. https://github.com/iterative/dvc/issues/7986
  2. https://github.com/iterative/dvc/issues/8014
  3. #1996

Current summary (as I see it):

  • The experiments table should be the central location to manage experiments.
  • We don't want to introduce a new task queue concept to users.
  • We would like to avoid implementing a new view.
  • We would like to avoid calling in the extension dvc queue status (if possible).

Questions that need answers:

  1. Whether or not to give users access to queue task logs from the UI.
  2. Whether or not to show the created field for queue tasks.
  3. Whether or not to show the number of task workers and their status (idle or working). See https://github.com/iterative/vscode-dvc/issues/1995#issuecomment-1183907303 for why this is important.

Personally, I see value in all 3. We can solve 1 by adding the queue task's sha to the exp show data. We could do the same thing for 2. This probably makes the most sense for now as we want to hide the task concept altogether. What we do about the worker situation is less obvious. I cannot even proxy the information by using something like the executor field in the exp show data because it will not fill in the gaps in the UI.

What do you guys think? Also, did I miss anything?

mattseddon avatar Jul 15 '22 06:07 mattseddon

Thanks for the great summary, @mattseddon !

Whether or not to give users access to queue task logs from the UI.

For them it would be "see experiment logs"? Yes, it's a very reasonable thing to do.

Whether or not to show the created field for queue tasks.

like should we keep the timestamp field empty in the table for those or not?

Whether or not to show the number of task workers and their status

Not sure this is important. What would be a use case of this?

shcheklein avatar Jul 16 '22 00:07 shcheklein

like should we keep the timestamp field empty in the table for those or not?

I assumed that the created date for queue tasks would differ from the one shown in the table.

Not sure this is important. What would be a use case of this?

There is currently no indication in the UI that the queue is being processed. This becomes a problem when it seems like all experiments processing activity has ceased (https://github.com/iterative/vscode-dvc/issues/1995#issuecomment-1183907303).

mattseddon avatar Jul 16 '22 02:07 mattseddon

There is currently no indication in the UI that the queue is being processed.

But we do show that experiment is running, is it enough? Or we need something else?

I assumed that the created date for queue tasks would differ from the one shown in the table.

If it's different (and specific to the queue) - I think we don't care tbh.

shcheklein avatar Jul 16 '22 02:07 shcheklein

But we do show that experiment is running, is it enough? Or we need something else?

There are unnaturally long gaps when it appears that nothing is happening/running. We need something else.

mattseddon avatar Jul 16 '22 03:07 mattseddon

@mattseddon

There are unnaturally long gaps when it appears that nothing is happening/running. We need something else.

could you give more details please? Are we sure that "queue" can solve this? Maybe I'm missing the idea here still...

shcheklein avatar Jul 16 '22 16:07 shcheklein

could you give more details please? Are we sure that "queue" can solve this? Maybe I'm missing the idea here still...

When experiments are being run in the workspace there is always a clear indicator that something is running. We have 1 spinner always on display (status bar) and 2 spinners which may also be displayed (experiments tree and experiments webview). Whilst an experiment is running generally all three of these indicators will show some activity to the user.

When experiments are being run from the queue there is not always a clear indicator that something is running. Between one experiment finishing and the next starting there is no indication of activity from the extension. All three of the spinners mentioned above are stationary. dvc queue status shows that the next task has started to be processed but the exp show data does not reflect this. This was originally detailed in https://github.com/iterative/vscode-dvc/issues/1995#issuecomment-1183907303 but I clearly did not provide a detailed enough explanation.
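As a sketch of how the gap could be bridged, the spinners could be driven by queue information as well as by the exp show data. The function names below are hypothetical, and the inputs are assumed to come from something like dvc queue status (task statuses plus the active worker count):

```python
from typing import Iterable


def queue_is_active(task_statuses: Iterable[str], active_workers: int) -> bool:
    """True if the queue reports any running task or any active worker."""
    return active_workers > 0 or any(s == "Running" for s in task_statuses)


def show_spinner(
    workspace_running: bool, task_statuses: Iterable[str], active_workers: int
) -> bool:
    """Spin while either the workspace or the queue is doing work."""
    return workspace_running or queue_is_active(task_statuses, active_workers)


# Between two queued experiments: nothing is reported as running by
# exp show, but the queue has already picked up the next task.
spinning = show_spinner(
    workspace_running=False,
    task_statuses=["Running", "Queued", "Failed", "Success"],
    active_workers=1,
)
```

This keeps the indicator on during the period where the exp show data alone would make the extension look idle.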

mattseddon avatar Jul 17 '22 03:07 mattseddon

This might not be a problem introduced by the new queuing mechanism. I generally have stayed away from queuing/running experiments due to #828

mattseddon avatar Jul 17 '22 03:07 mattseddon

I don't like the idea of dealing with two terms "task" and "experiments". We should have one user-facing... the task and experiments are almost one to one correspondence, we can hide the task concept

Let's avoid the term "task" for anything user-facing if possible. Ultimately tasks represent experiments so descriptive phrases like "queued experiment", "successful experiment run", etc. should do the trick.

jorgeorpinel avatar Jul 17 '22 21:07 jorgeorpinel

Prototype for adding queue worker information to the UI:

https://user-images.githubusercontent.com/37993418/179434079-f51dc8b4-3a61-4b3d-9062-383be123e2b8.mov

Note: Icon in the experiments table should be a spinner.

mattseddon avatar Jul 18 '22 01:07 mattseddon

Under the current implementation the stop button in the UI (shown below) will behave unexpectedly.

image

We will need to detect when the queue is being processed and show the appropriate actions (stop or kill) in the editor/title position.

mattseddon avatar Jul 18 '22 03:07 mattseddon

  • Whether or not to give users access to queue task logs from the UI.

The obvious next question to me is why I can see these logs only for "queued" experiments. How can we handle workspace experiments so that we don't have different behavior depending on how experiments are run?

  • Whether or not to show the created field for queue tasks.

Agree with @shcheklein that users probably don't care most of the time. The time that it was added to the queue is usually only important for incomplete experiments, in which case it's already shown in dvc exp show. If the experiment is already complete, it seems unimportant.

  • Whether or not to show the number of task workers and their status (idle or working). See

We now have the workspace as well as any number of task workers that could each be running experiments. How do we make it easy to (in both DVC and VS Code):

  1. See where experiments are being run?
  2. See the number of workers?
  3. Adjust the number of workers?
  4. Select where an experiment should be run?
  5. Stop a single experiment?
  6. Stop only the workspace or only the queue?

(see also https://github.com/iterative/vscode-dvc/issues/1996; sorry for the overlap and happy to move comments or collapse into one issue)

dberenbaum avatar Jul 18 '22 18:07 dberenbaum

  • Whether or not to give users access to queue task logs from the UI.

The obvious next question to me is why I can see these logs only for "queued" experiments. How can we handle workspace experiments so that we don't have different behavior depending on how experiments are run?

Current behaviour is that when an experiment is run in the workspace the output is sent directly into a terminal. I.e. the red box in this screenshot:

image

Up until this point the queue logs would also be sent to the same place as the output of dvc exp run --run-all. Now that this is changing, I think it is ok to use the concept of foreground (workspace) vs background (queue) for running experiments.

If an experiment is running in the foreground (i.e. the workspace) I would expect to monitor the logs as the experiment runs. I can do this through the terminal. I would probably also watch the plots live update through the plots webview.

If I am running experiments as a background process (i.e. from the queue) I would only want to see the results of individual runs (logs) if something had gone wrong with the run. That's why I suggested the approach of making these accessible from the table.

If we want to take the same approach for experiments being run in the foreground then we would need some mechanism to save the logs for each run. Potentially the extension could take care of this but I would not mark it as a primary concern.

Given that experiments which have been run as a background process will behave a little bit differently, we will probably want to mark them differently in the experiments table. We can handle this in the extension but the CLI may want to consider providing this information so that users can easily look up the associated queue task. This probably goes back to adding the queue task id to the exp show data or making the logs accessible by using the experiment name/sha instead of the queue task one.

Please LMK if any of that doesn't make sense, if I am guilty of faulty thinking or if you have any suggestions.

We now have the workspace as well as any number of task workers that could each be running experiments. How do we make it easy to (in both DVC and VS Code):

  1. See where experiments are being run?
  2. See the number of workers?
  3. Adjust the number of workers?

I think we should add components to the UI in the following places:

image

The spinner icon in the webview can link directly to a quick pick which will let users manage the queue. We can add all of the same actions to the title context menu in the tree. Also, by listing individual workers in the tree we will be able to give users the ability to remove them (either in bulk or one at a time).

  4. Select where an experiment should be run?

Is it currently possible to run an experiment in the background without first adding it to the queue?

  5. Stop a single experiment?

Is there a command for this?

  6. Stop only the workspace or only the queue?

The extension can handle this. We will conditionally show actions in the UI dependent on where experiments are currently being run. The first step in being able to handle this situation is getting the queue worker information in a reliable way from the CLI.
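A minimal sketch of that conditional logic, assuming the extension already knows whether the workspace and the queue are running. The action labels are placeholders for illustration, not existing extension commands:

```python
from typing import List


def title_actions(workspace_running: bool, queue_running: bool) -> List[str]:
    """Pick which stop/kill actions to show in the editor/title position."""
    actions: List[str] = []
    if workspace_running:
        # Foreground run: stopping it only affects the workspace experiment.
        actions.append("stop workspace experiment")
    if queue_running:
        # Background runs: offer queue-level actions instead.
        actions.extend(["stop queue", "kill queue tasks"])
    return actions


only_queue = title_actions(workspace_running=False, queue_running=True)
both = title_actions(workspace_running=True, queue_running=True)
```

Keeping the decision in one pure function means the same rules can back the webview, the tree title menu and the quick pick.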

My only other question would be: How likely is it that a user will be running experiments in both the foreground and background?

This and #1996 do now feel like the same issue. Can we collapse everything down?

mattseddon avatar Jul 19 '22 00:07 mattseddon

If we want to take the same approach for experiments being run in the foreground then we would need some mechanism to save the logs for each run. Potentially the extension could take care of this but I would not mark it as a primary concern.

Yup, I didn't intend to imply that it should be VS Code that should take care of it. I was thinking about whether DVC should treat workspace experiments like queue experiments.

It's not the most immediate need, but other experiment trackers keep logs for future reference, and we have had requests to do the same.

4. Select where an experiment should be run?

Is it currently possible to run an experiment in the background without first adding it to the queue?

No.

5. Stop a single experiment?

Is there a command for this?

Yes, dvc queue kill.

My only other question would be: How likely is it that a user will be running experiments in both the foreground and background?

Probably not that common, although a user may want to run a one-off experiment without waiting for its turn in the queue.

dberenbaum avatar Jul 19 '22 19:07 dberenbaum

@karajan1001 @pmrowla can you confirm what is happening in the background/under the hood when a task is being picked up from the queue and turned into a running experiment? There is currently a long gap between starting the queue and anything showing up in the experiments table.

As demonstrated here: https://github.com/iterative/vscode-dvc/issues/1995#issuecomment-1183907303.

The reason that I am asking is that currently (IMO) we'll need to fill this gap by adding some extra information to the UI on the VS Code side. Would be good to avoid that.

Thanks

mattseddon avatar Aug 01 '22 18:08 mattseddon

The issue is that currently experiments only appear in exp show once the pipeline repro step has actually started. This is mainly due to how we can't collect any experiment data from the temp dir's git clone until the temp directory has been completely initialized/populated.

However, queue status will show that the task is running as soon as the initial setup steps have started (while the temp dir is still being initialized).

On the vscode side this could be reflected in the UI by adding a completely empty row with nothing but the ID/name field into the table (but we do not do this in exp show right now on the DVC side).

pmrowla avatar Aug 02 '22 05:08 pmrowla

On the vscode side this could be reflected in the UI by adding a completely empty row with nothing but the ID/name field into the table (but we do not do this in exp show right now on the DVC side).

Could the row have some way of showing that the row values are still "loading"?

dberenbaum avatar Aug 02 '22 19:08 dberenbaum

Suggestion from @mattseddon is to use the State column with something like 'Starting'. @pmrowla Thoughts?

dberenbaum avatar Aug 02 '22 20:08 dberenbaum

By the way, the row is already in the table, correct? The problem is that it continues to show up in the Queued state in dvc exp show after it shows as Running in dvc queue status?
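If that reading is right, one way to sketch the 'Starting' idea is to overlay queue statuses on the table rows. This assumes each row has an `id` that matches the queue task id, which is a hypothetical link (nothing here confirms the two share an identifier), and `reconcile_states` is an illustrative name:

```python
from typing import Dict, List


def reconcile_states(exp_rows: List[dict], queue_status: Dict[str, str]) -> List[dict]:
    """Mark rows as 'Starting' when the queue is ahead of the table data."""
    reconciled = []
    for row in exp_rows:
        # Table still says Queued, but the queue says the task is Running:
        # the temp dir is presumably still being set up.
        if row["state"] == "Queued" and queue_status.get(row["id"]) == "Running":
            row = {**row, "state": "Starting"}
        reconciled.append(row)
    return reconciled


rows = reconcile_states(
    [{"id": "065b468", "state": "Queued"}, {"id": "e91aa40", "state": "Queued"}],
    {"065b468": "Running", "e91aa40": "Queued"},
)
```

Only the first row changes state; rows the queue still reports as queued are left untouched.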

dberenbaum avatar Aug 02 '22 21:08 dberenbaum

Suggestion from @mattseddon is to use the State column with something like 'Starting'. @pmrowla Thoughts?

Yes we can do this, but it will require some work on the core DVC side. Basically the issue is that the state column in exp show is still using pre-dvc queue hacks to determine what is running.

IMO what we ideally should be doing is figuring out how to properly handle the separation between "collecting git-committed experiment data/params/metrics" and "queue/task execution state", but if we need to continue shoving everything into exp show in the meantime we can still do that.

pmrowla avatar Aug 03 '22 07:08 pmrowla