vscode-dvc
Ability to attach debugger to any code within DVC pipeline
With a complicated DVC pipeline that has dynamic, parametrized dependencies, it's not easy to work out the exact command needed to run a specific stage under a debugger outside of DVC. Meanwhile, users compare our experiments workflow with a regular notebook or even a basic script workflow, and find that they no longer know how to pause execution and explore a data frame.
We need to research a way to mitigate this, either on the DVC side or on the extension side.
- https://github.com/iterative/vscode-dvc/issues/722
- https://github.com/iterative/vscode-dvc/issues/657
- https://github.com/iterative/vscode-dvc/blob/main/README.md#your-dvc-project
This is not automated at all but it is a solution that works:
- Install `debugpy` into the virtual environment.
- Add breakpoints to the script you want to debug.
- Add the required code to your script (see "Additional code").
- Add an `attach` configuration to `launch.json` (see "launch.json entry").
- Run the experiment.
- Hit F5.
- Profit.
Additional code:
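import debugpy
# Start a debug server in the stage's process; the port must match launch.json.
debugpy.listen(("localhost", 6666))
# Block until the VS Code debugger attaches.
debugpy.wait_for_client()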
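One drawback of the snippet above is that every run blocks until a debugger attaches. A minimal sketch of one way around that is to guard the calls behind an environment variable. The helper name `maybe_wait_for_debugger` and the variable name `DVC_DEBUG` are made up for this example and are not part of DVC or the extension, and this assumes the stage's command inherits the environment of the process that started the experiment.

```python
import os


def maybe_wait_for_debugger(env_var: str = "DVC_DEBUG", port: int = 6666) -> bool:
    """Wait for a debugger to attach, but only when env_var is set to "1".

    Returns True if a debugger was waited for, False if the check was skipped.
    `env_var` is a hypothetical name chosen for this sketch, not a DVC setting.
    """
    if os.environ.get(env_var) != "1":
        # Normal runs: do nothing, so the stage behaves exactly as before.
        return False

    # Imported lazily so debugpy is only required when debugging is requested.
    import debugpy

    # Same calls as in "Additional code"; the port must match launch.json.
    debugpy.listen(("localhost", port))
    debugpy.wait_for_client()
    return True


if __name__ == "__main__":
    maybe_wait_for_debugger()
    # ... the rest of the stage's script goes here ...
```

With this in place, a run without the variable set should behave as usual, and setting `DVC_DEBUG=1` before starting the experiment should make the stage pause until F5 is pressed in VS Code.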
launch.json entry:
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug experiment",
      "type": "python",
      "request": "attach",
      "justMyCode": false,
      "subProcess": true,
      "port": 6666
    }
  ]
}
Demo:
https://github.com/iterative/vscode-dvc/assets/37993418/a609dac2-57de-4829-b54f-03002a3e0761
I can make a tutorial and we can add it to the README/dvc.org if we think that would be useful.
Discussed with @skshetry that we might want to clarify the scope of this. Is it strictly about using IDE debugger tools? It might be worth clarifying this when publishing anything about it. It could give the wrong impression that adding breakpoints to your code won't work when running in DVC, and I don't think we should assume that the typical data scientist is familiar with debugging tools.
As an advanced (I hope) DVC user, it would be awesome to have the ability to run DVC as an "app", like a Flask server or a Spring application. I think the latter is the exact dream.
Today, we have a huge run_experiment.py script that deals with different dvc.yaml files in the same repo and needs to handle many use cases of params.yaml. Of course, everything is terminal-based, and the suggestion above is not something that can really scale or be introduced to DS teams.
Maybe I'm straying into another issue that we have, but once we have the DVC "app" ability, maybe we can have annotations for StepInput and StepOutput to standardize the contracts between steps (making it more object-oriented and less file-oriented). I guess this can be the next level, since today it is heavy .yaml engineering.
Side note: we need to see if the same can be achieved in PyCharm (I expect it to be very similar, but it's been a while since I last touched it).