When should we update dvc.lock and dvc push in GitHub Actions?
Discussed in https://github.com/iterative/dvc/discussions/6542
Originally posted by Hongbo-Miao on September 6, 2021
Currently, my GitHub Actions workflow looks like this: when I open a pull request (changing some model code / params), CML creates an AWS EC2 instance, and DVC pulls the data.
Here is my current GitHub Actions workflow:
cml-cloud-set-up-cloud:
  name: CML (Cloud) - Set up cloud
  runs-on: ubuntu-20.04
  steps:
    - name: Cancel previous runs
      uses: styfle/[email protected]
      with:
        access_token: ${{ github.token }}
    - name: Checkout
      uses: actions/checkout@v2
    - name: Set up CML
      uses: iterative/setup-cml@v1
    - name: Set up cloud
      shell: bash
      env:
        REPO_TOKEN: ${{ secrets.CML_ACCESS_TOKEN }}
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      run: |
        cml-runner \
          --cloud=aws \
          --cloud-region=us-west-2 \
          --cloud-type=t2.small \
          --labels=cml-runner
cml-cloud-train:
  name: CML (Cloud) - Train
  needs: cml-cloud-set-up-cloud
  runs-on: [self-hosted, cml-runner]
  # container: docker://iterativeai/cml:0-dvc2-base1-gpu
  container: docker://iterativeai/cml:0-dvc2-base1
  steps:
    - name: Cancel previous runs
      uses: styfle/[email protected]
      with:
        access_token: ${{ github.token }}
    - name: Checkout
      uses: actions/checkout@v2
    - name: Set up Miniconda
      uses: conda-incubator/setup-miniconda@v2
      with:
        miniconda-version: "latest"
        activate-environment: hm-cnn
    - name: Install requirements
      working-directory: convolutional-neural-network
      shell: bash -l {0}
      run: |
        conda install pytorch torchvision torchaudio --channel=pytorch
        conda install pandas
        conda install tabulate
        pip install -r requirements.txt
    - name: Pull Data
      working-directory: convolutional-neural-network
      env:
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      run: |
        dvc pull
    - name: Train model
      working-directory: convolutional-neural-network
      shell: bash -l {0}
      env:
        WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
      run: |
        dvc repro
    - name: Write CML report
      working-directory: convolutional-neural-network
      shell: bash -l {0}
      env:
        REPO_TOKEN: ${{ secrets.CML_ACCESS_TOKEN }}
      run: |
        echo "# CML (Cloud) Report" >> report.md
        echo "## Params" >> report.md
        cat output/reports/params.txt >> report.md
        cml-send-comment report.md
My dvc.yaml looks like this:
stages:
  prepare:
    cmd: tar -xf data/raw/cifar-10-python.tar.gz --dir=data/processed
    deps:
      - data/raw/cifar-10-python.tar.gz
    outs:
      - data/processed/cifar-10-batches-py/
  main:
    cmd: python main.py
    deps:
      - data/processed/cifar-10-batches-py/
      - evaluate.py
      - main.py
      - model/
      - train.py
    params:
      - lr
      - train.epochs
    outs:
      - output/models/model.pt
After training, if I think the change is good because the performance is better based on the reports:
- the dvc.lock needs to be updated, and
- the new model model.pt needs to be uploaded to AWS S3, in my case.
My question is, after dvc repro, am I supposed to add dvc push and then commit? Something like
- name: Train model
  working-directory: convolutional-neural-network
  shell: bash -l {0}
  env:
    WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
  run: |
    dvc repro
    dvc push                               # newly added
    git add .                              # newly added
    git commit -m "update dvc.lock, etc."  # newly added
    git push origin current_pr             # newly added; need to somehow get the current pull request branch name
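For reference, a rough sketch of what such a step could look like, assuming the workflow is triggered by a pull_request event (so GITHUB_HEAD_REF holds the PR's source branch) and that the token persisted by actions/checkout is allowed to push to that branch. The AWS credentials are included only because dvc push needs them here; the bot identity and commit message are placeholders, not a recommendation:

- name: Train model and commit results
  working-directory: convolutional-neural-network
  shell: bash -l {0}
  env:
    WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  run: |
    # actions/checkout leaves a detached merge commit on pull_request events,
    # so switch to the PR's source branch before reproducing the pipeline
    git fetch origin "$GITHUB_HEAD_REF"
    git checkout "$GITHUB_HEAD_REF"
    dvc repro
    dvc push                                         # upload new cache objects (e.g. model.pt) to the S3 remote
    git config user.name "github-actions[bot]"       # placeholder CI identity
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add dvc.lock
    git commit -m "Update dvc.lock after dvc repro" || echo "Nothing to commit"
    git push origin "$GITHUB_HEAD_REF"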
The method above applies while the pull request is open. However, I feel the best moment to push would be when I decide to merge, because that is when I know the pull request actually improves the machine learning performance. But by that moment, the EC2 instance has already been destroyed.
Any suggestion? Thanks!
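One way to cover the merge moment, sketched here purely as an illustration: trigger a second workflow on push to the default branch (assumed to be main below), provision a fresh runner there the same way the pull-request workflow does, and run dvc repro / dvc push on the merged code. All names are illustrative:

# Illustrative sketch only. This workflow would also need a provisioning job
# like cml-cloud-set-up-cloud above (omitted here) so that a fresh EC2 runner
# exists at merge time.
name: train-on-merge
on:
  push:
    branches: [main]              # assumed default branch name
jobs:
  retrain-and-push:
    runs-on: [self-hosted, cml-runner]
    container: docker://iterativeai/cml:0-dvc2-base1
    steps:
      - uses: actions/checkout@v2
      - name: Reproduce pipeline and push outputs
        working-directory: convolutional-neural-network
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          dvc pull          # fetch the existing data and cache from S3
          dvc repro         # re-run the pipeline on the merged code
          dvc push          # upload the resulting outputs (e.g. model.pt) to S3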
See previous discussion comments:
https://github.com/iterative/dvc/discussions/6542#discussioncomment-1287600
Just copying another comment here:
I am more curious about the recommended way for this demo https://github.com/iterative/cml_cloud_case/blob/master/.github/workflows/cml.yaml, as it is using an AWS EC2 instance. However, its current GitHub Actions workflow does not do something like
dvc push
cml-pr .
In the experiment pull request, it also does not update dvc.lock and related files, which is why I came up with this question. (If we use an AWS EC2 instance all the time based on this approach, dvc.lock will never get updated.)
Would be great to have some recommendations! 😃
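For concreteness, a rough sketch of what adding those two commands to a training step could look like. The step name is illustrative, and REPO_TOKEN is assumed to be the credential cml-pr uses to open the pull request:

- name: Train model and open results PR
  working-directory: convolutional-neural-network
  shell: bash -l {0}
  env:
    WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    REPO_TOKEN: ${{ secrets.CML_ACCESS_TOKEN }}
  run: |
    dvc repro
    dvc push     # upload new cache objects (e.g. model.pt) to the S3 remote
    cml-pr .     # commit the changed files (dvc.lock, etc.) on a new branch and open a pull request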
To be more specific, it would be great to cover these scenarios without messing up DVC:
- Open a pull request, the experiment result looks good, then merge.
- Open a pull request, the experiment result does not look good, so force-push an update. The second time, the experiment looks good, then merge.
- Open a pull request, the experiment result does not look good, so add another commit to update it. The second time, the experiment looks good, then squash merge.
Another option that I was taught is to use DVC's run-cache.
In the CML runner:
dvc repro
dvc push --run-cache
On the local machine:
dvc pull --run-cache
dvc repro --pull
# any further checks or analysis of results
git add .
git commit -m "commit experiment"
This is less automated than the new cml-pr, but it has the benefit that the developers are the ones making the commits, if that is important.
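As an illustration of how that maps onto the workflow above, only the training step would change: the AWS credentials are added because dvc push now runs from this step, and --run-cache uploads the run-cache entries alongside the regular outputs so a local dvc repro --pull can reuse them:

- name: Train model
  working-directory: convolutional-neural-network
  shell: bash -l {0}
  env:
    WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  run: |
    dvc repro
    dvc push --run-cache   # upload outputs plus run-cache entries to the S3 remote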