[Task] Action items from CI retrospective 2022-08-03
Problem:
The Merlin team conducted this 5 Whys exercise to identify CI-related pain points:
- Build issues are found late in the release process
- Untracked changes to the requirements and build processes of these packages are introduced (not clear; discussion tracked here)
- As part of the release process, issues with upstream dependencies are not tracked
- RC builds of the base container are not being used in the nightly
- PRs cannot be merged
- Intermittent issues with the CI process
- The Jenkins machine uses the CI-nightly container and yet behaves differently (root cause not clearly established; discussion tracked here)
- Multi-stage example notebooks are broken
- PR checks don’t catch downstream issues. We don't currently have a way to run all unit tests when a merge is done at a repo.
- For instance, when we run PR tests on core, we don't test that it still works with nvt/models/systems (addressed)
- CI does not run the unit tests when someone pushes a PR to Systems, because those unit tests live in the Merlin repo (addressed)
- Performance is not tracked in our release process
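
To make the "PR checks don't catch downstream issues" item concrete, here is a minimal sketch of how a CI job could work out which repos' unit tests to run after a change. Everything in it is an assumption for illustration: the `DOWNSTREAM` map and the `downstream_repos` helper are hypothetical names, and only the `core` entry mirrors the example in this issue (core feeding nvt/models/systems); the other edges are guesses, not the actual Merlin dependency graph.

```python
from collections import deque

# Hypothetical map: repo -> repos that depend on it directly.
# Only the "core" entry reflects the example in this issue;
# the remaining edges are illustrative assumptions.
DOWNSTREAM = {
    "core": ["NVTabular", "models", "systems"],
    "NVTabular": ["models", "systems"],
    "models": ["systems"],
    "systems": ["Merlin"],
    "Merlin": [],
}


def downstream_repos(changed_repo, graph=DOWNSTREAM):
    """Return every repo reachable downstream of changed_repo, in BFS order."""
    seen = []
    queue = deque(graph.get(changed_repo, []))
    while queue:
        repo = queue.popleft()
        if repo in seen:
            continue  # already scheduled via another path
        seen.append(repo)
        queue.extend(graph.get(repo, []))
    return seen


if __name__ == "__main__":
    # A PR against core would trigger the unit tests of each repo listed here.
    for repo in downstream_repos("core"):
        print(f"run unit tests for: {repo}")
```

A PR check could call `downstream_repos` with the name of the repo being changed and fan out one test job per returned repo. Keeping the map in one place also gives a single spot to record new dependencies as they are introduced.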
Goal:
Avoid last-minute dependency hell by being proactive and tracking release risk from the start of the development cycle.
Constraints:
Sub-tasks:
- [ ] CI Pipelines
- [ ] #588
- [ ] #589
- [ ] Container builds
- [x] https://github.com/NVIDIA-Merlin/Merlin/pull/558
- [ ] Core
- [x] https://github.com/NVIDIA-Merlin/core/pull/127
- [x] https://github.com/NVIDIA-Merlin/core/pull/129
- [ ] https://github.com/NVIDIA-Merlin/core/pull/131
- [ ] NVTabular
- [x] https://github.com/NVIDIA-Merlin/NVTabular/pull/1667
- [ ] https://github.com/NVIDIA-Merlin/NVTabular/pull/1671
I'm working on the "PR checks don't catch downstream issues" part of this