[BUG] Issue in running Merlin Containers on quick deploy
Bug description
Container page: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-training?quick-deploy=false
After deploying with quick deploy and opening JupyterLab, the session gets stuck at the kernel loading step. This affects both the Merlin training and Merlin PyTorch training containers.
< To be updated >
Steps/Code to reproduce bug
Expected behavior
Environment details
- Merlin version:
- Platform:
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
Additional context
@benfred, please update the status of this ticket. Can it be closed? If not, does it qualify as P0?
In the email thread, this was the last communication from Shokoufeh Monjezi Kouchak:
Clone this bug (https://nvbugs/3597709). In the bug, explain that these containers are already published and that you want to add the quick deploy feature to them. Also ask them to add these containers to the Vertex AI Workbench collection after adding quick deploy.
@bschifferer @rnyak can one of you validate that our Vertex AI Workbench workflow is working? @nv-alaiacano @benfred We need a better way of tracking whether this is up to date and working.
Is it correct that merlin-training is deprecated in favor of merlin-tensorflow, merlin-pytorch, etc?
Yeah, we used to have to split them by training and inference for container size reasons.
@bbozkaya has filed an NVBug to get the container to deploy with content rather than empty. Waiting on the NGC team.
I contacted the NGC team again to remind them about this bug. I hope they can resolve it.
Containers are updated to 22.10; older containers are removed. Notebook collections are also updated to the latest versions with additional notebooks included. Notebooks appear in a folder in the corresponding container when quick deployed from NGC to Vertex AI. Notebooks are tested and they run without issues.