Logan Adams

294 comments by Logan Adams

> Hi @loadams can you help start the workflow? The model checkpoint path had been moved to the persistent storage as suggested.

Apologies, I was out but it should be...

> Hi @loadams I have added gptj and baichuan7b model to autotp workflow, can you help start the workflow? Thanks!

Done.

> Now this workflow is ready for testing autotp...

> > > Hi @loadams I have added gptj and baichuan7b model to autotp workflow, can you help start the workflow? Thanks! > > > > > > Done. >...

> Hi @loadams, I see the environment issue should have been fixed. Can you help restart the workflow? Thanks!

@delock - yes, apologies that took so long.

> @loadams I ran these two tests in my local environment. It didn't take so long. Can you help run this workflow again to see whether it is reproducible? Thanks!...

> Hi @loadams, I tried running these UTs in my environment and didn't see this timeout. Since CPU UT is already covered by workflow `cpu-torch-latest`, I removed unit tests in...

cc: @jithunnair-amd and @rraminen - new issue opened because we closed the previous one. Once we merge the ROCm update to 5.6 PR I believe there are still failing tests,...

Hi @annopackage - can you share a full minimal repro script with us please?

@alvieirajr - were you able to validate that swapping these resolved your issues?

@liuhui0401 - this seems like a CUDA error, or a bad state that the GPUs are in. If you power cycle the machine, does `nvidia-smi` work?