deepanker13

Results: 16 comments by deepanker13

Issue: the PyTorch profiler does not capture DataLoader time and runtime; it always shows 0. Code used: the code given in the official PyTorch profiler documentation ([PyTorch documentation](https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html))...

@andreyvelich the links in fine-tuning.md return 404 (page not found). Am I missing something?

> > @andreyvelich the links in fine-tuning.md are giving 404 page not found. Am I missing something?
>
> @deepanker13 Did you check these links via Website preview: https://deploy-preview-3718--competent-brattain-de2d6d.netlify.app/ ?...

> @johnugeorge @deepanker13 Do we need to create a tracking issue with the remaining items for the Train/Fine-tune API for LLMs?

Okay, I will create one.

@StefanoFioravanzo I can help with the tutorial. Also, do you have any reference for the API documentation?

@tenzen-y I think environment variables such as PET_RDZV_ENDPOINT, PET_RDZV_BACKEND, etc. are set on the containers only when we pass the elastic policy spec (https://github.com/kubeflow/training-operator/blob/0b6a30cd348e101506b53a1a176e4a7aec6e9f09/pkg/controller.v1/pytorch/envvar.go#L109). And the above-mentioned environment variables are...

@andreyvelich since the V2 implementation has started, should we take up the remaining tasks?

Shall we rename it to kubeflow/model-dataset-downloader? cc @andreyvelich @terrytangyuan