Export Fine-Tuned LLM after Trainer is Complete
As discussed in https://github.com/kubeflow/website/pull/3718#issuecomment-2096619898, our LLM Trainer doesn't export the fine-tuned model, so users can't reuse that model for inference or other purposes.
We should discuss how users can get the fine-tuned artifact after the LLM Trainer is complete. /cc @kubeflow/wg-training-leads @deepanker13
Would be nice to see integration with Kubeflow Model Registry as well. cc @kubeflow/wg-data-leads
If there is a tutorial for the part of this project that shows the metadata we want to capture in Model Registry, I would be very happy to complement that example by indexing that metadata in MR! 🚀👍
@andreyvelich I may have misunderstood the initial context of this API because I was under the impression that you could serve the model once fine-tuned. Can you elaborate on this?
So user can't re-use that model for inference or other purposes.
I think the only way right now is to use output_dir for model checkpoints.
In that case, users can get the model from the PVC that we attach to the PyTorchJob,
as in this example: https://github.com/kubeflow/training-operator/blob/master/examples/pytorch/language-modeling/train_api_hf_dataset.ipynb
Right, @johnugeorge @deepanker13?
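For illustration, here is a minimal sketch of how one might locate the newest checkpoint under output_dir once the PVC is mounted somewhere readable. It assumes the Hugging Face Trainer convention of writing `checkpoint-<step>` subdirectories; the function name `latest_checkpoint` and the `/mnt/output` mount path are hypothetical and not part of the training operator.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: checkpoints are saved as "checkpoint-<step>" directories,
# which is the Hugging Face Trainer naming convention.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        # Skip plain files and directories that are not checkpoints.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


if __name__ == "__main__":
    # Hypothetical path where the PyTorchJob's PVC is mounted in a helper pod.
    print(latest_checkpoint("/mnt/output"))
```

The returned directory could then be copied out of the PVC or loaded for inference. If `transformers` is installed, `transformers.trainer_utils.get_last_checkpoint` offers similar behavior.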
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/remove-lifecycle stale
Per https://github.com/kubeflow/training-operator/issues/2101#issuecomment-2097204327, is there a tutorial or demo about this, please?
I would be very happy to integrate a demo/blueprint for the documentation; I just need a "seed" to get started on the training operator :) Thanks!
/remove-lifecycle stale
We've been discussing different strategies to address this community need:
- https://github.com/kubeflow/model-registry/issues/891
/lifecycle frozen
Currently, users can get the fine-tuned model from the PVC: https://www.kubeflow.org/docs/components/trainer/user-guides/builtin-trainer/torchtune/#get-the-fine-tuned-model.
/close
@andreyvelich: Closing this issue.