ml-commons
[BUG] Unable to redeploy a model once undeployed on Windows
What is the bug? In ML Commons, deploying and undeploying a model misbehaves on Windows because Windows keeps the model folder locked while the OpenSearch process is running.
- This issue occurs only on Windows.
- It prevents cleanup during undeploy and blocks redeployment with the same model ID.
- The folder is released only when the OpenSearch cluster is stopped.
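The general operating-system behavior behind this kind of failure can be sketched in a few lines of Python. This is only an illustration under assumptions: the folder name `model_cache` and file name `model.pt` are hypothetical, and the report does not identify which handle ML Commons actually keeps open. The sketch shows that a directory containing an open file can usually be removed on Linux and macOS, while on Windows the removal typically fails until the handle is closed.

```python
import os
import shutil
import tempfile


def can_delete_while_open() -> bool:
    """Return True if a directory can be removed while a file inside it is open."""
    root = tempfile.mkdtemp()
    model_dir = os.path.join(root, "model_cache")  # hypothetical folder name
    os.mkdir(model_dir)
    # Keep a handle open, standing in for a process that still holds the model file.
    handle = open(os.path.join(model_dir, "model.pt"), "wb")  # hypothetical file name
    try:
        try:
            shutil.rmtree(model_dir)
            return True
        except OSError:
            # On Windows this is typically a PermissionError (sharing violation).
            return False
    finally:
        # Release the handle, then clean up whatever is left.
        handle.close()
        shutil.rmtree(root, ignore_errors=True)


if __name__ == "__main__":
    print("deleted while open:", can_delete_while_open())
```

Running it on both platforms gives a quick way to see why the cleanup step could succeed elsewhere and fail only on Windows.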
How can one reproduce the bug?
Steps to reproduce the behavior:
1) Register a model version
POST /_plugins/_ml/models/_register
{
"name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
"version": "1.0.1",
"model_format": "TORCH_SCRIPT"
}
## Response
{
"task_id": "FFe1cIkBT5fLQDwZTj7J",
"status": "CREATED"
}
2) Get model_id using the task_id above:
GET /_plugins/_ml/tasks/<task_id>
3) Deploy using the model_id
POST /_plugins/_ml/models/<model_id>/_deploy
4) Undeploy the model (this call succeeds)
POST /_plugins/_ml/models/<model_id>/_undeploy
5) Redeploy the model using the same model ID. The deployment fails.
POST /_plugins/_ml/models/<model_id>/_deploy
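Steps 3 to 5 can also be scripted, which may help when checking a fix repeatedly. The following is a minimal sketch, not an official client: the base URL `http://localhost:9200` and the helper names `repro_steps`, `call`, and `run_repro` are assumptions, and a secured cluster would additionally need HTTPS and credentials. Only the endpoint paths come from the steps above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster without security


def repro_steps(model_id: str) -> list[tuple[str, str]]:
    """Build the deploy, undeploy, redeploy sequence for an already registered model."""
    prefix = f"/_plugins/_ml/models/{model_id}"
    return [
        ("POST", f"{prefix}/_deploy"),    # step 3: deploy
        ("POST", f"{prefix}/_undeploy"),  # step 4: undeploy
        ("POST", f"{prefix}/_deploy"),    # step 5: redeploy (fails on Windows per this report)
    ]


def call(method: str, path: str, base_url: str = BASE_URL) -> dict:
    """Send one request and return the decoded JSON response body."""
    request = urllib.request.Request(base_url + path, method=method)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_repro(model_id: str, base_url: str = BASE_URL) -> list[dict]:
    """Run the three requests in order and collect the responses."""
    return [call(method, path, base_url) for method, path in repro_steps(model_id)]
```

Calling `run_repro("<model_id>")` with the ID obtained in step 2 sends the three requests in order. The sketch does not wait for the deploy task to finish before undeploying; in practice one would poll `GET /_plugins/_ml/tasks/<task_id>` between steps.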
What is the expected behavior? When a user undeploys a model and then redeploys the same model, the deployment should succeed.
What is your host/environment?
- OS: Windows