text-generation-inference
                        Falcon-40b-instruct deployment in SageMaker fails when using serial inference pipeline
System Info
Hi,
Deployment of Falcon-40b-instruct as a SageMaker endpoint worked well for me when following this tutorial. However, when I try to deploy the container as part of a serial inference pipeline, I get the following errors:
2023-06-16T12:57:41.398047Z ERROR text_generation_launcher: Shard 3 failed to start:
[W socket.cpp:601] [c10d] The client socket has failed to connect to [localhost]:29500 (errno: 99 - Cannot assign requested address).
You are using a model of type RefinedWeb to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2023-06-16T12:57:41.398097Z  INFO text_generation_launcher: Shutting down shards
2023-06-16T12:57:41.432637Z  INFO text_generation_launcher: Shard 2 terminated
2023-06-16T12:57:42.131047Z  INFO text_generation_launcher: Shard 0 terminated
Error: ShardCannotStart
Any idea what could be causing this and how it can be resolved?
Thanks!
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
- Follow the steps in this manual: https://samuelabiodun.medium.com/how-to-deploy-a-pytorch-model-on-sagemaker-aa9a38a277b6
- Deploy the model as part of serial inference pipeline. https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html
- Model deployment fails with the errors shown above.
Expected behavior
Successful deployment.
The client socket has failed to connect to [localhost]:29500 (errno: 99 - Cannot assign requested address)
This seems to be the issue. Could it be that something is already running on that port, for instance?
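One quick way to check whether the port is already taken on the host before launching the container — a minimal sketch (the `port_in_use` helper and the hard-coded port are illustrative, not part of TGI):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # connect_ex returns 0 when the connection succeeds,
        # i.e. when a listener is already bound to the port.
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    print(f"29500 in use: {port_in_use(29500)}")
```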
Yes, it is possible. Is there a way to override the default port value (29500)?
Yes: use the `--master-port` argument or the `MASTER_PORT` environment variable.
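For a SageMaker deployment, the override can be passed through the container's environment — a minimal sketch (the model ID, GPU count, and port value are illustrative; the dict would be handed to the sagemaker SDK's model `env` parameter):

```python
# Sketch: environment for the TGI container, moving the torch.distributed
# rendezvous port off the default 29500 that is already occupied.
tgi_env = {
    "HF_MODEL_ID": "tiiuae/falcon-40b-instruct",  # illustrative model ID
    "SM_NUM_GPUS": "4",                           # shard across 4 GPUs
    "MASTER_PORT": "29600",                       # override the default 29500
}

# e.g. HuggingFaceModel(..., env=tgi_env).deploy(...)
print(tgi_env["MASTER_PORT"])
```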
Thanks. I solved the port issue, but the shards still can't start and I'm still seeing the following message in the logs:
You are using a model of type RefinedWeb to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.