sagemaker-python-sdk
Missing json error when trying to compile a semantic segmentation model (builtin algorithm) with Neo
Describe the bug
Not sure if this is a bug or an unsupported feature. We've trained a semantic segmentation model using the built-in SageMaker semantic segmentation algorithm (FCN with ResNet-50) and were able to deploy it successfully. We then wanted to compile it with Neo to improve inference performance and to be able to deploy it to an inf1 instance.
When I try to compile the model (based on examples in sample notebooks), I receive the following error:
ClientError: InputConfiguration: No valid Mxnet model file -symbol.json found
The model.tar.gz for semantic segmentation models contains hyperparams.json, model_algo-1, model_best.params. According to the docs, model_algo-1 is the serialized mxnet model. Aren't gluon models supported by Neo?
If not, can I manually use Gluon/MXNet to save the required symbol JSON from the serialized model in order to use Neo?
Thanks!
To reproduce
Train a Semantic Segmentation model using the SageMaker built-in algorithm, with FCN and ResNet-50, and try to call the estimator's compile_model.
Expected behavior
Neo should successfully compile the model.
System information
- SageMaker Python SDK version:
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): Semantic segmentation (builtin)
- Framework version:
- Python version: 3.6
- CPU or GPU: Running in a SageMaker Jupyter notebook, hosted on a CPU instance
- Custom Docker image (Y/N):
Hi @MCE-KobyBo
Sagemaker training stores the actual Mxnet model (*-symbol.json and *.params) in a non-standard way by zipping them into the model_algo-1 file, which is actually a zip file with no extension.
You can work around this by unzipping the model_algo-1 file and creating a new .tar.gz with the *-symbol.json and *.params files, which can be submitted to Neo for compilation.
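The workaround described above could be sketched roughly as follows, using only the Python standard library. The function name repack_for_neo and the directory layout are hypothetical, and the sketch assumes model_algo-1 sits at the top level of model.tar.gz and really is a zip archive; it raises an error if it is not.

```python
import tarfile
import zipfile
from pathlib import Path
from typing import List


def repack_for_neo(model_tar: str, work_dir: str, out_tar: str) -> List[str]:
    """Unzip model_algo-1 and repack its MXNet files into a new .tar.gz.

    Hypothetical helper: the name and layout are illustrative only.
    """
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)

    # Step 1: unpack the training artifact produced by SageMaker.
    with tarfile.open(model_tar, "r:gz") as tar:
        tar.extractall(work / "artifact")

    # Assumption: model_algo-1 is at the top level of the artifact.
    inner = work / "artifact" / "model_algo-1"
    if not zipfile.is_zipfile(inner):
        raise ValueError(f"{inner} is not a zip archive; this workaround does not apply")

    # Step 2: model_algo-1 is a zip file without an extension, so unzip it.
    with zipfile.ZipFile(inner) as zf:
        zf.extractall(work / "mxnet")

    # Step 3: keep only the *-symbol.json and *.params files.
    files = sorted(
        p for p in (work / "mxnet").rglob("*")
        if p.is_file() and (p.name.endswith("-symbol.json") or p.suffix == ".params")
    )
    if not files:
        raise ValueError("no *-symbol.json or *.params files found inside model_algo-1")

    # Step 4: write them flat (no directories) into the new archive.
    with tarfile.open(out_tar, "w:gz") as tar:
        for p in files:
            tar.add(p, arcname=p.name)
    return [p.name for p in files]
```

The resulting archive would then be uploaded to S3 and used as the model input for the Neo compilation job in place of the original model.tar.gz.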
@trevor-m Thanks for your reply. I read about that somewhere, but it doesn't seem to be the case here. The model.tar.gz file produced by the training job contains 3 files: hyperparams.json, model_algo-1 and model_best.params. model_algo-1 doesn't seem to be a zip file in this case, it has the same size and format as model_best.params.
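A quick way to check this kind of observation, as a minimal sketch: list every file in model.tar.gz with its size and whether it is a zip archive. The helper name inspect_artifact is hypothetical, and the sketch reads each member fully into memory, which is assumed to be acceptable for model files of this size.

```python
import io
import tarfile
import zipfile
from typing import Dict


def inspect_artifact(model_tar: str) -> Dict[str, dict]:
    """Report size and zip status for each file in a model.tar.gz.

    Hypothetical helper for diagnosing the artifact layout.
    """
    report = {}
    with tarfile.open(model_tar, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Read the member into memory so is_zipfile gets a seekable buffer.
            data = tar.extractfile(member).read()
            report[member.name] = {
                "size": member.size,
                "is_zip": zipfile.is_zipfile(io.BytesIO(data)),
            }
    return report
```

If model_algo-1 is reported with is_zip set to False and the same size as model_best.params, the unzip workaround does not apply to that artifact.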
They both seem to be exported GluonCV model parameters. To test that, I followed the sample code in this Stack Overflow thread (but using FCN) and it worked: I was able to load the file directly with load_params. This means that for Neo, I may have to export the model manually.
Thanks for the response! It appears that Sagemaker builtin models do not have a consistent format. While unzipping the model_algo-1 file worked for other Sagemaker builtin models such as LinearLearner, this model appears to use a different internal format. I would advise asking the Sagemaker builtin segmentation model team how to extract the underlying model, or avoiding Sagemaker builtin models altogether.
@MCE-KobyBo Did you manage to solve this manually? I'm having similar issues.
@taroko-mooncake Unfortunately no. We didn't have time to keep trying, so we decided not to use Neo for now.
@MCE-KobyBo I solved it. If you post the question on Stack Overflow, I can send you the answer.
Hey @taroko-mooncake , I'm ready to post it on Stackoverflow in order to get a solution
Sure - send me the link to the question.
@taroko-mooncake @korimarik Did you end up solving this? Was there a StackOverflow Q/A posted? Having same problem using built-in Semantic Segmentation model and SageMaker Neo
Yes I did solve it - if you post a SO QA I can send you the answer
Thanks @taroko-mooncake, appreciate the help. I have posted a StackOverflow question here: https://stackoverflow.com/questions/71579883/missing-symbol-json-error-when-trying-to-compile-a-sagemaker-semantic-segmentat
I have the same issue with Linear Learner. The generated model.tar.gz has a file model_algo-1 which is a zip file. I was able to compile the model only after I unpacked the file and created a separate tar.gz file with only one symbol.json file and one params file.
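Before submitting a repacked archive to Neo, a small check like the one below can catch a missing or duplicated file early. It is only a sketch: the helper name find_archive_problems is hypothetical, and the "exactly one symbol file and one params file" rule is taken from the experience described in this thread, not from a verified Neo specification.

```python
import tarfile
from typing import List


def find_archive_problems(archive_path: str) -> List[str]:
    """Return a list of problems found in a repacked MXNet archive.

    Hypothetical helper; an empty list means the check passed.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        # Compare base names only, so files nested in directories still count.
        names = [m.name.rsplit("/", 1)[-1] for m in tar.getmembers() if m.isfile()]

    symbols = [n for n in names if n.endswith("-symbol.json")]
    params = [n for n in names if n.endswith(".params")]

    problems = []
    if len(symbols) != 1:
        problems.append(f"expected exactly one *-symbol.json file, found {len(symbols)}")
    if len(params) != 1:
        problems.append(f"expected exactly one *.params file, found {len(params)}")
    return problems
```

Run against the original semantic segmentation artifact from this issue, which contains two .params-style files and no symbol file, a check like this would report the missing *-symbol.json that Neo's error message complains about.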