Allow dependency numpy to be >= 2.0.0
Describe the feature you'd like
In June 2024, numpy 2.0.0 was released. sagemaker-python-sdk depends on numpy>=1.9.0,<2.0. This creates a dependency hell for me, as I have dependencies in my python package that depend on numpy >= 2.0.0.
You could either enforce numpy >= 2.0.0 and make new releases of the package incompatible with numpy < 2.0.0 or keep supporting the currently supported numpy versions, but also add those >= 2.0.0. I.e. depending on whether or not there are breaking changes with numpy >= 2.0.0 in your code base, establish different code paths depending on the installed version of numpy.
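To illustrate the second option, here is a minimal sketch of a version-dependent code path. It is not taken from the SDK: the helper names (`major_version`, `installed_numpy_major`, `integration_function_name`) are hypothetical. I picked `np.trapz` versus `np.trapezoid` as the example only because NumPy 2.0 added `np.trapezoid` and deprecated `np.trapz`.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '2.1.3'."""
    return int(version_string.split(".")[0])


def installed_numpy_major():
    """Return the installed numpy major version, or None if numpy is absent."""
    try:
        return major_version(version("numpy"))
    except PackageNotFoundError:
        return None


def integration_function_name(numpy_major: int) -> str:
    """Pick the trapezoidal-rule function name for a numpy major version.

    NumPy 2.0 added np.trapezoid and deprecated np.trapz.
    """
    return "trapezoid" if numpy_major >= 2 else "trapz"


if __name__ == "__main__":
    major = installed_numpy_major()
    if major is None:
        print("numpy is not installed")
    else:
        import numpy as np

        # Resolve the function once, based on the installed version.
        integrate = getattr(np, integration_function_name(major))
        print(integrate([0.0, 1.0, 2.0]))
```

In most cases a spelling that works on both major versions (for example `np.nan` instead of `np.NaN`) is simpler than branching, so a switch like this would only be needed where no common spelling exists.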
How would this feature be used? Please describe.
Ensuring I can resolve dependencies in Python packages that have both this SDK as well as other dependencies with a requirement for numpy >= 2.0.
Describe alternatives you've considered
I don't think there is an alternative, as in the long run, this problem will get worse as more and more other packages depend on numpy >= 2.0.0. I am surprised no one has opened an issue until now.
Additional context
I am also opening a support case with AWS Premium Support.
Is there anything holding up removal of the pin? From a quick scan, I would think it is basically a few tiny replacements for things like np.NaN which ruff check path/to/code/ --select NPY201 --fix will just do.
There are a few uses of np.int which might be fishy if (and only if!) this code ever runs on Windows (otherwise, it may be fishy, but there is no change in NumPy 2).
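To give a rough idea of what "tiny replacements" means, here is a toy sketch that rewrites a few aliases removed in NumPy 2 to spellings that work on both 1.x and 2.x. This is not how Ruff implements NPY201; the function name and the short alias table are my own, and the regex assumes numpy is imported as `np`.

```python
import re

# A few aliases removed in NumPy 2, mapped to replacements valid on 1.x and 2.x.
REMOVED_ALIASES = {
    "NaN": "nan",
    "Inf": "inf",
    "infty": "inf",
    "float_": "float64",
    "complex_": "complex128",
}

# \b at the end keeps names such as np.float_power from matching np.float_.
_PATTERN = re.compile(r"\bnp\.(" + "|".join(REMOVED_ALIASES) + r")\b")


def rewrite_removed_aliases(source: str) -> str:
    """Replace removed numpy aliases in a source string."""
    return _PATTERN.sub(lambda m: "np." + REMOVED_ALIASES[m.group(1)], source)


if __name__ == "__main__":
    print(rewrite_removed_aliases("x = np.NaN + np.float_(1)"))
```

For real code, the ruff command above is the better tool, since it works on the syntax tree rather than on raw text.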
@ellisonbg do you know who we should talk to about NumPy 2 support here?
Any status on this when we could expect this support to be added?
Any updates here? We're in the same situation.
Hi @nargokul, I see this pr #4955 is merged. Would this be part of the next release?
Seems like pr https://github.com/aws/sagemaker-python-sdk/pull/4963 reverted the change for numpy. Was it intentional? @nargokul
same
@knikure and @zhaoqizqwang, you reviewed the PR. numpy compatibility is a big issue for many people here. Any idea?
@nargokul Seems like numpy update PR was actively closed, with a currently open PR existing. Any blocker for the update?
There are compatibility challenges in upgrading Amazon SageMaker Python SDK with NumPy 2.0 since pandas library currently lacks support for NumPy 2.0. We will incorporate the latest NumPy version once pandas provides the necessary support. Your patience is appreciated as we await this external dependency.
The update from Pandas has a caveat mentioned below in https://pandas.pydata.org/pandas-docs/stable/whatsnew/v2.2.2.html
"One major caveat is that arrays created with numpy 2.0's new StringDtype will convert to object dtyped arrays upon Series/DataFrame creation. Full support for numpy 2.0's StringDtype is expected to land in pandas 3.0."
This causes regression in SageMaker PySDK functionalities and hence we will need to wait for pandas 3.0 to make this update.
Please reach out to our support team if you have any further inquiries.
"since pandas library currently lacks support for NumPy 2.0."
pandas has had reasonable NumPy 2 support in newer versions (2.2.2 and later) since the first release of NumPy 2. Some older pandas versions may lack a numpy<2 pin, which makes it possible to accidentally install incompatible combinations.
But unless you have another dependency that still enforces pandas<2.2.2, that should not stop you from updating.
"The update from Pandas has a caveat mentioned below [StringDType not supported]"
Sorry, but there seems to be a misunderstanding here: The new StringDType in NumPy should not be relevant. It is new and simply doesn't affect existing code.
(If this somehow is a problem, it should be a 3rd party, i.e. one that immediately started using StringDType, dependency problem that I don't think you should worry about.)
"Please reach out to our support team if you have any further inquiries."
@mufaddal-rohawala it would be helpful if there was an issue to discuss the exact problem. If this seems like a NumPy/pandas related difficulty NumPy and pandas maintainers are certainly here to help. Please do reach out to me (or pandas maintainers, e.g. with an issue there).
+1
@nargokul @mufaddal-rohawala Is there any update on this? Many (most) ML python packages have moved ahead with numpy>=2.0 support. This is causing serious conflicts with our projects. I am not sure if waiting for pandas 3.0 is the right approach here.
Also, in early April, pandas seems to have released numpy 2 support as per https://pandas.pydata.org/docs/whatsnew/v2.2.2.html#pandas-2-2-2-is-now-compatible-with-numpy-2-0
@humanzz April 2024, yes. That's no new information, it's already been stated in https://github.com/aws/sagemaker-python-sdk/issues/4882#issuecomment-2609136297.
Gosh, when one reads 2024 as 2025, the brain makes up the weirdest of schedules :)
For a sec I thought the pandas release was in April 2025, and given the comment was from earlier, I assumed it was mentioning a prerelease!
My bad, for contributing nothing but noise here!
Can we get some urgency on this? There are a handful of open CVEs that cannot be addressed in repositories that use both numpy and sagemaker.
https://www.cve.org/CVERecord?id=CVE-2025-0508 https://www.cve.org/CVERecord?id=CVE-2024-34072 https://www.cve.org/CVERecord?id=CVE-2024-34073
Getting bugged by our security folks on this constantly.
Would love to see numpy updated here -- as several have mentioned, most modern computational workflows now use numpy>=2 and things have been compatible with pandas for a solid while now. This causes major issues in dependency management.
Is there any update on https://github.com/aws/sagemaker-python-sdk/pull/5199, @rsareddy0329?
It's almost a year since I opened the issue, and I just opened another support case (number 3 now). If any of you have AWS Premium Support, maybe it helps if you also open a support case to keep pressing for a resolution. In any case, the 50+ upvotes of the initial post should help too. Thanks.
If you find this thread and have the same problem (but no new information), please just upvote the initial post instead of commenting +1 and similar to keep noise levels low. Thanks.
Thanks everyone for your patience. We, the PySDK team, are actively looking into finding a solution that works without breaking existing functionality. We are hoping to resolve this by 9/19/2025. We will keep this thread updated.
Hi @nargokul, is there any update on whether this is on track? If not, can you please post an update? Cheers.
It is now 9/20/2025. Could we get a progress update?
Any update??? Waiting for almost one year and no news.
Thank you for the continued engagement and we are working on a resolution. The team is actively engaged to ensure all dependencies work across our SDK features and we will provide a status update on 9/25. We appreciate your patience and please continue to share any specific use cases or concerns in this thread.
Thank you for your continued patience. The team remains actively engaged on this issue. We are working toward a resolution soon and will continue to share updates here as the work advances.
Please continue to share any specific use cases or concerns in this thread.
Can you please explain to us why this is taking so long? Apparently it must be complicated, but I don't see where that complexity comes from exactly. There are people with significant contributions to numpy and pandas in this thread who offered their help in resolving this issue.
Thank you for your continued patience. The team remains actively engaged on this issue. We are working toward a resolution soon and will continue to share updates here as the work advances.
Please continue to share any specific use cases or concerns in this thread.
Are we able to get another public update next week please?
I think the majority are mostly concerned with the length of time this has taken, the dependency conflicts this creates in applications, and that we're now using EOL numpy with some outstanding CVEs.
"Can you please explain to us why this is taking so long? Apparently it must be complicated, but I don't see where that complexity comes from exactly. There are people with significant contributions to numpy and pandas in this thread who offered their help in resolving this issue."
@rsareddy0329 @nargokul are you able to share where the complexity comes from? From the error patterns, this appears to be downstream integration challenges rather than core sagemaker-python-sdk compatibility issues - is that accurate? Understanding the technical blockers would help the community better appreciate the timeline.
Thanks for the continued work on this.
Apparently AWS Support wants to provide a progress update here, but since this hasn't really happened, I think it's ok to share what I know. The problems are downstream container dependencies that need to be updated as well, and the progress is "predominantly positive", whatever that means.
Mind clarifying this? https://github.com/aws/sagemaker-python-sdk/pull/5199
It looks like all integration testing passed, and the above was merged into the main branch - would you mind confirming the timeline / next steps here?
It looks like you changed the numpy constraint from being pinned exactly to version 1.26.4 using ==1.26.4, to now supporting a range from 1.26.4 up to (but not including) 2.3.3 using >=1.26.4,<2.3.3, which includes numpy 2.x versions. Can you confirm you have fixed the dependency issues and all the updates to other packages (scipy, pandas, scikit-learn, tensorflow, etc.) are correct and working as intended?
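For anyone who wants to sanity-check which numpy releases fall inside the new range, here is a minimal sketch. The function names are hypothetical, and it only handles plain dotted release strings (no pre-releases or local versions). For anything beyond a quick check, `packaging.specifiers.SpecifierSet` is the more robust choice.

```python
def parse_release(version_string: str) -> tuple:
    """Turn a plain release string like '1.26.4' into a tuple of ints."""
    return tuple(int(part) for part in version_string.split("."))


def satisfies_new_constraint(version_string: str) -> bool:
    """Check a version against the range >=1.26.4,<2.3.3 quoted above."""
    lower = parse_release("1.26.4")
    upper = parse_release("2.3.3")
    # Tuples compare element by element, which matches release ordering here.
    return lower <= parse_release(version_string) < upper


if __name__ == "__main__":
    for candidate in ("1.26.3", "1.26.4", "2.0.0", "2.3.2", "2.3.3"):
        print(candidate, satisfies_new_constraint(candidate))
```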