
Allow dependency numpy to be >= 2.0.0

Open lorenzwalthert opened this issue 1 year ago • 16 comments

Describe the feature you'd like

In June 2024, numpy 2.0.0 was released. sagemaker-python-sdk depends on numpy>=1.9.0,<2.0. This creates dependency hell for me, as my Python package has other dependencies that require numpy >= 2.0.0.

You could either enforce numpy >= 2.0.0 and make new releases of the package incompatible with numpy < 2.0.0 or keep supporting the currently supported numpy versions, but also add those >= 2.0.0. I.e. depending on whether or not there are breaking changes with numpy >= 2.0.0 in your code base, establish different code paths depending on the installed version of numpy.
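The second option, separate code paths keyed on the installed NumPy version, could look roughly like the sketch below. The function names (numpy_major, select_code_path) and the path labels are hypothetical illustrations, not part of sagemaker-python-sdk; the only real API used is importlib.metadata.version from the standard library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def numpy_major(version_string: Optional[str] = None) -> Optional[int]:
    """Return NumPy's major version, or None if NumPy is not installed.

    Pass a version string explicitly to bypass the lookup (handy in tests).
    """
    if version_string is None:
        try:
            version_string = version("numpy")
        except PackageNotFoundError:
            return None
    return int(version_string.split(".")[0])


def select_code_path(major: Optional[int]) -> str:
    """Pick a (hypothetical) implementation label for a given major version."""
    if major is None:
        return "numpy-missing"
    return "numpy2" if major >= 2 else "numpy1"


if __name__ == "__main__":
    # Prints which branch would be taken in the current environment.
    print(select_code_path(numpy_major()))
```

Branching on the major version keeps the check cheap and avoids importing NumPy just to decide which path to take.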

How would this feature be used? Please describe.

Ensuring I can resolve dependencies in Python packages that have both this SDK as well as other dependencies with a requirement for numpy >= 2.0.

Describe alternatives you've considered

I don't think there is an alternative, as in the long run, this problem will get worse as more and more other packages depend on numpy >= 2.0.0. I am surprised no one has opened an issue until now.

Additional context

I am also opening a support case with AWS Premium Support.

lorenzwalthert avatar Oct 02 '24 07:10 lorenzwalthert

Is there anything holding up removal of the pin? From a quick scan, I would think it is basically a few tiny replacements for things like np.NaN, which ruff check path/to/code/ --select NPY201 --fix will just do. There are a few uses of np.int which might be fishy if (and only if!) this code ever runs on Windows (otherwise, it may be fishy, but there is no change in NumPy 2).
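For readers unfamiliar with the kind of replacement meant here, the following is a rough illustration only. The alias mapping reflects names removed in NumPy 2.0 according to its migration guide; the helper modernize_aliases is hypothetical, assumes NumPy is imported as np, and is a plain-text substitution rather than the syntax-aware rewrite that ruff performs.

```python
import re

# A few aliases removed in NumPy 2.0 and their replacements (not exhaustive).
REPLACEMENTS = {
    "NaN": "nan",
    "Inf": "inf",
    "infty": "inf",
    "float_": "float64",
}

# Match "np.<alias>" as a whole name only, e.g. not "np.Infinity" or "xnp.NaN".
PATTERN = re.compile(
    r"\bnp\.(" + "|".join(re.escape(name) for name in REPLACEMENTS) + r")\b"
)


def modernize_aliases(source: str) -> str:
    """Rewrite removed NumPy aliases in a source string (hypothetical helper)."""
    return PATTERN.sub(lambda m: "np." + REPLACEMENTS[m.group(1)], source)


if __name__ == "__main__":
    print(modernize_aliases("x = np.NaN if y is None else np.float_(y)"))
```

In practice the ruff command above is the better tool, since it works on the parsed code and also handles other import styles.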

seberg avatar Oct 08 '24 15:10 seberg

@ellisonbg do you know who we should talk to about NumPy 2 support here?

jakirkham avatar Oct 25 '24 19:10 jakirkham

Any status on this? When can we expect this support to be added?

victoriarouton avatar Nov 04 '24 17:11 victoriarouton

Any updates here? We're in the same situation.

radoshi avatar Dec 10 '24 18:12 radoshi

Hi @nargokul, I see that PR #4955 has been merged. Will this be part of the next release?

aaravind100 avatar Dec 11 '24 09:12 aaravind100

It seems that PR https://github.com/aws/sagemaker-python-sdk/pull/4963 reverted the change for numpy. Was that intentional? @nargokul

wickeat avatar Dec 16 '24 08:12 wickeat

same

amorisot avatar Dec 19 '24 13:12 amorisot

@knikure and @zhaoqizqwang, you reviewed the PR. numpy compatibility is a big issue for many people here. Any idea?

lorenzwalthert avatar Dec 30 '24 15:12 lorenzwalthert

@nargokul It seems the numpy update PR was actively closed, while another PR is currently open. Is there any blocker for the update?

wickeat avatar Jan 10 '25 03:01 wickeat

There are compatibility challenges in upgrading Amazon SageMaker Python SDK with NumPy 2.0 since pandas library currently lacks support for NumPy 2.0. We will incorporate the latest NumPy version once pandas provides the necessary support. Your patience is appreciated as we await this external dependency.

The update from Pandas has a caveat mentioned below in https://pandas.pydata.org/pandas-docs/stable/whatsnew/v2.2.2.html

> One major caveat is that arrays created with numpy 2.0’s new StringDtype will convert to object dtyped arrays upon Series/DataFrame creation. Full support for numpy 2.0’s StringDtype is expected to land in pandas 3.0.

This causes regression in SageMaker PySDK functionalities and hence we will need to wait for pandas 3.0 to make this update.

Please reach out to our support team if you have any further inquiries.

mufaddal-rohawala avatar Jan 23 '25 02:01 mufaddal-rohawala

> since pandas library currently lacks support for NumPy 2.0.

pandas has had reasonable support since the first NumPy 2 release, starting with pandas version 2.2.2. There might be older pandas versions that lack a numpy<2 pin, making it possible to accidentally install incompatible versions.

But unless you have another dependency that still enforces pandas<2.2.2, that should not stop you from updating.

> The update from Pandas has a caveat mentioned below [StringDType not supported]

Sorry, but there seems to be a misunderstanding here: the new StringDType in NumPy should not be relevant. It is new and simply doesn't affect existing code. (If this somehow is a problem, it would be a problem in a third-party dependency, i.e. one that immediately started using StringDType, and I don't think you should worry about that.)

> Please reach out to our support team if you have any further inquiries.

@mufaddal-rohawala it would be helpful if there was an issue to discuss the exact problem. If this seems like a NumPy/pandas-related difficulty, NumPy and pandas maintainers are certainly here to help. Please do reach out to me (or to pandas maintainers, e.g. with an issue there).

seberg avatar Jan 23 '25 08:01 seberg

+1

Yc-Chen avatar Mar 07 '25 16:03 Yc-Chen

@nargokul @mufaddal-rohawala Is there any update on this? Many (most) ML python packages have moved ahead with numpy>=2.0 support. This is causing serious conflicts with our projects. I am not sure if waiting for pandas 3.0 is the right approach here.

abdulfatir avatar Apr 15 '25 08:04 abdulfatir

Also, in early April, pandas seems to have released numpy 2 support, as per https://pandas.pydata.org/docs/whatsnew/v2.2.2.html#pandas-2-2-2-is-now-compatible-with-numpy-2-0

humanzz avatar May 05 '25 13:05 humanzz

@humanzz April 2024, yes. That's not new information; it was already stated in https://github.com/aws/sagemaker-python-sdk/issues/4882#issuecomment-2609136297.

lorenzwalthert avatar May 06 '25 07:05 lorenzwalthert

Gosh, when one reads 2024 as 2025, the brain makes up the weirdest of schedules :)

For a sec I thought the pandas release was in April 2025, and given the comment was from earlier, I assumed it was mentioning a prerelease!

My bad, for contributing nothing but noise here!

humanzz avatar May 06 '25 07:05 humanzz

Can we get some urgency on this? There are a handful of open CVEs that cannot be addressed in repositories that use both numpy and sagemaker.

https://www.cve.org/CVERecord?id=CVE-2025-0508
https://www.cve.org/CVERecord?id=CVE-2024-34072
https://www.cve.org/CVERecord?id=CVE-2024-34073

We are constantly getting bugged by our security folks about this.

adam133 avatar Jul 17 '25 16:07 adam133

Would love to see numpy updated here -- as several have mentioned, most modern computational workflows now use numpy>=2 and things have been compatible with pandas for a solid while now. This causes major issues in dependency management.

QCaudron avatar Sep 09 '25 20:09 QCaudron

Is there any update on https://github.com/aws/sagemaker-python-sdk/pull/5199, @rsareddy0329?

des1-gner avatar Sep 10 '25 01:09 des1-gner

It's been almost a year since I opened the issue, and I just opened another support case (number 3 now). If any of you have AWS Premium Support, it may help if you also open a support case to keep pressing for a resolution. In any case, the 50+ upvotes on the initial post should help too. Thanks.

If you find this thread and have the same problem (but no new information), please just upvote the initial post instead of commenting +1 and similar to keep noise levels low. Thanks.

lorenzwalthert avatar Sep 10 '25 07:09 lorenzwalthert

Thanks everyone for your patience. We, the PySDK team, are actively looking for a solution that works without breaking existing functionality. We are hoping to resolve this by 9/19/2025. We will keep this thread updated.

nargokul avatar Sep 11 '25 00:09 nargokul

Hi @nargokul, is there any update on whether this is on track? If not, could you please post an update? Cheers.

des1-gner avatar Sep 17 '25 22:09 des1-gner

It is now 9/20/2025. Could we get a progress update?

des1-gner avatar Sep 19 '25 22:09 des1-gner

Any update??? Waiting for almost one year and no news.

galtamirano-klar avatar Sep 19 '25 22:09 galtamirano-klar

Thank you for the continued engagement and we are working on a resolution. The team is actively engaged to ensure all dependencies work across our SDK features and we will provide a status update on 9/25. We appreciate your patience and please continue to share any specific use cases or concerns in this thread.

rsareddy0329 avatar Sep 19 '25 23:09 rsareddy0329

Thank you for your continued patience. The team remains actively engaged on this issue. We are working toward a resolution soon and will continue to share updates here as the work advances.

Please continue to share any specific use cases or concerns in this thread.

rsareddy0329 avatar Sep 25 '25 17:09 rsareddy0329

Can you please explain to us why this is taking so long? Apparently it must be complicated, but I don't see where that complexity comes from exactly. There are people with significant contributions to numpy and pandas in this thread who offered their help in resolving this issue.

lorenzwalthert avatar Sep 26 '25 18:09 lorenzwalthert

> Thank you for your continued patience. The team remains actively engaged on this issue. We are working toward a resolution soon and will continue to share updates here as the work advances.
>
> Please continue to share any specific use cases or concerns in this thread.

Are we able to get another public update next week please?

I think the majority are mostly concerned with the length of time this has taken, the dependency conflicts this creates in applications, and that we're now using EOL numpy with some outstanding CVEs.

> Can you please explain to us why this is taking so long? Apparently it must be complicated, but I don't see where that complexity comes from exactly. There are people with significant contributions to numpy and pandas in this thread who offered their help in resolving this issue.

@rsareddy0329 @nargokul are you able to share where the complexity comes from? From the error patterns, this appears to be downstream integration challenges rather than core sagemaker-python-sdk compatibility issues - is that accurate? Understanding the technical blockers would help the community better appreciate the timeline.

Thanks for the continued work on this.

des1-gner avatar Oct 04 '25 00:10 des1-gner

Apparently AWS Support wants to provide a progress update here, but since this hasn't really happened, I think it's OK to share what I know. The problems are downstream container dependencies that need to be updated as well, and the progress is "predominantly positive", whatever that means.

lorenzwalthert avatar Oct 04 '25 11:10 lorenzwalthert

Mind clarifying this? https://github.com/aws/sagemaker-python-sdk/pull/5199

It looks like all integration testing passed, and the above was merged into the main branch - would you mind confirming the timeline / next steps here?

It looks like you changed the numpy constraint from being pinned exactly to version 1.26.4 using ==1.26.4, to now supporting a range from 1.26.4 up to (but not including) 2.3.3 using >=1.26.4,<2.3.3, which includes numpy 2.x versions. Can you confirm you have fixed the dependency issues and all the updates to other packages (scipy, pandas, scikit-learn, tensorflow, etc.) are correct and working as intended?
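To make the range in question concrete, here is a simplified check of which versions satisfy >=1.26.4,<2.3.3. It is only a sketch: it handles plain dotted release numbers, the helper names are made up for illustration, and real tooling should rely on a full PEP 440 implementation such as the packaging library.

```python
from typing import Tuple


def parse_release(version_string: str) -> Tuple[int, ...]:
    """Turn a plain release string like '2.3.3' into a comparable tuple."""
    return tuple(int(part) for part in version_string.split("."))


def in_numpy_range(
    version_string: str, lower: str = "1.26.4", upper: str = "2.3.3"
) -> bool:
    """Check lower <= version < upper (inclusive lower, exclusive upper)."""
    return parse_release(lower) <= parse_release(version_string) < parse_release(upper)


if __name__ == "__main__":
    for candidate in ("1.26.3", "1.26.4", "2.0.0", "2.3.2", "2.3.3"):
        print(candidate, in_numpy_range(candidate))
```

The exclusive upper bound means 2.3.3 itself is rejected while every 2.x release below it is accepted.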

des1-gner avatar Oct 10 '25 00:10 des1-gner