[BUG]AttributeError: module 'numpy' has no attribute 'product'

Open XuanofXXX opened this issue 8 months ago • 3 comments

Describe the bug When execution reaches

 File "~/megatron/core/dist_checkpointing/validation.py", line 460, in _validate_sharding_for_key_flattened
[rank0]:     or stops[-1] != np.product(local_shape)

it raises AttributeError: module 'numpy' has no attribute 'product'. My numpy version is 2.2.5 (stable).

To Reproduce Using numpy==2.2.5 will reproduce.

Expected behavior Assert error

Stack trace/logs

[rank0]: Traceback (most recent call last):
[rank0]:   File "~/myprojct/scripts/../sft_gpt.py", line 311, in
[rank0]:     pretrain(
[rank0]:   File "~/myprojct/megatron/training/training.py", line 356, in pretrain
[rank0]:     iteration, num_floating_point_operations_so_far = train(
[rank0]:   File "~/myprojct/megatron/training/training.py", line 1416, in train
[rank0]:     save_checkpoint_and_time(iteration, model, optimizer,
[rank0]:   File "~/myprojct/megatron/training/training.py", line 1096, in save_checkpoint_and_time
[rank0]:     save_checkpoint(iteration, model, optimizer, opt_param_scheduler,
[rank0]:   File "~/myprojct/megatron/training/checkpointing.py", line 436, in save_checkpoint
[rank0]:     async_save_request = dist_checkpointing.save(state_dict, checkpoint_name, save_strategy,
[rank0]:   File "~/myprojct/megatron/core/dist_checkpointing/serialization.py", line 368, in save
[rank0]:     sharded_state_dict, state_dict = save_preprocess(sharded_state_dict, validate_access_integrity)
[rank0]:   File "~/myprojct/megatron/core/dist_checkpointing/state_dict_transformation.py", line 49, in save_preprocess
[rank0]:     validate_sharding_integrity(determine_global_metadata(sharded_part)[1])
[rank0]:   File "~/myprojct/megatron/core/dist_checkpointing/validation.py", line 395, in validate_sharding_integrity
[rank0]:     _validate_sharding_for_key(shardings)
[rank0]:   File "~/myprojct/megatron/core/dist_checkpointing/validation.py", line 424, in _validate_sharding_for_key
[rank0]:     map_reduce(
[rank0]:   File "~/myprojct/megatron/core/dist_checkpointing/dict_utils.py", line 244, in map_reduce
[rank0]:     res[k] = reduce_fn(res[k])
[rank0]:   File "~/myprojct/megatron/core/dist_checkpointing/validation.py", line 460, in _validate_sharding_for_key_flattened
[rank0]:     or stops[-1] != np.product(local_shape)
[rank0]:   File "~/.venv/lib/python3.10/site-packages/numpy/init.py", line 414, in getattr
[rank0]:     raise AttributeError("module {!r} has no attribute "
[rank0]: AttributeError: module 'numpy' has no attribute 'product'

Environment (please complete the following information):

  • Megatron-LM commit ID: I don't know which commit my environment is using, but I checked and the current version of Megatron-LM still uses np.product
  • PyTorch version '2.4.1+cu124'
  • CUDA version 12.4
  • NCCL version 2.20.5

Proposed fix Replace the np.product call with np.prod. Alternatively, add the following at the beginning of the file:

# Define np.product if it doesn’t exist
if not hasattr(np, "product"):
    np.product = np.prod

If there’s a specific reason to retain np.product, please let me know and I’ll take a closer look—thanks!
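For illustration, here is a minimal sketch of the first option, swapping np.product for np.prod in a comparison shaped like the one at line 460. The helper name flattened_range_is_complete and the sample values are hypothetical and are not taken from Megatron-LM; only np.prod is a real NumPy function.

```python
import numpy as np


def flattened_range_is_complete(stops, local_shape):
    """Hypothetical helper (not Megatron-LM code).

    Returns True when the last stop offset equals the total number of
    elements implied by local_shape. np.prod exists in both NumPy 1.x
    and 2.x, unlike np.product.
    """
    return stops[-1] == np.prod(local_shape)


# A (3, 4) shard has 12 elements, so the final stop must be 12.
print(flattened_range_is_complete([4, 8, 12], (3, 4)))
print(flattened_range_is_complete([4, 8], (3, 4)))
```

Because np.prod is available on both sides of the 2.0 boundary, this form should not need a version check or a monkey patch.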

XuanofXXX avatar May 05 '25 03:05 XuanofXXX

Ran into the same issue! NumPy only provides np.prod, not np.product. Which versions of numpy have np.product?

lishuai-97 avatar May 11 '25 08:05 lishuai-97

np.product is deprecated since v1.25 (https://github.com/numpy/numpy/pull/23314) and is removed in v2.0. Downgrading to numpy<2 solves the issue.
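A quick way to see which side of that boundary an environment is on is to inspect the installed NumPy directly. The sketch below is only an illustration; the function names numpy_major_version and has_np_product are made up for this example.

```python
import numpy as np


def numpy_major_version():
    """Return the major version of the installed NumPy as an int."""
    return int(np.__version__.split(".")[0])


def has_np_product():
    """True if this NumPy still exposes the np.product alias."""
    return hasattr(np, "product")


print("numpy", np.__version__, "has np.product:", has_np_product())
```

If has_np_product() reports False, code that calls np.product will fail with the AttributeError from this issue unless it is changed to np.prod or numpy is pinned below 2.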

roosephu avatar May 14 '25 22:05 roosephu

Thank you, this worked for me!

lishuai-97 avatar May 17 '25 01:05 lishuai-97

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Jul 16 '25 18:07 github-actions[bot]

+1 same issue, will we support np 2.0?

fzyzcjy avatar Jul 19 '25 14:07 fzyzcjy

same issue

gxy-gxy avatar Aug 28 '25 07:08 gxy-gxy

same

zui-jiang avatar Aug 28 '25 07:08 zui-jiang

Hey all, sorry for the late response. Mcore is heavily tied to the NGC PyTorch container, which is currently compiled with numpy 1.26. The benefit of that container is that it comes with libraries like FlashAttention and TransformerEngine. We might extend compatibility beyond PyT with future releases, but for now we recommend using that container for highest compatibility.

Please also have a look at our pyproject.toml for supported dependency versions.

Let me know if there’s more I can help with!

ko3n1g avatar Aug 28 '25 08:08 ko3n1g

@ko3n1g NGC PyTorch 25.10 includes numpy==2.1.0. We should reconsider numpy 2.0 support.

sbhavani avatar Oct 31 '25 20:10 sbhavani