chgnet icon indicating copy to clipboard operation
chgnet copied to clipboard

Build against NumPy 2.0

Open DanielYang59 opened this issue 1 year ago • 6 comments

Summary

  • Build against NumPy 2.0, https://numpy.org/doc/stable/dev/depending_on_numpy.html
  • Ruff NPY201 rule run seems to pass (https://numpy.org/doc/stable/numpy_2_0_migration_guide.html):

    Many of the changes covered in the 2.0 release notes and in this migration guide can be automatically adapted in downstream code with a dedicated Ruff rule, namely rule NPY201.

  • [ ] Need to wait for pymatgen to fix NumPy 1 compatibility on Windows https://github.com/materialsproject/pymatgen/pull/3992
  • test_wandb_log_frequency use temporary path to avoid creating files locally

DanielYang59 avatar Aug 05 '24 02:08 DanielYang59

@janosh Can you please approve the workflow? Thanks!

DanielYang59 avatar Aug 05 '24 03:08 DanielYang59

Thanks! I believe the test failure is owing to the lack of support of NumPy 2.0 from pymatgen's side https://github.com/materialsproject/pymatgen/issues/3953. Not sure what's the best practice at this moment as pymatgen is waiting for chgnet to support NumPy 2.0 too https://github.com/materialsproject/pymatgen/pull/3894#issuecomment-2190098821 (perhaps use np2 for build system and np1 for runtime dependency?)

DanielYang59 avatar Aug 05 '24 11:08 DanielYang59

Pymatgen should go first and ignore failing chgnet tests

janosh avatar Aug 05 '24 12:08 janosh

Yes I just realized the same, pymatgen should build against np2 first :) And it would be compatible with np1 at runtime

DanielYang59 avatar Aug 05 '24 12:08 DanielYang59

@janosh It looks like you have to approve the workflow again for some reason?

DanielYang59 avatar Aug 09 '24 03:08 DanielYang59

until your first PR to a repo is merged, workflows need to be re-approved on every commit. it's quite annoying

janosh avatar Aug 09 '24 11:08 janosh

Hi @janosh if i remember correctly, this cannot be merged until a new release of pymatgen is issued.

DanielYang59 avatar Sep 10 '24 12:09 DanielYang59

@DanielYang59 thanks for the reminder. pymatgen v2024.9.10 should be on PyPI in a few minutes

janosh avatar Sep 10 '24 13:09 janosh

@janosh Have to ping you to approve the workflow (again), hopefully no test fails this time

DanielYang59 avatar Sep 11 '24 10:09 DanielYang59

@janosh Should be good this time :)

DanielYang59 avatar Sep 11 '24 12:09 DanielYang59

Looks like something is still not right with int type for windows (with pymatgen 2024.9.10), changing to my Windows PC for debugging...

____________________ ERROR collecting tests/test_model.py _____________________
tests\test_model.py:14: in <module>
    graph = CrystalGraphConverter()(structure, graph_id="test-model")
C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\torch\nn\modules\module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
C:\hostedtoolcache\windows\Python\3.12.5\x64\Lib\site-packages\torch\nn\modules\module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
chgnet\graph\converter.py:135: in forward
    graph = self.create_graph(
chgnet\graph\converter.py:255: in _create_graph_fast
    nodes, dir_edges_list, undir_edges_list, undirected_edges = make_graph(
chgnet\graph\cygraph.pyx:65: in chgnet.graph.cygraph.make_graph
    const long[::1] center_index,
E   ValueError: Buffer dtype mismatch, expected 'const long' but got 'long long'

DanielYang59 avatar Sep 11 '24 12:09 DanielYang59

based on the numpy docs for intp_t, maybe we should change the type signature to this?

def make_graph(
    const intp_t[::1] center_index,
    const intp_t n_e,
    const intp_t[::1] neighbor_index,
    const intp_t[:, ::1] image,
    const float64_t[::1] distance,
    const intp_t num_atoms
):

prev

def make_graph(
    const long[::1] center_index,
    const long n_e,
    const long[::1] neighbor_index,
    const long[:, ::1] image,
    const double[::1] distance,
    const long num_atoms
):

janosh avatar Sep 11 '24 13:09 janosh

Yes! Changed to numpy/cython type, tests seem to pass locally on my Windows PC

Note though, intp_t/float64_t is not standard type but from numpy, also intp_t is platform dependent.

Test NP1 on windows passes:

$ pip freeze | grep numpy
numpy==1.26.0
(venv)
yang@Yang-NUC MINGW64 /d/chgnet (numpy2)
$ pytest tests/test_graph.py
============================================================================ test session starts ============================================================================ 
platform win32 -- Python 3.12.0, pytest-8.3.3, pluggy-1.5.0
rootdir: D:\chgnet
configfile: pyproject.toml
plugins: cov-5.0.0
collected 8 items

tests\test_graph.py ........                                                                                                                                           [100%] 

============================================================================= 8 passed in 2.55s =============================================================================

DanielYang59 avatar Sep 11 '24 13:09 DanielYang59

I would revert 45a1fd75dcbb622a4d472d5004371cd1a87b6d69, numpy is needed to cythonize and cannot be imported lazily. I believe the problem is:

https://github.com/CederGroupHub/chgnet/blob/73d6096790bbb1e20d5f2e1e23119cca34e54483/.github/workflows/test.yml#L35-L41

As build is using a isolate environment (PEP-517)

DanielYang59 avatar Sep 11 '24 14:09 DanielYang59

Looks like there is another error just on windows for some reason:

____________________ ERROR collecting tests/test_model.py _____________________
chgnet\graph\converter.py:146: in forward
    bond_graph, undirected2directed = graph.line_graph_adjacency_list(
chgnet\graph\graph.py:272: in line_graph_adjacency_list
    raise ValueError(
E   ValueError: Error: number of directed edges=672 != 2 * number of undirected edges=473!This indicates directed edges are not complete
The above exception was the direct cause of the following exception:
tests\test_model.py:14: in <module>
    graph = CrystalGraphConverter()(structure, graph_id="test-model")
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\torch\nn\modules\module.py:1553: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\torch\nn\modules\module.py:1562: in _call_impl
    return forward_call(*args, **kwargs)
chgnet\graph\converter.py:153: in forward
    raise RuntimeError(
E   RuntimeError: Failed creating bond graph for test-model, check bond_graph_error.cif

Would have a look tomorrow.

DanielYang59 avatar Sep 11 '24 15:09 DanielYang59

Also there is another seemingly error for Ubuntu/MacOS, not sure if this is related to this PR:

tests/test_encoders.py .................
!!!!!! the two directed edges are equal but this operation is not supposed to happen
!!!!!! the two directed edges are equal but this operation is not supposed to happen
tests/test_graph.py ........

DanielYang59 avatar Sep 11 '24 15:09 DanielYang59

maybe @BowenD-UCB can chime on this test failure which oddly only happens on Windows? perhaps this signals that one of the dtype changes in make_graph is still platform dependent or needs to be reverted?

chgnet\graph\converter.py:148: in forward
    bond_graph, undirected2directed = graph.line_graph_adjacency_list(
chgnet\graph\graph.py:272: in line_graph_adjacency_list
    raise ValueError(
E   ValueError: Error: number of directed edges=672 != 2 * number of undirected edges=473!This indicates directed edges are not complete

janosh avatar Sep 11 '24 16:09 janosh

maybe @BowenD-UCB can chime on this test failure which oddly only happens on Windows? perhaps this signals that one of the dtype changes in make_graph is still platform dependent or needs to be reverted?

chgnet\graph\converter.py:148: in forward
    bond_graph, undirected2directed = graph.line_graph_adjacency_list(
chgnet\graph\graph.py:272: in line_graph_adjacency_list
    raise ValueError(
E   ValueError: Error: number of directed edges=672 != 2 * number of undirected edges=473!This indicates directed edges are not complete

@DanielYang59 @janosh Yes I'll look into this later. Sorry about the delay on the PR, been a bit busy lately.

bowen-bd avatar Sep 11 '24 19:09 bowen-bd

perhaps this signals that one of the dtype changes in make_graph is still platform dependent or needs to be reverted?

Concur, we should focus on inferred (implicit) types, i.e. np.array([0, 1]) whose type would still be platform dependent.

Instead of just changing the type in tests, we should also update the code where int is expected and automatically cast the type to int64. Otherwise users might still experience issues if they don't explicitly declare types.

However, I'm not familiar with the code base so if @BowenD-UCB could give a hand that would very helpful.

Sorry about the delay on the PR, been a bit busy lately.

It has not been delayed at all :) pymatgen just unblocked this.

DanielYang59 avatar Sep 12 '24 02:09 DanielYang59

@janosh I reverted all (I believe) numpy related change from this PR, and relocate NumPy v2 migration to another PR as it turns out there's more debugging needed than I expected.

Might need to replace the labels for this PR

DanielYang59 avatar Sep 12 '24 04:09 DanielYang59

No problem, thanks for merging this.

DanielYang59 avatar Sep 12 '24 14:09 DanielYang59