Ray Gao
Hi @MauricioCafiero, to be clear, do you mean this is happening inside a virtual environment created by conda, or are you using some conda command?
lgtm as long as tests are fixed!
Hi @lukaswittmann, thanks for reporting and providing the repro! We'll look into this ASAP and get back to you. I think this might be hitting an edge case...
A quick update on this one: this was confirmed as a bug with activation_checkpointing. We're working on a fix, but any inference mode that turns off activation_checkpointing will work for now,...
Quick update: this should be fixed now, and we'll ship a new fairchem PyPI release with the fix soon! Thank you for raising the issue!
Hi @hdj020402, great question! We just rolled out fairchem v2 and are working on features like batch inference. After this [PR](https://github.com/facebookresearch/fairchem/pull/1153) lands, you will be able to easily use the...
tracking PR: https://github.com/facebookresearch/fairchem/pull/1181
please update the readme and hf 🙏
> I have noticed that even with this fix, at random intervals, a dataloader worker will raise the following error and crash a training job: `malloc(): invalid next size (unsorted)`....
Hmm, the resume should be working, as we rely on it for long training runs. We also have some test cases that cover this; can you try running them? https://github.com/facebookresearch/fairchem/blob/main/tests/core/units/mlip_unit/test_mlip_unit.py#L477...