ELPA issue?
Describe the bug
I am not sure what is causing the reported error. I have tried several versions of ABACUS, and all of them show this error. When I moved to another server, the error did not occur.
Expected behavior
No response
To Reproduce
Here are my input and output files.
ABACUS version: commit fe3e4b313db1858d62c2cd6b92102d887c8abe65 (2023.07.23)
task.002.tar.gz
Environment
No response
Additional Context
No response
@AroundPeking Thanks for your report. To rule out the OpenMP code as a possible cause, could you please run the job again with OMP_NUM_THREADS=1? You just need to add this before mpirun in your job submission script. Please let us know the results.
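For anyone launching jobs from Python rather than a shell script, a minimal sketch of the same idea is below: it copies the current environment, forces `OMP_NUM_THREADS=1`, and passes that environment to `mpirun`. The helper names `single_thread_env`, `abacus_command` and `run_abacus` are hypothetical, and the process count and the executable name `abacus` are assumptions; adjust them to your own job script. In a shell job script the equivalent is simply prefixing the launch line, e.g. `OMP_NUM_THREADS=1 mpirun -np 4 abacus`.

```python
import os
import subprocess


def single_thread_env(base=None):
    """Return a copy of the environment with OpenMP limited to one thread per MPI process."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = "1"  # disable OpenMP threading inside each rank
    return env


def abacus_command(nprocs, executable="abacus"):
    """Build the mpirun command line (process count and executable name are assumptions)."""
    return ["mpirun", "-np", str(nprocs), executable]


def run_abacus(nprocs):
    """Hypothetical launcher: same effect as `OMP_NUM_THREADS=1 mpirun -np N abacus`."""
    subprocess.run(abacus_command(nprocs), env=single_thread_env(), check=True)
```

Copying the environment instead of modifying `os.environ` keeps the change local to the launched job.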
The same error with OMP_NUM_THREADS=1.
@caic99 Would you please look into this issue?
@hongriTianqi Sorry, I have no idea what is causing it.
@AroundPeking Has this issue been solved? If not, do you have any suggestions for how ABACUS could be improved? Thank you.
@AroundPeking
- What environment did you use to build ABACUS?
- Have you tried the toolchain method?
- [x] Verify the issue is not a duplicate.
- [x] Describe the bug.
- [ ] Steps to reproduce.
- [ ] Expected behavior.
- [ ] Error message.
- [ ] Environment details.
- [ ] Additional context.
- [ ] Assign a priority level (low, medium, high, urgent).
- [ ] Assign the issue to a team member.
- [ ] Label the issue with relevant tags.
- [ ] Identify possible related issues.
- [ ] Create a unit test or automated test to reproduce the bug (if applicable).
- [ ] Fix the bug.
- [ ] Test the fix.
- [ ] Update documentation (if necessary).
- [ ] Close the issue and inform the reporter (if applicable).
@hongriTianqi I have not solved it yet. In fact, it seems related to my build environment: one of my servers works normally with intel20u4, while another server now fails with oneapi2022.3, although the rest of the software is the same.
I also find it strange that the job runs correctly when I only replace the STRU file, changing from the larger system of 400+ atoms in task.002.tar.gz to the smaller system of 50+ atoms in test.zip.
@QuantumMisaka
- Here is my environment. Loaded modules: gcc11.2, oneapi22.3, miniconda3, cmake3.26.3. My build: I am not sure whether I have answered you clearly; you can ask me for more detailed files.
- No, I am not familiar with the toolchain method.
@AroundPeking Thanks. There are other issues related to a size limit, #2002 and #2044, that might be related to this problem.
At which stage did your task fail? I tested it on my machine, and the test task ran for more than 2 SCF cycles. The test ABACUS is compiled with gcc-13.2.0, elpa-2023.05.001, openblas-0.3.23, openmpi-4.1.1 and fftw-3.3.10. It seems you use elpa-2021.11.001 and the Intel compiler. Could you try a newer version of ELPA while keeping the rest of your environment unchanged?
We tried v3.7.5, and the error disappeared.