Question on DICE/SHCI

Open vvp-nsk opened this issue 1 year ago • 3 comments

Hi!

This is about missing documentation. Three different Davidson algorithms are implemented:

davidsonType {DIRECT, DISK, MEMORY}

Could you please shed some light on these algorithms? By default, the 'MEMORY' (all-in-RAM?) algorithm is used. What about the other two? Is it safe to use either the 'DISK' (disk-based?) or the 'DIRECT' algorithm in production? If so, would it reduce the RAM demand per core?
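For context, the sketch below shows what such option names often mean in iterative eigensolvers in general: the nonzero Hamiltonian elements are either stored once and reused in every matrix-vector product (in RAM, or on disk), or recomputed on the fly each time and never stored. This is an assumption based on the names only; nothing in this thread confirms how Dice implements them. `h_element` is a hypothetical toy function, not part of Dice.

```python
# Toy illustration only: this is NOT Dice's implementation.
# h_element is a hypothetical stand-in for a Hamiltonian matrix element.


def h_element(i, j):
    """Toy symmetric matrix: diagonal i + 1, coupling 0.1 between neighbours."""
    if i == j:
        return float(i + 1)
    if abs(i - j) == 1:
        return 0.1
    return 0.0


def build_sparse(n):
    """Stored variant: keep every nonzero element as (column, value) pairs per row."""
    rows = []
    for i in range(n):
        row = []
        for j in range(max(0, i - 1), min(n, i + 2)):
            row.append((j, h_element(i, j)))
        rows.append(row)
    return rows


def matvec_stored(rows, v):
    """Matrix-vector product that reuses the stored elements."""
    return [sum(val * v[j] for j, val in row) for row in rows]


def matvec_direct(n, v):
    """Matrix-vector product that recomputes each element and stores none."""
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(max(0, i - 1), min(n, i + 2)):
            acc += h_element(i, j) * v[j]
        out.append(acc)
    return out
```

Both variants give the same product; the trade-off in this toy model is memory for the stored elements against repeated work in every iteration.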

Thank you in advance!

With best regards, Victor

vvp-nsk avatar May 25 '23 08:05 vvp-nsk

Are you trying to do very large calculations? Maybe you can give the DIRECT and DISK options a try. At one point they were working, but we have not actively tested them. If it works for a small problem (let's say a variational space of 1 million determinants while using MPI), then it did not break and should work.

Sandeep.

sanshar avatar May 25 '23 16:05 sanshar

Hi again!

The problem size is about 14M determinants. Apparently, the MPI master process (rank 0) requires about twice as much RAM as the other processes. To make such calculations possible, I can spawn only one DICE process per node, and it allocates all the RAM available on the compute node. Since DICE lacks shared-memory parallelization, the SHCI calculation becomes very costly in terms of the core-hours charged to the project account. I am therefore looking for a way to reduce the memory demand per MPI process so that I can use more cores.
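The constraint described above can be put into a back-of-the-envelope calculation. The sketch below assumes every worker rank needs the same amount of RAM, that rank 0 needs a fixed multiple of that (2x, per the observation above), and that nothing is shared between ranks on a node. The function name and all parameters are hypothetical, and the numbers in the usage note are illustrative, not measurements.

```python
def max_ranks_on_node(node_ram_gb, worker_ram_gb, master_factor=2.0, hosts_master=True):
    """Estimate how many MPI ranks fit into one node's RAM.

    Assumptions (hypothetical model, not measured Dice behaviour):
    - every worker rank needs worker_ram_gb;
    - rank 0 needs master_factor * worker_ram_gb;
    - no memory is shared between ranks on the node.
    """
    if worker_ram_gb <= 0:
        raise ValueError("worker_ram_gb must be positive")
    available = node_ram_gb
    ranks = 0
    if hosts_master:
        master_ram_gb = master_factor * worker_ram_gb
        if master_ram_gb > available:
            return 0  # not even rank 0 fits on this node
        available -= master_ram_gb
        ranks = 1
    # Fill the remaining RAM with worker ranks.
    return ranks + int(available // worker_ram_gb)
```

For example, `max_ranks_on_node(96, 16)` models a 96 GB node that hosts rank 0 with 16 GB workers; the node hosting rank 0 fits fewer ranks than the others, so it sets the limit if all nodes run the same number of ranks.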

If it works for a small problem (let's say a variational space of 1 million determinants while using MPI), then it did not break and should work.

I will check it out.

Thank you!

With best regards, Victor

vvp-nsk avatar May 26 '23 09:05 vvp-nsk

In Dice we make use of shared memory, so it does not replicate everything on every core of a node. This is somewhat close to how things would be done with multithreading. But indeed, with 14M determinants there is a chance that it runs out of memory on a typical 64 to 100 GB node.
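As a generic illustration of the idea (one copy of the data per node, with several processes attaching to it instead of each holding a private copy), here is a minimal sketch using Python's standard `multiprocessing.shared_memory` module. Dice is a C++/MPI code and this is not its mechanism; the helper names are hypothetical.

```python
import struct
from multiprocessing import shared_memory


def create_shared_vector(values):
    """Allocate one shared block and copy the values into it as float64."""
    shm = shared_memory.SharedMemory(create=True, size=8 * len(values))
    struct.pack_into(f"{len(values)}d", shm.buf, 0, *values)
    return shm  # the creator must eventually call close() and unlink()


def read_shared_vector(name, n):
    """Attach to an existing block by name and read n float64 values from it.

    Any process on the same machine that knows the name can attach,
    so the underlying block exists only once.
    """
    shm = shared_memory.SharedMemory(name=name)
    try:
        return list(struct.unpack_from(f"{n}d", shm.buf, 0))
    finally:
        shm.close()  # detach only; the creator is responsible for unlink()
```

In a real multi-process run, one process would create the block and pass `shm.name` to the others, which attach with `read_shared_vector` rather than allocating their own copy.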

Sandeep.

sanshar avatar May 26 '23 15:05 sanshar