qmcpack icon indicating copy to clipboard operation
qmcpack copied to clipboard

[WIP] LCAO simple ompBLAS gemm

Open kgasperich opened this issue 2 years ago • 9 comments

Please review the developer documentation on the wiki of this project that contains help and requirements.

Proposed changes

Adding call to simple device-side gemm in LCAO routines to avoid data transfer

What type(s) of changes does this code introduce?

Delete the items that do not apply

  • Bugfix
  • New feature

Does this introduce a breaking change?

  • Yes
  • No

What systems has this change been tested on?

Checklist

Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is simply a reminder of what we are going to look for before merging your code.

  • Yes/No. This PR is up to date with current the current state of 'develop'
  • Yes/No. Code added or changed in the PR has been clang-formatted
  • Yes/No. This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • Yes/No. Documentation has been added (if appropriate)

kgasperich avatar Dec 15 '23 21:12 kgasperich

I would still like to see the plumbing for rocblas, cublas etc. here with this as a fallback. Wouldn't the former just be faster in terms of getting to working accelerated code? If there is some quirk that makes this fallback option preferred it would be best to put the rationale in the PR comments. We need to minimize the amount of "standard numeric library" code that we write ourselves, so if we (you) are writing some we should be very clear about why.

prckent avatar Jan 18 '24 21:01 prckent

@prckent agreed. For the moment, getting the calculation on the GPU and sorting out data movement take precedence.

ye-luo avatar Jan 18 '24 21:01 ye-luo

Fair enough. Then the key thing is to make sure this has reasonable testing and that any parts of the API that are not implemented have aborts (hopefully none are needed, but we need to be pragmatic about the amount of effort.)

prckent avatar Jan 18 '24 21:01 prckent

Test this please

ye-luo avatar Feb 14 '24 23:02 ye-luo

Test this please

ye-luo avatar Feb 15 '24 00:02 ye-luo

Test this please

ye-luo avatar Feb 15 '24 04:02 ye-luo

@kgasperich please have a final pass to see if there is anything you would like to change before merging.

ye-luo avatar Feb 15 '24 05:02 ye-luo

I changed offload to a runtime option. This means all the ifdef still needs to be resolved. For now, non-offload is routed to the single walker fallback this is still not optimal but should work good enough.

ye-luo avatar Feb 15 '24 05:02 ye-luo

Test this please

ye-luo avatar Feb 15 '24 15:02 ye-luo

Test this please

ye-luo avatar Mar 28 '24 04:03 ye-luo