QCSchema

multipole storage

Open loriab opened this issue 5 years ago • 3 comments

No decision is necessary for dipoles because all 3 elements are unique, but for quadrupoles and higher one has to choose between compact storage with a defined order (e.g., xx, xy, xz, yy, yz, zz) and a full representation (e.g., 9-element quadrupole storage). The former saves space but requires more management, which is hard to impose in a schema as a data layout. I propose that higher multipoles be stored in full. For 64-poles, this means 729 elements, most of them redundant (only 28 are unique). Any concerns or objections?
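For reference, the element counts can be checked with a minimal sketch like the one below. It assumes Cartesian multipoles that are symmetric under any permutation of their indices; the helper names `full_components` and `unique_components` are made up for illustration and are not part of QCSchema.

```python
from itertools import combinations_with_replacement, product

AXES = "xyz"

def full_components(order):
    """All 3**order index labels of a rank-`order` Cartesian tensor, e.g. 'yx'."""
    return ["".join(p) for p in product(AXES, repeat=order)]

def unique_components(order):
    """Symmetry-unique labels in lexicographic order, e.g. xx, xy, xz, yy, yz, zz."""
    return ["".join(c) for c in combinations_with_replacement(AXES, order)]

# Compare full vs. unique element counts for a few multipole ranks.
for order, name in [(1, "dipole"), (2, "quadrupole"), (3, "octupole"), (6, "64-pole")]:
    print(name, len(full_components(order)), len(unique_components(order)))
```

The unique count follows (n + 1)(n + 2) / 2 for rank n, which gives 28 for n = 6, against 3^6 = 729 for the full tensor.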

loriab, Mar 03 '20 19:03

Worth looking at http://www.openrsp.org/en/latest/index.html which (de facto) defines a schema for arbitrary response properties (arbitrary in terms of operator, order, and frequency).

mattwelborn, Mar 03 '20 20:03

Thanks for the link! That's a great project to know about, and I'm reassured to see they went with redundant components as well: http://www.openrsp.org/en/latest/tutorial/perturbations.html#perturbations.

My guess is that it should be easy to map but that qcsk doesn't want to go immediately with the more complex openrsp representation?

loriab, Mar 04 '20 17:03

I think lexicographical order is fairly common (e.g., http://cclib.github.io/data_notes.html#moments), and if programs use a different order, it can be mapped easily.
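To illustrate the mapping, here is a rough sketch that reorders compact components into lexicographic order. The diagonal-first source order is a hypothetical example, not the documented output of any particular program, and `to_lexicographic` is an illustrative name.

```python
from itertools import combinations_with_replacement

def lexicographic_labels(order, axes="xyz"):
    """Symmetry-unique index labels in lexicographic order."""
    return ["".join(c) for c in combinations_with_replacement(axes, order)]

def to_lexicographic(values, source_labels):
    """Reorder compact components from a program's own order into lexicographic order."""
    # Sort the indices inside each label so that 'yx' and 'xy' name the same component.
    by_label = {"".join(sorted(lab)): v for lab, v in zip(source_labels, values)}
    order = len(source_labels[0])
    return [by_label[lab] for lab in lexicographic_labels(order)]

# Hypothetical program that prints the diagonal quadrupole elements first.
source_labels = ["xx", "yy", "zz", "xy", "xz", "yz"]
values = [1.0, 4.0, 6.0, 2.0, 3.0, 5.0]

reordered = to_lexicographic(values, source_labels)
print(reordered)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```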

As for storage, I'd probably suggest storing in upper triangle / reduced form. Let's say that you store the 64-pole for each of 3 million entries. It may be a small part of the whole, but it adds up quickly across a database IMHO.
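A back-of-the-envelope estimate for that scenario, as a sketch only: it assumes each component is stored as an uncompressed 8-byte float, which is my assumption here (JSON text or compressed storage would give different numbers).

```python
from math import comb

def n_full(order):
    """Number of components in the full rank-`order` Cartesian tensor."""
    return 3 ** order

def n_unique(order):
    """Number of symmetry-unique components: (order + 1)(order + 2) / 2."""
    return comb(order + 2, 2)

BYTES_PER_VALUE = 8      # assumption: float64, no compression
N_ENTRIES = 3_000_000    # database size from the example above
ORDER = 6                # 64-pole

full_bytes = N_ENTRIES * n_full(ORDER) * BYTES_PER_VALUE
compact_bytes = N_ENTRIES * n_unique(ORDER) * BYTES_PER_VALUE

print(f"full:    {full_bytes / 1e9:.2f} GB")
print(f"compact: {compact_bytes / 1e9:.2f} GB")
```

Under these assumptions the full form comes to roughly 17.5 GB and the reduced form to roughly 0.67 GB, a factor of about 26.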

I'm always going to come from the compressed = good perspective. I'm trying to upload 22GB to Figshare right now and that's meaningful.

Importantly, many programs I use do not output the full tensor for the same reason: much of it is redundant.
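If the schema did end up requiring the full form, output from such programs would need to be expanded first. A minimal sketch of the round trip, assuming the reduced values arrive in lexicographic order (the function names are illustrative, not an existing API):

```python
from itertools import combinations_with_replacement, product

AXES = "xyz"

def compact_to_full(compact, order):
    """Expand symmetry-unique values (lexicographic order) into all 3**order components."""
    labels = ["".join(c) for c in combinations_with_replacement(AXES, order)]
    if len(compact) != len(labels):
        raise ValueError(f"expected {len(labels)} values, got {len(compact)}")
    lookup = dict(zip(labels, compact))
    # Every permutation of indices shares the value of its sorted label (yx == xy).
    return [lookup["".join(sorted(p))] for p in product(AXES, repeat=order)]

def full_to_compact(full, order):
    """Keep only the components whose index label is already sorted."""
    labels = ["".join(p) for p in product(AXES, repeat=order)]
    if len(full) != len(labels):
        raise ValueError(f"expected {len(labels)} values, got {len(full)}")
    return [v for lab, v in zip(labels, full) if list(lab) == sorted(lab)]

quadrupole = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # xx, xy, xz, yy, yz, zz
full = compact_to_full(quadrupole, 2)          # xx, xy, xz, yx, yy, yz, zx, zy, zz
print(full)
print(full_to_compact(full, 2) == quadrupole)  # True
```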

ghutchis, Sep 24 '20 16:09