Reduce import time

Open paulromano opened this issue 1 year ago • 6 comments

Is your feature request related to a problem? Please describe.

First off, thanks a lot for developing this wonderful package! I'm interested in using it as a dependency for other Python projects that I manage. One thing that makes me a little hesitant is that the import time is a bit on the long side. For example, on my system:

> time python -c 'import mendeleev'

real	0m2.076s
user	0m2.404s
sys	0m1.253s

Compare this to other common packages:

> time python -c 'import scipy'

real	0m0.191s
user	0m0.540s
sys	0m1.021s

> time python -c 'import pandas'

real	0m0.333s
user	0m0.695s
sys	0m1.224s

If I pick up mendeleev as a dependency, one unfortunate side effect is that my packages will inherit that slow startup time too.
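To see where the time actually goes, CPython's `-X importtime` option reports the time spent in each imported module. The following is a rough sketch rather than anything from the mendeleev code base; the helper name `slowest_imports` is hypothetical, and it assumes the report is written to stderr in the usual `import time: self | cumulative | name` layout.

```python
import subprocess
import sys


def slowest_imports(module, top=5):
    """Return the `top` slowest imports as (cumulative_microseconds, name) pairs."""
    # Run the import in a fresh interpreter so nothing is already cached.
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        try:
            cumulative = int(parts[1])
        except ValueError:
            continue  # header line, not a measurement
        rows.append((cumulative, parts[2].strip()))
    return sorted(rows, reverse=True)[:top]


if __name__ == "__main__":
    # "json" is only a stand-in; pass "mendeleev" if it is installed.
    for microseconds, name in slowest_imports("json"):
        print(f"{microseconds:>10} us  {name}")
```

Submodules appear under their dotted names, so a slow `mendeleev.elements` import would show up near the top of the list.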

Describe the solution you'd like

Any change that reduces the import time would be great. I assume this is entirely related to loading the database and so I don't know how much of it is "inherent" and difficult to change. One possible solution is to defer loading the database until the point at which it's needed.

Describe alternatives you've considered

For me as a nuclear scientist/engineer, I am primarily interested in pulling in mendeleev for atomic masses, although I may use it for other pieces of data as well. The main alternative for me is to not use mendeleev and instead have my own cooked up version of an atomic_mass function (along with AME data).

paulromano avatar Jun 26 '23 18:06 paulromano

Hey Paul, thanks for reporting this and explaining your use case. This is a known issue and a consequence of a few design choices made earlier. The module that causes the long import time is mendeleev.elements, where Element instances are queried for all elements to enable the import shorthand:

from mendeleev import Ag, F
print(Ag.name, F.atomic_mass)

This isn't optimal, since there are now a few relations on the SQL side that need to be traversed on init to make it work.

Maybe you could try commenting out this line https://github.com/lmmentel/mendeleev/blob/3d14699b460919ad4441b6fbf65ae8570730f1fe/mendeleev/__init__.py#L7 and checking whether the import time is reduced sufficiently for your use case?

If that works we could move the elements module import to be optional and not the default as it is now.
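One way to keep the `from package import Ag` shorthand without paying for the query at import time is a module-level `__getattr__` (PEP 562). The sketch below is not mendeleev's implementation: `fakepkg`, `_load_elements` and the two-element data set are hypothetical stand-ins for the real package and its database query.

```python
import sys
import types

LOAD_CALLS = []  # records each (expensive) load so the laziness is observable


def _load_elements():
    """Hypothetical stand-in for querying every element from the database."""
    LOAD_CALLS.append(1)
    return {"Ag": {"name": "Silver"}, "F": {"name": "Fluorine"}}


def make_lazy_package(name):
    """Build a module whose element attributes are loaded on first access."""
    pkg = types.ModuleType(name)
    cache = {}

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        if attr.startswith("__"):
            # Do not trigger a load for dunder probes made by the import system.
            raise AttributeError(attr)
        if not cache:
            cache.update(_load_elements())
        try:
            value = cache[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        setattr(pkg, attr, value)  # later lookups bypass __getattr__
        return value

    pkg.__getattr__ = __getattr__
    return pkg


# Register the stand-in package; nothing has been loaded at this point.
sys.modules["fakepkg"] = make_lazy_package("fakepkg")

# The shorthand import now triggers the load on first use.
from fakepkg import Ag, F

print(Ag["name"], F["name"])
```

In a real package the same function would be defined directly in `__init__.py`, so a plain `import` stays cheap and only code that asks for an element pays for the query.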

BTW, are you accessing data in bulk, i.e. reading properties into a dataframe, or element by element?

lmmentel avatar Jun 27 '23 22:06 lmmentel

Thanks for the quick response @lmmentel! Commenting out that line definitely does the trick:

> time python -c 'import mendeleev'

real	0m0.416s
user	0m0.739s
sys	0m1.011s

For my use case, I would probably read the atomic masses in bulk into a dictionary internally that I could pull from later as needed. So, that means I'll still pay the price for loading all the data, but I'm OK with that as long as that load time only happens when atomic masses are actually needed.
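The pattern described above could look roughly like the following. It is only a sketch: `_load_masses_from_database` is a hypothetical stand-in for a bulk query against mendeleev (which is where the heavy import would go), and the three values are standard atomic weights used as illustrative data.

```python
import functools

MASS_LOADS = []  # records each bulk load so the caching is observable


def _load_masses_from_database():
    """Hypothetical stand-in for a bulk query; the heavy import would go here."""
    MASS_LOADS.append(1)
    return {"H": 1.008, "C": 12.011, "O": 15.999}


@functools.lru_cache(maxsize=None)
def _mass_table():
    # Runs once, on the first call; later calls return the cached dictionary.
    return _load_masses_from_database()


def atomic_mass(symbol):
    """Return the atomic mass for `symbol`, loading the table on first use."""
    try:
        return _mass_table()[symbol]
    except KeyError:
        raise ValueError(f"unknown element symbol: {symbol!r}") from None


if __name__ == "__main__":
    print(atomic_mass("C"))
```

Importing a module that defines `atomic_mass` this way costs nothing extra; the load happens on the first call and is reused afterwards.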

paulromano avatar Jun 28 '23 03:06 paulromano

Glad I could help. I think it might be worth making this a permanent change to reduce the import time for all users. That would, however, require some changes in the docs and probably in the test suite. I'm a bit short on time to look into this further, but I'm happy to help if you are interested in giving it a try.

In case you haven't seen it, there are methods for bulk data access that might be worth looking at. Here's a tutorial.

lmmentel avatar Jun 28 '23 20:06 lmmentel

Thanks @lmmentel. I may take a stab at this when I get a chance. Your code is well-structured so that definitely helps! :smile:

paulromano avatar Jun 28 '23 21:06 paulromano

Hey @paulromano any chance of reviving this one?

lmmentel avatar Mar 18 '24 20:03 lmmentel

Related to #135

lmmentel avatar Mar 18 '24 20:03 lmmentel