ixmp
Return categorical dtypes from (JDBC)Backend to reduce memory usage
Whenever possible, switch to categorical dtypes instead of object, as this significantly reduces memory use when more than 50% of the values in a column are repeated. Take extra care, because comparison of DataFrames (e.g. in tests) is sensitive to dtypes (e.g. the order of the values in a category). Here is an article providing more information about the internal structures of DataFrames in pandas.
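A minimal sketch of both points, using plain pandas rather than ixmp. The column names and values here are made up for illustration and are not taken from any Backend; the exact memory figures will vary by pandas version and platform.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical data: a string column with many repeated values, stored as object.
nodes = pd.Series(["World", "Austria", "Germany"] * 10_000, dtype=object)
df_obj = pd.DataFrame({"node": nodes, "value": range(30_000)})

# Same data, with the repeated column converted to a categorical dtype.
df_cat = df_obj.astype({"node": "category"})

# deep=True includes the memory held by the Python string objects themselves.
mem_obj = df_obj["node"].memory_usage(deep=True)
mem_cat = df_cat["node"].memory_usage(deep=True)
print(f"object: {mem_obj} bytes, category: {mem_cat} bytes")

# The values are identical...
same_values = df_obj["node"].tolist() == df_cat["node"].tolist()

# ...but a strict frame comparison, as commonly used in tests, fails on dtype.
try:
    assert_frame_equal(df_obj, df_cat)
    dtype_sensitive = False
except AssertionError:
    dtype_sensitive = True
```

Tests that compare returned DataFrames against hand-built expected frames would therefore need the expected frames to use matching dtypes, or would need to convert before comparing.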
It should be decided whether this will be something that is:
1. allowed by (that is, not specified by, but compatible with) the Backend API, and implemented by JDBCBackend, or
2. specified as part of the Backend API, and then implemented by JDBCBackend.
The changes to the tests and documentation will differ depending on whether (1) or (2) is chosen.