
Return categorical dtypes from (JDBC)Backend to reduce memory usage

Open · zikolach opened this issue 5 years ago · 1 comment

Whenever possible, switch to categorical dtypes instead of object, as this can significantly reduce memory usage when more than 50% of the values in a column are repeats. Take extra care, because comparisons of dataframes (e.g. in tests) are sensitive to dtypes, including, for categoricals, the order of the categories. Here is an article providing more information about the internal structures of dataframes in pandas.

zikolach, Nov 28 '19 09:11
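A minimal sketch of the proposed conversion, assuming pandas and reading "more than 50% repeated values" as 1 − (distinct values ÷ rows) > 0.5. The function name to_categorical, its threshold parameter and the sample data are hypothetical and not part of ixmp or JDBCBackend. The sketch also shows the two comparison pitfalls mentioned above: assert_frame_equal fails on object (or str) versus category, and assert_series_equal fails when two categoricals list the same categories in a different order.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def to_categorical(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `df` in which string/object
    columns become 'category' when the share of repeated values exceeds
    `threshold`. The share is computed as 1 - distinct / rows (an assumed
    reading of the 50% rule of thumb, not a pandas definition)."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        # Skip empty, numeric and already-categorical columns
        if len(s) == 0 or not (is_object_dtype(s) or is_string_dtype(s)):
            continue
        repeated_share = 1 - s.nunique(dropna=False) / len(s)
        if repeated_share > threshold:
            out[col] = s.astype("category")
    return out


# Hypothetical long-format data, loosely shaped like a parameter's values
n = 30_000
df = pd.DataFrame({
    "node": ["World", "R11_AFR", "R11_CPA"] * (n // 3),  # 3 distinct values
    "label": [f"id{i}" for i in range(n)],               # all values distinct
    "value": range(n),
})
cat = to_categorical(df)

print(cat.dtypes)  # only "node" becomes category
print(df["node"].memory_usage(deep=True), cat["node"].memory_usage(deep=True))

# Comparisons are dtype-sensitive: object (or str) vs. category fails ...
try:
    pd.testing.assert_frame_equal(df, cat)
except AssertionError:
    print("dtype mismatch: object/str vs. category")

# ... and so do categoricals whose categories are listed in a different order
x = pd.Series(["a", "b", "a"], dtype=pd.CategoricalDtype(["a", "b"]))
y = pd.Series(["a", "b", "a"], dtype=pd.CategoricalDtype(["b", "a"]))
try:
    pd.testing.assert_series_equal(x, y)
except AssertionError:
    print("category order differs")
```

Converting after loading still materializes the full object column first; a backend would presumably save the most memory by building categoricals directly (pandas offers pd.Categorical.from_codes for this), though whether that fits JDBCBackend is not addressed in this issue.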

It should be decided whether this will be something that is:

  1. allowed by (that is, compatible with, but not specified by) the Backend API, and implemented by JDBCBackend, or
  2. specified as part of the Backend API, and then implemented by JDBCBackend.

The changes to the tests and documentation will differ depending on whether (1) or (2) is chosen.

khaeru, Nov 30 '19 12:11
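How the tests could differ between the two options might be sketched as follows, assuming pandas. Under (1), a test accepts either dtype, for instance by converting categorical columns back to the dtype of their categories before comparing; under (2), a test also requires the categorical dtype and compares strictly, so expected data must carry the same categories in the same order. The helpers decategorize, check_option_1 and check_option_2 are hypothetical and not part of ixmp.

```python
import pandas as pd


def decategorize(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn categorical columns back into the dtype of
    their categories, so frames compare equal whichever dtype was returned."""
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.CategoricalDtype):
            out[col] = out[col].astype(out[col].cat.categories.dtype)
    return out


def check_option_1(result: pd.DataFrame, expected: pd.DataFrame) -> None:
    """(1) Categoricals allowed but not specified: compare values only."""
    pd.testing.assert_frame_equal(decategorize(result), decategorize(expected))


def check_option_2(result: pd.DataFrame, expected: pd.DataFrame) -> None:
    """(2) Categoricals specified by the API: require them and compare
    strictly, including the categories and their order."""
    for col in expected.columns:
        if isinstance(expected[col].dtype, pd.CategoricalDtype):
            assert isinstance(result[col].dtype, pd.CategoricalDtype), col
    pd.testing.assert_frame_equal(result, expected)


# Usage with hypothetical data
expected = pd.DataFrame({"node": ["World", "R11_AFR"], "value": [1.0, 2.0]})
result = expected.astype({"node": "category"})  # as a backend might return it

check_option_1(result, expected)  # passes: the categorical dtype is tolerated
check_option_2(result, expected.astype({"node": "category"}))  # passes
```

Option (1) keeps existing expected data usable unchanged, while option (2) means every expected frame in the tests has to be built with matching categorical dtypes.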