otoole Fix writing of default values

Description

In this PR I have fixed the writing out of default values through the existing --write_defaults flag. Accompanying tests have been added.

This functionality existed (and worked) for writing out parameter defaults, but did not for writing out result defaults. This was due to the input data not being passed into the WriteStrategy._expand_defaults(...) function. I have addressed this, however, naming of variables is not super clear now in the WriteStrategy class.

Within WriteStrategy, data to be written out is now called inputs (this can be set/param data or result data), while input data for the model (only param/set data) is now called input_data. idk, we may want to clarify this before merged, or it may just be fine since its only in one place and documented in the docstring of _expand_defaults(...) and at variable definition.

Issue Ticket Number

My suggestion is for this PR to replace PR #215

Documentation

The functionality was already documented, but it didnt work!

Feb 01 '24 22:02 trevorb1

@trevorb1 - we have a performance problem. Creating new pandas dataframes with default values, concatenating the read-in values, and deleting duplicate values is slow and memory intensive. We need to consider alternatives.

Do not store the entire dictionary of expanded dataframes in memory, but only generate them one-at-a-time upon writing out.
Use xarray, which is a much more efficient way of storing data in-memory; and with very fast and easy methods to expand or fill multi-dimensional data. It also provides very easy conversion to and from pandas DataFrames e.g. see #184

Feb 07 '24 09:02 willu47

@trevorb1 see 7849353 which refactors your expand_defaults code, ensuring that only one dataframe at a time is expanded and then immediately written out, minimising memory use. Note this breaks a lot of tests...

Feb 07 '24 10:02 willu47

Thanks for the feedback, @willu47! I looked at this today and think your option 2 of using xarray will be the way forward.

I refactored the current code a little, just to make use of instance variables. My first thought of using pandas update actually made it slower, though. So that was a bummer haha. The implementation you suggested, @willu47, was faster than the current code!

This method still wont work with bigger models, though :( I was using OSeMSOYS Global to generate generic input data for these tests. What I found was that anything past a medium size model, and my computer just killed it after like 45sec (I have a mid-range laptop). imo, this makes the --write_defaults flag not that useful, as for more realistic models it probably wont work. (However, to be fair, I probably wouldnt run a large model on my laptop)

I will look at the xarray implementation to see if that works better! Thanks for the idea on that!

Feb 13 '24 06:02 trevorb1

With the latest updates I have (following @willu47's suggestion here):

Moved the expansion of dataframes from WriteStrategy to ReadStrategy
Retained @willu47's implementation of expand_defaults, and updated tests to reflect this
Put in logic to always expand DiscountRate and DiscountRateIdv dataframes when calculating results as per issue #217

Still to do before reviewing/merging this:

[x] Results expansion of defaults does not work right now. Also write tests to flag this.
[ ] Expansion of defaults does not work for datafiles. Also write tests to flag this.

I guess the scope of this PR has changed from "fixing" writing defaults to just refactoring it in otoole to address issue #217. Based on my comment above, seems like the actual fixing will really only come with the xarray implementation, as mentioned by @willu47 here (which relates to the open PR #184)

Mar 31 '24 18:03 trevorb1

Okay, I think writing defaults has been moved to ReadStrategy and works now!

@willu47 are you able to please take a quick look just to see if this lines up with how you envisioned writing defaults to be in the read strategy? :)
@HauHe can you please try the issue you described in issue #220 to see if this fixes it.

A few notes:

I don't think this is necessary the most "elegant" solution, however, I imagine write defaults will need to be re-implemented soon to address the performance issues anyways.
There is a couple of odd spots where default values are seperatly found for params and results. However, as described in issue #183, this should get refactored soon anyways.
If we merge this, I believe issues #220 and #217 should be closed. Potentially, this also addresses issue #215 as well, but that can be up for further discussion.
The expansion of defaults for writing of datafiles does not work. I think we should just move that to its own issue though in an effort to get this merged (unless someone has a quick solution to that, that I didnt think of!). Also, I cant really think of a use case where someone would even want default values written out in the datafile.

Apr 01 '24 01:04 trevorb1

Hi @trevorb1, I just ran a test with Utopia without having the DiscountRateIdv in the config file, but the default in the model file pointing to the DiscountRate I get results for CapitalInvestment. I haven't done any other testing (yet). But I'd say this should fix #220 Thanks a lot for your work!!

Apr 02 '24 13:04 HauHe

@HauHe Great! To confirm, your test was for a multi-year model, right? I cant remember if another issue arose for a single year model that relates to this. But maybe since that is more of an edge case, deserves its own issue ticket

Apr 02 '24 16:04 trevorb1

@trevorb1, yes the model I used for testing (UTOPIA) has multiple years.

Apr 02 '24 16:04 HauHe

I ran a test with one of my models, and there is no data in the Capitalnvestment.csv. In this case, I have declared the DiscountRateIdv for one of the technologies. No matter the combination I did, for example, defaults to DiscountRate in the model file or the default value in the data file, there are no results for capital investment. We have similar difficulties (Shravan and I) in calculating the CapitalInvestment in Osemosys_Pul.p. However, we solved the issue with Sharavan much simpler; and maybe it can be simplified/enhanced. Happy to share it with you!. @trevorb1

Jul 30 '24 14:07 robertodawid

I ran a test with one of my models, and there is no data in the Capitalnvestment.csv. In this case, I have declared the DiscountRateIdv for one of the technologies. No matter the combination I did, for example, defaults to DiscountRate in the model file or the default value in the data file, there are no results for capital investment. We have similar difficulties (Shravan and I) in calculating the CapitalInvestment in Osemosys_Pul.p. However, we solved the issue with Sharavan much simpler; maybe it can be simplified/enhanced. Happy to share it with you! @trevorb1

@trevorb1, thanks for yesterday's discussion. I tried to install the version from this branch without success; the CapitalInvestment was empty. The DiscountRateIdv in the data file looks like this (using otoole):

param default 0.089 : DiscountRate :=
RE1 0.05
;
param default 0.089 : DiscountRateIdv :=
RE1 CKGELCurb 0.02
;

For now, the easiest solution is to un-hash CC1 and the var CapitalInvestment, and the calculated investments are ok compared to my manual calculation.

Jul 31 '24 15:07 robertodawid

Thanks for the the update @robertodawid! Do you have a minimal datafile + model you would be able to attach that I would be able to recreate this issue with?

Jul 31 '24 18:07 trevorb1

@HauHe; @willu47; @robertodawid;

I am going to merge this PR, as I think the scope/topic has changed a couple times. Below are the major updates:

Moved the expansion of dataframes from WriteStrategy to ReadStrategy
@willu47's implementation of expand_defaults to be more preformative
DiscountRate and DiscountRateIdv is always expanded

As a bunch of discussion about how this relates to issue #217 and #220 has appeared in this thread, and I don't know if everything in those tickets has been addressed. I will move discussion to each of those threads individually.

Sep 21 '24 18:09 trevorb1