Fix writing of default values
Description
In this PR I have fixed the writing out of default values through the existing --write_defaults flag. Accompanying tests have been added.
This functionality existed (and worked) for writing out parameter defaults, but did not for writing out result defaults. This was due to the input data not being passed into the WriteStrategy._expand_defaults(...) function. I have addressed this, however, naming of variables is not super clear now in the WriteStrategy class.
Within WriteStrategy, data to be written out is now called inputs (this can be set/param data or result data), while input data for the model (only param/set data) is now called input_data. idk, we may want to clarify this before merged, or it may just be fine since its only in one place and documented in the docstring of _expand_defaults(...) and at variable definition.
Issue Ticket Number
My suggestion is for this PR to replace PR #215
Documentation
The functionality was already documented, but it didnt work!
@trevorb1 - we have a performance problem. Creating new pandas dataframes with default values, concatenating the read-in values, and deleting duplicate values is slow and memory intensive. We need to consider alternatives.
- Do not store the entire dictionary of expanded dataframes in memory, but only generate them one-at-a-time upon writing out.
- Use xarray, which is a much more efficient way of storing data in-memory; and with very fast and easy methods to expand or fill multi-dimensional data. It also provides very easy conversion to and from pandas DataFrames e.g. see #184
@trevorb1 see 7849353 which refactors your expand_defaults code, ensuring that only one dataframe at a time is expanded and then immediately written out, minimising memory use. Note this breaks a lot of tests...
Thanks for the feedback, @willu47! I looked at this today and think your option 2 of using xarray will be the way forward.
I refactored the current code a little, just to make use of instance variables. My first thought of using pandas update actually made it slower, though. So that was a bummer haha. The implementation you suggested, @willu47, was faster than the current code!
This method still wont work with bigger models, though :( I was using OSeMSOYS Global to generate generic input data for these tests. What I found was that anything past a medium size model, and my computer just killed it after like 45sec (I have a mid-range laptop). imo, this makes the --write_defaults flag not that useful, as for more realistic models it probably wont work. (However, to be fair, I probably wouldnt run a large model on my laptop)
I will look at the xarray implementation to see if that works better! Thanks for the idea on that!
With the latest updates I have (following @willu47's suggestion here):
- Moved the expansion of dataframes from
WriteStrategytoReadStrategy - Retained @willu47's implementation of
expand_defaults, and updated tests to reflect this - Put in logic to always expand
DiscountRateandDiscountRateIdvdataframes when calculating results as per issue #217
Still to do before reviewing/merging this:
- [x] Results expansion of defaults does not work right now. Also write tests to flag this.
- [ ] Expansion of defaults does not work for datafiles. Also write tests to flag this.
I guess the scope of this PR has changed from "fixing" writing defaults to just refactoring it in otoole to address issue #217. Based on my comment above, seems like the actual fixing will really only come with the xarray implementation, as mentioned by @willu47 here (which relates to the open PR #184)
Okay, I think writing defaults has been moved to ReadStrategy and works now!
- @willu47 are you able to please take a quick look just to see if this lines up with how you envisioned writing defaults to be in the read strategy? :)
- @HauHe can you please try the issue you described in issue #220 to see if this fixes it.
A few notes:
- I don't think this is necessary the most "elegant" solution, however, I imagine write defaults will need to be re-implemented soon to address the performance issues anyways.
- There is a couple of odd spots where default values are seperatly found for params and results. However, as described in issue #183, this should get refactored soon anyways.
- If we merge this, I believe issues #220 and #217 should be closed. Potentially, this also addresses issue #215 as well, but that can be up for further discussion.
- The expansion of defaults for writing of datafiles does not work. I think we should just move that to its own issue though in an effort to get this merged (unless someone has a quick solution to that, that I didnt think of!). Also, I cant really think of a use case where someone would even want default values written out in the datafile.
Hi @trevorb1, I just ran a test with Utopia without having the DiscountRateIdv in the config file, but the default in the model file pointing to the DiscountRate I get results for CapitalInvestment. I haven't done any other testing (yet). But I'd say this should fix #220
Thanks a lot for your work!!
@HauHe Great! To confirm, your test was for a multi-year model, right? I cant remember if another issue arose for a single year model that relates to this. But maybe since that is more of an edge case, deserves its own issue ticket
@trevorb1, yes the model I used for testing (UTOPIA) has multiple years.
I ran a test with one of my models, and there is no data in the Capitalnvestment.csv. In this case, I have declared the DiscountRateIdv for one of the technologies. No matter the combination I did, for example, defaults to DiscountRate in the model file or the default value in the data file, there are no results for capital investment. We have similar difficulties (Shravan and I) in calculating the CapitalInvestment in Osemosys_Pul.p. However, we solved the issue with Sharavan much simpler; and maybe it can be simplified/enhanced. Happy to share it with you!. @trevorb1
I ran a test with one of my models, and there is no data in the
Capitalnvestment.csv.In this case, I have declared theDiscountRateIdvfor one of the technologies. No matter the combination I did, for example, defaults to DiscountRate in the model file or the default value in the data file, there are no results for capital investment. We have similar difficulties (Shravan and I) in calculating the CapitalInvestment in Osemosys_Pul.p. However, we solved the issue with Sharavan much simpler; maybe it can be simplified/enhanced. Happy to share it with you! @trevorb1
@trevorb1, thanks for yesterday's discussion. I tried to install the version from this branch without success; the CapitalInvestment was empty. The DiscountRateIdv in the data file looks like this (using otoole):
param default 0.089 : DiscountRate :=
RE1 0.05
;
param default 0.089 : DiscountRateIdv :=
RE1 CKGELCurb 0.02
;
For now, the easiest solution is to un-hash CC1 and the var CapitalInvestment, and the calculated investments are ok compared to my manual calculation.
Thanks for the the update @robertodawid! Do you have a minimal datafile + model you would be able to attach that I would be able to recreate this issue with?
@HauHe; @willu47; @robertodawid;
I am going to merge this PR, as I think the scope/topic has changed a couple times. Below are the major updates:
- Moved the expansion of dataframes from
WriteStrategytoReadStrategy - @willu47's implementation of
expand_defaultsto be more preformative DiscountRateandDiscountRateIdvis always expanded
As a bunch of discussion about how this relates to issue #217 and #220 has appeared in this thread, and I don't know if everything in those tickets has been addressed. I will move discussion to each of those threads individually.