PyPSA
Refactor: Generalize add component functions
Closes:
- #341
- #746
- #867
- #895
Changes proposed in this Pull Request
- `n.mremove` and `n.madd` are deprecated and merged into `n.remove` and `n.add`
- `n.import_components_from_dataframe` and `n.import_series_from_dataframe` are deprecated (moved to internal functions)
Todos:
- [ ] Test performance
Checklist
- [ ] Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in `doc`.
- [ ] Unit tests for new features were added (if applicable).
- [ ] Newly introduced dependencies are added to `environment.yaml`, `environment_docs.yaml` and `setup.py` (if applicable).
- [ ] A note for the release notes `doc/release_notes.rst` of the upcoming release is included.
- [ ] I consent to the release of this PR's code under the MIT license.
The changes described above are now implemented.
As this is an API change, we should think about whether other changes in terms of better code completion/type hinting would make sense in the future, so that nothing needs to be deprecated again. Code completion/hinting could be added on two levels: to see which components exist and also what their defaults are.
A couple of options (feel free to come up with anything else):
EDIT:
1. Component container class
```python
from pypsa import components
n.add(components.Generators(), names, **kwargs)  # Instance or class, see below
```
Pros:
- Best backwards compatibility
- Most discoverability
- Most future-proof and consistent with a potential custom Component class (only optional)
- Custom components support
- Most pythonic (IMHO)
- We don't have to change anything now
Cons:
- More code to add something
- Maybe too many different ways to add something, which could be confusing?
2. Add class
```python
n.add.generators(names, **kwargs)
```
Pros:
- Similar to the statistics module
- Discoverability, but not for default values
Cons:
- Custom components are a bit more hacky
- Needs deprecation
- Least pythonic (IMHO)
3. Function per component
```python
n.add_generators(names, **kwargs)
```
Pros:
- Easy to use and pythonic
- Discoverability, but not for default values
- Honestly, I don't see more
Cons:
- Custom components
- Needs deprecation
| | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Backwards compatibility | Yes | No, only via old method | No, only via old method |
| Component discoverability | Yes | Yes | Yes |
| Component attribute discoverability | Yes | Yes | Yes |
| Component attribute default value discoverability | Yes | No | No |
Hi, i see this has been lying dormant. Sorry for not moving on it. Thoughts:
Must-haves:
- Stay backwards compatible for a deprecation period, i.e. `n.add("Generator", ...)` would have to continue to work.
- We need to test the performance of the change (maybe construct a sector network and measure the times).
Interface:
PyPSA is currently lacking discoverability of the actual component names, which is what you address above by what you call component hints i think; but even more so of the allowed attributes (ie. is it `p_max_pu` or `s_max_pu`, or `committable`, `comittable` (for those of us not being native english speakers)). ie. i typically need the component docs open in a browser tab when i design my model. This means it would be good if pylance or jedi could see these argument options and could show meaningful doc strings about them, which i think is impossible with option 1, and would need option 2 or 3.

I am not a big fan of the capital letter component names, so i have a slight preference for option 3: `net.add_generators` (but i would use the plural). The drawback is the custom component system, which allows you to add new component types (not sure, if we autogenerated those methods, whether jedi or pylance would be able to show their doc strings).
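A sketch of the autogeneration concern raised here (the registry, names and defaults are made up for illustration, not the real PyPSA data model): `add_*` methods can be generated from a component registry, with docstrings built from the defaults, but because they are created at runtime via `setattr`, static tools like pylance/jedi will generally not show their signatures or docstrings:

```python
# Hypothetical registry standing in for the component attribute csvs.
COMPONENT_ATTRS = {
    "generators": {"p_nom": 0.0, "p_max_pu": 1.0, "committable": False},
    "lines": {"s_nom": 0.0, "s_max_pu": 1.0},
}

class Network:
    def __init__(self):
        # one record list per component type (toy storage)
        self.components = {name: [] for name in COMPONENT_ATTRS}

def _make_adder(component, defaults):
    def adder(self, names, **kwargs):
        unknown = set(kwargs) - set(defaults)
        if unknown:
            raise TypeError(f"unknown attributes for {component}: {sorted(unknown)}")
        attrs = {**defaults, **kwargs}
        for name in names:
            self.components[component].append({"name": name, **attrs})
    adder.__name__ = f"add_{component}"
    # Docstring built from the (in reality csv-loaded) defaults, so help()
    # shows them at runtime -- editors doing static analysis will not.
    adder.__doc__ = "Add %s.\n\nDefaults:\n%s" % (
        component,
        "\n".join(f"    {attr} = {value!r}" for attr, value in defaults.items()),
    )
    return adder

# Autogenerate Network.add_generators, Network.add_lines, ...
for _component, _defaults in COMPONENT_ATTRS.items():
    setattr(Network, f"add_{_component}", _make_adder(_component, _defaults))

n = Network()
n.add_generators(["gen1"], p_nom=100.0)
```

`help(Network.add_generators)` would show the generated defaults at runtime; editor completion would not.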
Thanks for the proposal
If we are talking about api change, i also would like to try to gather everything up into objects eventually (ie. a Generators object, which holds the dataframes for static and dynamic attributes as well as adders).
```
net.generators            -> net.generators.static (or df)
net.generators_t          -> net.generators.varying (or pnl)
net.add("Generator", ...) -> net.generators.add(...)
```
But only to let it float once. Don't try to go there directly.
EDIT: It might be that a ComponentDataFrame could be a strategy (like the GeoDataFrame), but really not sure how much work this would require
Following up on this: I edited the plural/capital forms for the options and summarized the pros and cons in a table. So, see the first post again.
Yes, I mean discoverability with what I called component hints. There are actually three levels of discoverability: components, component attributes and component attribute default values. With the n.add() right now, we are lacking all of them. All three options allow for component and attribute discoverability. Option 1 also allows for default values, in a way.
With option 1 there are basically a couple of options on how to add components:
```python
# Old way, no deprecation needed
n.add('generators', names='gen', p_set=100)

# Mix, no attribute discoverability
from pypsa.components import Generators
n.add(Generators(), names='gen', p_set=100)
# We could discuss whether passing the instance or the class is the better
# approach here. Maybe the instance if no further attributes are assigned
# in n.add, and the class if they are assigned there.

# Full discoverability
generators = Generators(names='gen', p_set=100)
n.add(generators)

# To get a default value
print(generators.p_nom)
```
This is only the add discussion. The second discussion is on how to construct component classes. With option 1 this would beautifully align with those API changes:
```
net.generators            -> net.generators.static (or df)
net.generators_t          -> net.generators.varying (or pnl)
net.add("Generator", ...) -> net.generators.add(...)
```
The issue with a GeoDataFrame-like approach is that we have too many dimensions for it: the list of components, attributes, and optional snapshots. GeoDataFrames don't have that issue. And if we mix them into one data frame, we lose all the advantages in the first place (e.g. sticking to the fact that `net.generators` is a data frame, like right now, and not a custom class). So I think a custom class with the API shown above and the adding system like above would be the best way.
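As a rough illustration of such a custom class (a toy sketch with made-up conventions, not a proposal for the actual implementation): static attributes live in one DataFrame, snapshot-varying attributes in one DataFrame per attribute, exposed as `.static` and `.varying`:

```python
import pandas as pd

class Generators:
    """Toy container separating static and snapshot-varying attributes."""

    def __init__(self, names, snapshots=(), **attrs):
        self.snapshots = pd.Index(snapshots, name="snapshot")
        index = pd.Index(names, name="Generator")
        self.static = pd.DataFrame(index=index)
        self.varying = {}  # attribute -> DataFrame (snapshots x components)
        for attr, value in attrs.items():
            if isinstance(value, (list, tuple, pd.Series)) and len(value) == len(self.snapshots):
                # one value per snapshot -> varying attribute,
                # broadcast to all component names
                self.varying[attr] = pd.DataFrame(
                    {name: list(value) for name in names}, index=self.snapshots
                )
            else:
                # scalar -> static attribute, broadcast over the index
                self.static[attr] = value

gens = Generators(["a", "b"], snapshots=[0, 1, 2], p_nom=50, p_max_pu=[0.6, 0.7, 0.8])
```

Here the snapshots are passed in explicitly, which is one way around the container not knowing `net.snapshots`.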
But this is a second discussion and does not need to be decided right now. Now it would be great to agree on one of the options to get this PR merged.
@lkstrp I really like the approach in here:
```python
# Full discoverability
generators = Generators(names='gen', p_set=100)
n.add(generators)
```
It paves the way to the component class that @coroa mentioned. I am just wondering whether defaults should be stored in an extra sub-field, like
generators.defaults.p_nom
such that
generators.p_nom
returns the actual assigned value.
Regarding defaults, isn't it simpler and less overhead to just let `generators.p_nom` be the assigned value if something was assigned, and the default value otherwise? Default values that have been overwritten could still be checked by instantiating a new empty class.
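This fallback idea can be sketched with made-up attributes: defaults live on the class, assigned values on the instance, so plain attribute lookup returns the assigned value if present and the default otherwise, and a fresh instance still exposes the pristine defaults:

```python
class Generators:
    # class-level defaults (in PyPSA these would be loaded from the csv;
    # the values here are illustrative)
    p_nom = 0.0
    p_max_pu = 1.0

    def __init__(self, **attrs):
        for attr, value in attrs.items():
            if not hasattr(type(self), attr):
                raise AttributeError(f"unknown attribute {attr!r}")
            # instance attribute shadows the class-level default
            setattr(self, attr, value)

gens = Generators(p_nom=100.0)
gens.p_nom           # assigned value: 100.0
gens.p_max_pu        # unassigned, falls back to the default: 1.0
Generators().p_nom   # fresh instance shows the pristine default: 0.0
```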
I moved this discussion to a new issue: #936
Hej, joining the discussion once more, i am not yet sure this is going fully in the correct direction.
Ok, here we define component classes and their init function allows full discoverability for the type of components that exist (which is classes in pypsa.components, and potentially external pypsa.contrib.components at some point in the far future), for the attributes which exist and also their defaults. Looks great so far.
What i am unsure about is the interface and data storage of the instances. For example:

```python
generators = Generators(names=["a", "b"], p_nom=50, p_max_pu=0.6)
```
Leading questions:
- Should `Generators` become the new full data container for what we have now at `net.generators` and `net.generators_t`? (I don't think it can, because it at least currently would have to know about `net.snapshots`.)
- If no (to question 1), then we might end up with two different classes called `Generators` eventually: the `generators` instance from above, and what is going to be `net.generators` after `net.add(generators)`. Is that a good idea?
- If yes (to question 1), then you have to deal with the ambiguity of whether an attribute lookup, `generators.p_max_pu`, refers to a varying attribute, a static one or a default one (agreed that whether default or non-default, one could just return the actual value, so the latter is not so important).
- If yes (to question 1), there is a fundamental difference between an input attribute like `p_nom` or `p_max_pu` and an output attribute like `p_nom_opt` or `p`. Should we separate those out storage- and interface-wise?
Ah, i did not scroll up to see that you updated your description. I don't understand how `net.add(Generators(...))` would give you any insight into discoverability of default values, which `net.add_generators(...)` could not give you?
I imagine both would have custom doc-strings and default values in their method definition (`__init__` vs `add_generators`).
What am i missing?
> I imagine both would have custom doc-strings and default values in their method definition (`__init__` vs `add_generators`).
Yes, and these default values are not hard-coded into the docstring but loaded from the csv, so they can't be discovered via it. We could think about dynamically updating `__doc__` after instantiation, but I am not sure if this is a good idea. I can have a look.
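For illustration, dynamically building `__doc__` from the loaded defaults could look like this (the defaults dict is a stand-in for the csv contents); the caveat is that only runtime tools like `help()` or IPython's `?` read `__doc__`, while static analyzers see only source text:

```python
# Stand-in for the defaults loaded from the component attribute csvs.
DEFAULTS = {"p_nom": 0.0, "p_max_pu": 1.0}

class Generators:
    def __init__(self, **attrs):
        # defaults overridden by whatever was assigned
        self.attrs = {**DEFAULTS, **attrs}

# Build the class docstring at import/load time from the csv-loaded defaults,
# so help(Generators) lists them without hard-coding them in source.
Generators.__doc__ = "Generator component.\n\nDefaults:\n" + "\n".join(
    f"    {attr} = {value!r}" for attr, value in DEFAULTS.items()
)
```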
Which means that `n.add(Generators())` would allow no more discoverability than `n.add_generators()`. Only via the intermediate step `generators = Generators()` could you have a look at your defaults before adding them to the network:

```python
generators = Generators(names='gen', p_set=100)
# Check your defaults
print(generators.p_nom)
# Add to network
n.add(generators)
```
But that leads to your questions, which raise a very good point.
> 1. Should `Generators` become the new full data container for what we have now at `net.generators` and `net.generators_t`? (I don't think it can, because it at least currently would have to know about `net.snapshots`.)
For `n.snapshots` it can't know, true. Not sure if other attributes might be a problem? Link ports?
> 2. If no (to question 1), then we might end up with two different classes called `Generators` eventually: the `generators` instance from above, and what is going to be `net.generators` after `net.add(generators)`. Is that a good idea?
These do not have to be different classes, but two different instances. Assuming only `n.snapshots` is the problem at this stage, one way to keep them equal is to just pass the snapshots, either as an indexed Series/DataFrame or via an extra `snapshots` argument. Right now there are also no networks with varying attributes without snapshots being defined.
> 3. If yes (to question 1), then you have to deal with the ambiguity of whether an attribute lookup, `generators.p_max_pu`, refers to a varying attribute, a static one or a default one (agreed that whether default or non-default, one could just return the actual value, so the latter is not so important).
This ambiguity you have right now as well. We just change the mapping.
```
n.generators.p_max_pu   -> n.generators.static.p_max_pu
n.generators_t.p_max_pu -> n.generators.varying.p_max_pu
```
Right now you could have just 1s for each component in the static one and a series in the varying one at the same time. Whether this is the right approach is another discussion.
To throw in some more ideas, it's even a question of whether we need the differentiation between varying and static at all. All attributes that can be both could simply either be indexed by `n.snapshots` or have a dimension. In NumPy this would vectorize beautifully, meaning `[1, 2] * 2 == [1, 2] * [2, 2]`. In Pandas we would have to handle the index somehow. But this is even further away, I guess.
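The broadcasting idea, illustrated with NumPy (the numbers are made up): the same expression handles a scalar static value and a per-snapshot varying array, without any static/varying distinction in the code:

```python
import numpy as np

p_nom = np.array([100.0, 200.0])            # one value per generator
p_max_pu_static = 0.9                       # scalar, applies to all snapshots
p_max_pu_varying = np.array([[0.9, 0.9],    # shape: snapshots x generators
                             [0.5, 0.8]])

# Same expression for both cases, thanks to broadcasting:
cap_static = p_nom * p_max_pu_static        # -> [ 90., 180.]
cap_varying = p_nom * p_max_pu_varying      # -> [[ 90., 180.], [ 50., 160.]]
```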
> 4. If yes (to question 1), there is a fundamental difference between an input attribute like `p_nom` or `p_max_pu` and an output attribute like `p_nom_opt` or `p`. Should we separate those out storage- and interface-wise?
Storage-wise I would make it performance-dependent. Otherwise we could only set filters based on the core data passively. API-wise, something like the following would then also be more consistent:
```
n.generators.in.static.p_max_pu
n.generators.out.static.p_nom_opt
```
All in all, this is a discussion that I have tried to move to #936. This PR just deprecates `n.madd()`.
Codecov Report
Attention: Patch coverage is 69.44444% with 33 lines in your changes missing coverage. Please review.
Project coverage is 78.28%. Comparing base (`b8fbebf`) to head (`7ca92c9`). Report is 23 commits behind head on master.
| Files | Patch % | Lines |
|---|---|---|
| pypsa/components.py | 67.53% | 19 Missing and 6 partials :warning: |
| pypsa/clustering/spatial.py | 50.00% | 6 Missing :warning: |
| pypsa/io.py | 89.47% | 2 Missing :warning: |
Additional details and impacted files
```
@@            Coverage Diff             @@
##           master     #896      +/-   ##
==========================================
- Coverage   78.32%   78.28%   -0.05%
==========================================
  Files          22       22
  Lines        4785     4799      +14
  Branches     1015     1019       +4
==========================================
+ Hits         3748     3757       +9
- Misses        770      775       +5
  Partials      267      267
```
True, right now, especially when overwriting certain components, it can be inconsistent.
What do you have in mind? Just hard-coded alphabetically, or in the order in which they were added? The first could be added here. The second is a bit trickier because of the two import-from-series and import-from-dataframe functions. I'd like to get rid of them at some point anyway, so it would make sense to add a consistent order then.
Either alphabetically or in the order of `components_attrs/buses.csv` etc. However, the latter does not include additional columns like `bus2`, `bus3` for links, for example, which would then go to the end. So I would say alphabetical it is.
> Either alphabetically or in the order of `components_attrs/buses.csv` etc. However, the latter does not include additional columns like `bus2`, `bus3` for links, for example, which would then go to the end. So I would say alphabetical it is.
Ordering based on the attribute order in `components_attrs/buses.csv` has better usability, doesn't it? It is also less of a change to the current state. I just pushed a change for that; it is not much.
I was also talking about the order of the components themselves (component names). Here alphabetical would make sense, but this will need some test rewrites as well. Or do you want to keep it as it is, by last added? That just leads to issues if you overwrite components, because they are added again at the end (could also be fixed, though).
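For illustration, the attribute ordering discussed here could be sketched as follows (the canonical list is a made-up stand-in for the order in the `components_attrs` csvs): known columns keep the canonical order, and extra columns such as `bus2` are appended at the end, alphabetically:

```python
import pandas as pd

# Stand-in for the attribute order defined in components_attrs/*.csv.
CANONICAL = ["bus0", "bus1", "p_nom", "p_max_pu"]

def order_columns(df):
    # canonical columns first, in canonical order ...
    known = [c for c in CANONICAL if c in df.columns]
    # ... then any extra columns (e.g. bus2, bus3 on links), alphabetically
    extra = sorted(c for c in df.columns if c not in CANONICAL)
    return df[known + extra]

links = pd.DataFrame(columns=["p_max_pu", "bus2", "bus0", "bus1", "p_nom"])
ordered = order_columns(links)
# ordered.columns: bus0, bus1, p_nom, p_max_pu, bus2
```

This is exactly the flaw noted in the comment below: extra `bus2`, `bus3` columns land at the end unless the canonical list is extended on the fly.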
@FabianHofmann @fneum So I would like to get a confirmation on:
- component attrs: sorted as in `components_attrs`, other attrs at the end
- component names: alphabetical
@lkstrp I like the ordering according to component_attrs: it gives you more control (and we can think about better orders in the future). The only flaw it has is the `bus2`, `bus3` columns. We should have them much closer to the beginning, right after `bus1`. However, we can solve this: there is a special function for adding these on the fly, `update_linkports_component_attrs`, introduced in https://github.com/PyPSA/PyPSA/pull/669/files. When we adapt this function to produce a sensible bus column ordering, then all is good.
> @lkstrp I like the ordering according to component_attrs it gives you more control (and we can think about better orders in the future). The only flaw it has is the `bus2`, `bus3` columns. We should have them much more at the beginning, right after `bus1`. however we could solves this. there is a special function for adding these on the fly, `update_linkports_component_attrs`, introduced in https://github.com/PyPSA/PyPSA/pull/669/files. when we adapt this function to meet sensible bus columns ordering then all is good.
and the efficiencies, i.e. `efficiency2`, `efficiency3`
Ok!