
Refactor: Generalize add component functions

Open lkstrp opened this issue 1 year ago • 11 comments

Closes:

  • #341
  • #746
  • #867
  • #895

Changes proposed in this Pull Request

  • n.mremove and n.madd are deprecated and merged into n.remove and n.add
  • n.import_components_from_dataframe and n.import_series_from_dataframe are deprecated (moved to internal function)
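To illustrate the merged semantics, a minimal sketch (hypothetical, not the actual PyPSA implementation): a single string name behaves like the old n.add, a list of names like the old n.madd:

```python
def add(component, names, **attrs):
    # Hypothetical sketch of the merged n.add/n.madd dispatch:
    # a single string name adds one component, a list adds several.
    single = isinstance(names, str)
    name_list = [names] if single else list(names)
    records = [{"component": component, "name": n, **attrs} for n in name_list]
    return records[0] if single else records
```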

Todos:

  • [ ] Test performance

Checklist

  • [ ] Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • [ ] Unit tests for new features were added (if applicable).
  • [ ] Newly introduced dependencies are added to environment.yaml, environment_docs.yaml and setup.py (if applicable).
  • [ ] A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • [ ] I consent to the release of this PR's code under the MIT license.

lkstrp avatar May 17 '24 14:05 lkstrp

The changes described above are now implemented.

As this is an API change, we should think about whether other changes in terms of better code completion/type hinting would make sense in the future, so that nothing needs to be deprecated again. Code completion/hinting could be added on two levels: to see which components exist, and also what their defaults are.

A couple of options (feel free to come up with anything else):

EDIT:

1. Component container class

from pypsa import components
n.add(components.Generators(), names, **kwargs) # Instance or class, see below

Pros:

  • Best backwards compatibility
  • Most discoverability
  • Most future-proof and consistent with a potential custom Component class (only optional)
  • Custom components support
  • Most pythonic (IMHO)
  • We don't have to change anything now

Cons:

  • More code to add something
  • Maybe too many different ways to add something, which could be confusing?

2. Add class

n.add.generators(names, **kwargs)

Pros:

  • Similar to the statistics module
  • Discoverability, but not for default values

Cons:

  • Custom components are a bit more hacky
  • Needs deprecation
  • Least pythonic (IMHO)

3. Function per component

n.add_generators(names, **kwargs)

Pros:

  • Easy to use and pythonic
  • Discoverability, but not for default values
  • Honestly, I don't see more

Cons:

  • Custom components
  • Needs deprecation

|  | Option 1 | Option 2 | Option 3 |
| --- | --- | --- | --- |
| Backwards compatibility | Yes | No, only via old method | No, only via old method |
| Component discoverability | Yes | Yes | Yes |
| Component attribute discoverability | Yes | Yes | Yes |
| Component attribute default value discoverability | Yes | No | No |

lkstrp avatar May 22 '24 13:05 lkstrp

Hi, I see this has been lying dormant. Sorry for not moving on it. Thoughts:

Must-haves:

  1. Stay backwards compatible for a deprecation period, i.e. n.add("Generator", ...) would have to continue to work.
  2. We need to test the performance of the change (maybe construct a sector network and measure the times).

Interface: PyPSA is currently lacking discoverability of the actual component names, which is what you address above by what you call component hints, I think; but even more so of the allowed attributes (i.e. is it p_max_pu or s_max_pu, or committable vs. comittable, for those of us who are not native English speakers). I typically need the component docs open in a browser tab when I design my model. This means it would be good if pylance or jedi could see these argument options and show meaningful docstrings about them, which I think is impossible with option 1 and would need option 2 or 3. I am not a big fan of the capital-letter component names, so I have a slight preference for option 3: net.add_generators (but I would use the plural). The drawback is the custom component system, which allows you to add new component types (I am not sure whether jedi or pylance would be able to show docstrings for autogenerated methods).

Thanks for the proposal

coroa avatar Jun 10 '24 15:06 coroa

If we are talking about an API change, I would also like to try to gather everything up into objects eventually (i.e. a Generators object, which holds the dataframes for static and dynamic attributes as well as adders).

net.generators -> net.generators.static (or df)
net.generators_t -> net.generators.varying (or pnl)
net.add("Generator", ...) -> net.generators.add(...)

But this is just to float the idea once; don't try to go there directly.

coroa avatar Jun 10 '24 16:06 coroa

EDIT: It might be that a ComponentDataFrame could be a strategy (like the GeoDataFrame), but I am really not sure how much work this would require.

FabianHofmann avatar Jun 10 '24 16:06 FabianHofmann

Following up on this: I edited the plural/capital form for the options and summarized the pros and cons in a table, so see the first post again.

Yes, I mean discoverability with what I called component hints. There are actually three levels of discoverability: components, component attributes and component attribute default values. With the n.add() right now, we are lacking all of them. All three options allow for component and attribute discoverability. Option 1 also allows for default values, in a way.

With option 1 there are basically a couple of options on how to add components:

# Old way, no deprecation needed
n.add('generators', names='gen', p_set=100)

# Mix, no attr discoverability
from pypsa.components import Generators
n.add(Generators(), names='gen', p_set=100)
# We could discuss whether passing the instance or the class is the better approach here. Maybe the instance if no further attributes are assigned in n.add, and the class if they are assigned there.

# Full discoverability
generators = Generators(names='gen', p_set=100)
n.add(generators)
# To get default value
print(generators.p_nom)

This is only the add discussion. The second discussion is on how to construct component classes. With option 1 this would beautifully align with those API changes:

net.generators -> net.generators.static (or df)
net.generators_t -> net.generators.varying (or pnl)
net.add("Generator", ...) -> net.generators.add(...)

The issue with a GeoDataFrame-like approach is that we have too many dimensions for it: a list of components, attributes, and optional snapshots. GeoDataFrames don't have that issue. And if we mix them all in one data frame, we lose the advantages of this in the first place (e.g. sticking to the fact that net.generators is a data frame, like right now, and not a custom class). So I think a custom class with the API shown above and the adding system shown above would be the best way.

But this is a second discussion and does not need to be decided right now. Now it would be great to agree on one of the options to get this PR merged.

lkstrp avatar Jun 21 '24 12:06 lkstrp

@lkstrp I really like the approach in here:

# Full discoverability
generators = Generators(names='gen', p_set=100)
n.add(generators)

It paves the way to the component class that @coroa mentioned. I am just wondering whether defaults should be stored in an extra sub-field, like

generators.defaults.p_nom 

such that

generators.p_nom

returns the actual assigned value.

FabianHofmann avatar Jun 23 '24 13:06 FabianHofmann

Regarding defaults, isn't it simpler and less overhead to just let generators.p_nom be the assigned value if something was assigned, and the default value otherwise? Default values after something has been assigned could still be checked by instantiating a new empty class.
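A minimal sketch of this fallback, with made-up attribute names and defaults:

```python
# Hypothetical sketch of the fallback described above: attribute lookup
# returns the assigned value if one was set, otherwise the default.
# Attribute names and defaults are invented for illustration.
class Generators:
    _defaults = {"p_nom": 0.0, "p_max_pu": 1.0}

    def __init__(self, **assigned):
        self._assigned = assigned

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name in self._assigned:
            return self._assigned[name]
        if name in self._defaults:
            return self._defaults[name]
        raise AttributeError(name)

g = Generators(p_nom=50)
```

Defaults after assignment can then still be inspected by instantiating a fresh empty Generators(), as described above.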

I move this discussion to a new issue #936

lkstrp avatar Jun 24 '24 06:06 lkstrp

Hej, joining the discussion once more: I am not yet sure this is going fully in the correct direction.

OK, here we define component classes, and their __init__ function allows full discoverability of the types of components that exist (which are the classes in pypsa.components, and potentially external pypsa.contrib.components at some point in the far future), of the attributes which exist, and also of their defaults. Looks great so far.

What I am unsure about is the interface and data storage of the instances. For example:

generators = Generators(names=["a", "b"], p_nom=50, p_max_pu=0.6)

Leading questions:

  1. Should Generators become the new full data container for what we have now at net.generators and net.generators_t? (I don't think it can, because it at least currently would have to know about net.snapshots)
  2. If no (to question 1), then we might end up with two different classes called Generators eventually. The generators instance from above, and what is going to be net.generators after net.add(generators). Is that a good idea?
  3. If yes (to question 1): then you have to deal with the ambiguity of whether an attribute lookup, generators.p_max_pu, refers to a varying attribute, a static one or a default one (agreed that whether default or non-default, one could just return the actual value, so the latter is not so important).
  4. If yes (to question 1): There is a fundamental difference between an input attribute like p_nom or p_max_pu and an output attribute like p_nom_opt or p. Should we separate those out storage- and interface-wise?

coroa avatar Jun 27 '24 09:06 coroa

Ah, I did not scroll up to see that you updated your description. I don't understand how

net.add(Generators(...))

would give you any insight into discoverability of default values, which

net.add_generators(...)

could not give you?

I imagine both would have custom doc-strings and default values in their method definition (__init__ vs add_generators).

What am I missing?

coroa avatar Jun 27 '24 09:06 coroa

I imagine both would have custom doc-strings and default values in their method definition (__init__ vs add_generators).

Yes, and these default values are not hard-coded into the docstring but loaded from the csv, so they can't be discovered via it. We could think about dynamically updating __doc__ after instantiation, but not sure if this is a good idea. I can have a look.
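A rough sketch of what dynamically building __doc__ could look like (attribute names and defaults are made up; whether static analyzers would pick this up is exactly the open question):

```python
# Hypothetical sketch: rebuild the docstring at instantiation time from
# defaults that were loaded from the component CSVs, so that help() and
# interactive tools can show them. Names and values are invented.
csv_defaults = {"p_nom": 0.0, "p_max_pu": 1.0, "committable": False}

class Generators:
    def __init__(self, **assigned):
        self.assigned = assigned
        # Shadow the class docstring on the instance with the CSV defaults.
        self.__doc__ = "Generator attributes and defaults:\n" + "\n".join(
            f"  {k} = {v!r}" for k, v in csv_defaults.items()
        )

g = Generators(p_set=100)
```

Interactive tools that read the instance's __doc__ would see the defaults, but static editor completion still would not, since the string only exists at runtime.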

Which means that n.add(Generators()) would allow no more discoverability than n.add_generators(). Only via the intermediate step generators = Generators() could you have a look at your defaults before adding them to the network:

generators = Generators(names='gen', p_set=100)
# Check your defaults
print(generators.p_nom)
# Add to network
n.add(generators)

But that leads to your questions, which raise a very good point.

1 Should Generators become the new full data container for what we have now at net.generators and net.generators_t? (I don't think it can, because it at least currently would have to know about net.snapshots)

For n.snapshots, it indeed can't know. Not sure whether other attributes might be a problem as well? Link ports?

2 If no (to question 1), then we might end up with two different classes called Generators eventually. The generators instance from above, and what is going to be net.generators after net.add(generators). Is that a good idea?

These do not have to be different classes, but two different instances. Assuming only n.snapshots is the problem at this stage, one way to keep them equal is to just pass the snapshots, either as an indexed Series/DataFrame or via an extra snapshots argument. Right now there are also no networks with varying attributes without snapshots being defined.

3 If yes (to question 1): then you have to deal with the ambiguity of whether an attribute lookup, generators.p_max_pu, refers to a varying attribute, a static one or a default one (agreed that whether default or non-default, one could just return the actual value, so the latter is not so important).

This ambiguity exists right now as well; we just change the mapping:

n.generators.p_max_pu -> n.generators.static.p_max_pu
n.generators_t.p_max_pu -> n.generators.varying.p_max_pu

Right now you could have just 1s in the static one for each component and a series in the varying one at the same time. Whether this is the right approach is another discussion.

To throw in some more ideas: it is even a question whether we need the differentiation between varying and static at all. All attributes that can be both could simply either be indexed by n.snapshots or have a dimension. In NumPy this would vectorize beautifully, meaning [1,2] * 2 == [1,2] * [2,2]. In pandas we would have to handle the index somehow. But this is even further away, I guess.
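The [1,2] * 2 == [1,2] * [2,2] point is plain NumPy broadcasting; a short sketch of how a static (scalar) and a varying (per-snapshot) attribute would behave identically under vectorized math:

```python
import numpy as np

p = np.array([1, 2])               # some per-snapshot quantity
static_factor = 2                  # static attribute: one value for all snapshots
varying_factor = np.array([2, 2])  # the same attribute written out as a time series

# Broadcasting makes the scalar and the expanded array interchangeable.
assert np.array_equal(p * static_factor, p * varying_factor)
```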

4 If yes (to question 1): There is a fundamental difference between an input attribute like p_nom or p_max_pu and an output attribute like p_nom_opt or p. Should we separate those out storage- and interface-wise?

Storage-wise I would make it performance-dependent; otherwise we could only set filters on the core data passively. API-wise, something like the following would then also be more consistent:

n.generators.in.static.p_max_pu
n.generators.out.static.p_nom_opt

All in all, this is a discussion that I have tried to move to #936. This PR just deprecates n.madd().

lkstrp avatar Jun 27 '24 13:06 lkstrp

Codecov Report

Attention: Patch coverage is 69.44444% with 33 lines in your changes missing coverage. Please review.

Project coverage is 78.28%. Comparing base (b8fbebf) to head (7ca92c9). Report is 23 commits behind head on master.

Files Patch % Lines
pypsa/components.py 67.53% 19 Missing and 6 partials :warning:
pypsa/clustering/spatial.py 50.00% 6 Missing :warning:
pypsa/io.py 89.47% 2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #896      +/-   ##
==========================================
- Coverage   78.32%   78.28%   -0.05%     
==========================================
  Files          22       22              
  Lines        4785     4799      +14     
  Branches     1015     1019       +4     
==========================================
+ Hits         3748     3757       +9     
- Misses        770      775       +5     
  Partials      267      267              


codecov[bot] avatar Jul 10 '24 15:07 codecov[bot]

True, right now, especially when overwriting certain components, it can be inconsistent.

What do you have in mind? Just hard-coded alphabetically, or in the order in which they were added? The first could be added here. The second is a bit trickier because of the two import_series_from_dataframe and import_components_from_dataframe functions. I'd like to get rid of them at some point anyway, so it would make sense to add a consistent order then.

lkstrp avatar Sep 06 '24 11:09 lkstrp

Either alphabetically or in the order of components_attrs/buses.csv etc. However, the latter does not include additional columns like bus2, bus3 for links, for example, which would then go to the end. So I would say alphabetical it is.

FabianHofmann avatar Sep 06 '24 12:09 FabianHofmann

Either alphabetically or in the order of components_attrs/buses.csv etc. However, the latter does not include additional columns like bus2, bus3 for links, for example, which would then go to the end. So I would say alphabetical it is.

Basing it on the attribute order in components_attrs/buses.csv gives better usability, doesn't it? It is also less of a change to the current state. I just pushed a change for that; it is not much.

I was also talking about the order of the components themselves (component names). Here alphabetical would make sense, but this will need some test rewrites as well. Or do you want to keep it as it is, ordered by last added? That just leads to issues if you overwrite components, because they are added again at the end (this could also be fixed, though).

@FabianHofmann @fneum So I would like to get a confirmation on:

  • component attrs: sorted as in components_attrs, other attrs at the end
  • component names: alphabetical
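A sketch of the proposed attribute ordering (the canonical order here is made up; in PyPSA it would come from the components_attrs CSVs): columns follow the CSV order, and extra columns such as bus2 for multi-port links are appended at the end.

```python
import pandas as pd

# Hypothetical canonical attribute order, as it would be read from a
# components_attrs/*.csv file. Names are invented for illustration.
canonical = ["bus", "p_nom", "p_max_pu"]

df = pd.DataFrame([[1.0, "b2", "b1", 10.0]],
                  columns=["p_max_pu", "bus2", "bus", "p_nom"])

# Known attributes first, in canonical order; unknown extras at the end.
order = [c for c in canonical if c in df.columns]
order += [c for c in df.columns if c not in canonical]
df = df[order]
```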

lkstrp avatar Sep 09 '24 14:09 lkstrp

@lkstrp I like the ordering according to component_attrs; it gives you more control (and we can think about better orders in the future). The only flaw is the bus2, bus3 columns. We should have them much closer to the beginning, right after bus1. However, we could solve this: there is a special function for adding these on the fly, update_linkports_component_attrs, introduced in https://github.com/PyPSA/PyPSA/pull/669/files. When we adapt this function to produce a sensible bus column ordering, then all is good.

FabianHofmann avatar Sep 09 '24 16:09 FabianHofmann

@lkstrp I like the ordering according to component_attrs; it gives you more control (and we can think about better orders in the future). The only flaw is the bus2, bus3 columns. We should have them much closer to the beginning, right after bus1. However, we could solve this: there is a special function for adding these on the fly, update_linkports_component_attrs, introduced in https://github.com/PyPSA/PyPSA/pull/669/files. When we adapt this function to produce a sensible bus column ordering, then all is good.

and the efficiencies, i.e. efficiency2, efficiency3

FabianHofmann avatar Sep 09 '24 16:09 FabianHofmann

Ok!

fneum avatar Sep 09 '24 17:09 fneum