idaes-pse
idaes-pse copied to clipboard
[Discussion] Information about naming conventions for classes affected by pylint "no-self-argument" errors
This discussion/question is part of the ongoing effort described in IDAES/idaes-dev-issues#23 to eliminate pylint false-positive errors in the current codebase.
Why is pylint complaining?
Similarly to what described in IDAES/idaes-dev#1220, pylint considers it a no-self-argument
error whenever the first argument in a method definition is not named self
. This in turn causes a large amount of false-positive errors being reported by pylint for many classes that function perfectly well at runtime.
In addition to the affected classes (and proposed fixes) described in IDAES/idaes-dev#1220, the main category of class where these false-positive errors occur is in Block
/BlockData
subclasses.
Taking as an example the definition of SaponificationStateBlockData
in idaes.generic_models.properties.examples.saponification_thermo
(link), we see that self
, blk
, and b
are all used as first argument for different methods within the same class.
Possible solutions
In this case, unlike in IDAES/idaes-dev#1220, I'd be more inclined to address these issues at the pylint/syntax level than by making changes to the code. The "blunt scalpel" option of disabling the no-self-argument
globally in pylint is always on the table, but more fine-grained exclusions would be preferable (if feasible/applicable), since they would allow pylint to be used as a (machine-enforceable) way to make sure that existing practices are respected.
My questions
To be able to suggest an appropriate strategy to address these pylint issues, I wanted to ask a few general questions to get a better sense of what are the requirements, conventions, and design considerations involved:
- What are the naming conventions for the first argument of a method defined in one of the affected classes?
- Is there a difference (based on their first argument's name) in how the methods are called at runtime?
- Is there a semantic criteria why the first argument is named
self
,blk
, orb
? - Are
blk
andb
the only non-self
names used? - Are the "non-
self
methods" (method definitions where the first argument is named something other thanself
) specific to a specific kind of classes (e.g. only forProcessBlock
/ProcessBlockData
, but not other kinds ofBlock
/BlockData
classes?)
Essentially, I'd be interested to know more about existing practices or logic in the codebase in this regard. Based on the git blame
s I'm paging @eslickj and @andrewlee94 but of course everyone's thoughts on this are very welcome.
Probably the short answer here for why we did this is a) I was learning Python as I went and b) to distinguish scope in sub models. E.g.
def build(self):
self.var = Var()
def a_rule(b):
# I use b here to keep it separate from self, even though they are actually the same object.
return b.var == 7
self.constraint = Constraint(rule=a_rule)
Some of this is also likely inherited from how Pyomo writes their examples, as that was what I copied from when getting started.
- There are no codified naming conventions for those, beyond the fact that a large number were written by me and tend to follow the same pattern.
- The difference in name is more based on concrete versus abstract, see 3.
- The convention as much as it is:
self
is used when there is a fairly concrete object involved - i.e. things like the build method.blk
tends to be used in more general methods that are intended to be applied to multiple objects - e.g.initialize
methods which get called later and can be applied to any instance of a given model.b
tends to be used within the sub-methods of model construction when we are building a model (the rules that make up the expressions and constraints) and is used to distinguish the scope of the sub-method from the main method (even though functionally they are identical). - Occasionally you might see
m
. For note,blk
andb
indicate "block" whilstm
indicates "model". - I think you will find
non-self
methods in a lot of places, such as the supporting utility methods as well (as they fall into the general use category). However, things are probably even less standardised there.
Thank you so much @andrewlee94 for the detailed answer--that's exactly the type of info I was looking for.
Based on this, my plan would be to try to add blk
, b
and m
to the list of allowed names for a method's first argument, and, if this proves to be too involved to implement, fall back to the blanked disabling of no-self-argument
altogether (since, arguably, it belongs to the orange area between "error" and "warning" anyway).
That said, just for the sake of knowing: if there is (or has ever been at some point) an interest in uniformizing these methods to use self
instead, I'm happy to try and see if it works, since as I understand the self
/blk
/b
/m
difference is semantic rather than structural (i.e. the different names are not used in special ways at runtime). Otherwise, if this is not the case, I'll go ahead with the pylint solution.
From the coding perspective, I don't think there is nay problem with changing everything to self
. However, given that our target audience is chemical and process engineers, I wonder if the semantic difference is helpful to them for conceptualizing things.
However, we should at standardize the naming to some extent - there i no need to have 4 different options - I think we only really need 2 at most (self
and one other for the sub-methods that we use for constructing constraints and expressions).
I agree that the domain-specific knowledge aspect is extremely important (and, at least IMO, should take precedence over general software engineering conventions, especially as you say considering the target audience).
If I understand your second point correctly, one possible solution could be to use self
consistently for the "structural" method (i.e. the function definition directly within the class definition scope, which is what pylint complains about), and (one or more) of the semantically relevant names for the "submethods" (i.e. the functions defined within the "structural" method's own scope):
class MyBlock(Block):
def do_something_blocklike(self):
"This is a structural method"
def increase_pressure(b):
"This is a submethod"
b.pressure *= 2
self.add_block_rule(increase_pressure)
def do_something_modellike(self):
def simplify_model(m):
m.constraints.drop_redundant()
self.add_model_rule(simplify_model)
(apologies to all domain experts for the nonsense pseudocode)
Would something like this go in the direction you were thinking of?
@lbianchi-lbl Yes, that is along the lines of what I was thinking, but unifying to have only one accepted "semantically relvaent" name for the sub-methods (so either m
or b
but not both).
One idea is to use type annotations, which have no effect on runtime behavior, to indicate the chemical-engineering relevant metadata, e.g. def a_method(self: block)
. The "block" is just a string, and could be anything.
I think the origin of b rather than self as the first argument where pylint is probably complaining, is that we didn't used to inherit Pyomo blocks. We just to have functions that would take a block as an argument and build, initialize, or whatever a model in a given block. When we changed style and those function became methods of a block class, we didn't necessarily update the b's to self. Then that got copied by people who didn't know better or weren't really motivated to update the naming. I'm fairly sure in class methods we should always use self rather than b or blk or m. In Pyomo rules (submethods) b or blk or m is probably still fine, I don't see why pylint would complain.
@dangunter: personally I like your idea, I think it's a nice use of type annotations.
Not having much experience with type annotations yet, I'm not sure off the top of my head what are the exact requirements in terms of the annotation being a valid Python object, i.e. if def a_method(self: block)
would be OK, or if it should be def a_method(self: "block")
(as in, either a string literal, or a previously defined variable).
Somewhat related to that, I'm also wondering how difficult it would be to go the extra mile and try using semantically relevant actual types (e.g. def a_method(self: Block)
or def a_method(self: Model)
). But, this could very well end up being even more confusing or not feasible at all (depending on whether semantically relevant actual types exist at all, or whether there is a good match between their interface and the context in which these methods are used).
@andrewlee94 and @eslickj: this makes sense, thanks for the clarifications. I can confirm that pylint will be perfectly fine with us using anything we want for the Pyomo rules/submethods, as long as self
is used for the (structural/outermost) methods defined in the class scope.
FWIW, this syntax (self
in outer method, b
/blk
for inner function) is the one that would feel the most "natural" or "idiomatic" to me, as a domain-ignorant Python programmer, in the context of this intended use.
A final decision on this will happen after pr #58 is merged.
This could be done as part of the IDAES v2 codebase refactoring/improvements.
@lbianchi-lbl I agree - we should take this opportunity to fix the codebase to resolve this. However, we would need to discuss what is required (i.e. where we should use self
, and where we shouldn't).
@andrewlee94, I think the answer is that we should use self everywhere pylint complains (first argument to methods in a class where self is an object).