klayout icon indicating copy to clipboard operation
klayout copied to clipboard

WIP: Added python stubs with type hinting and documentation.

Open thomaslima opened this issue 2 years ago • 12 comments

This is an attempt to help address #1092 and improve the API usability. These changes enable static type checking and hinting while using an IDE like VSCode with Pylance.

https://user-images.githubusercontent.com/1088659/180698692-dbfff118-55bb-479a-a036-d02a9c977746.mov

Details:

The documentation was extracted by inspecting the docstrings within each class and methods. This should enable type hinting and checking by IDEs like VSCode. The stubs were automatically generated, and have not been manually curated.

NOTE: This is a work in progress. I recommend manually checking nothing is broken and that no tests broke down.

See dev notes.

TODO:

  • [x] 1. Complete signature generation for all methods in pyaModule.cc. A line containing func(arg1, arg2) should suffice. Multiple lines automatically generate overloading.
  • [x] 2. Patch stubgen so that it can create docstrings. (Or wait for issue 11965 to be resolved)
  • [x] 3. Generate stubs for klayout.*core.
  • [ ] 4. Manually check and backannotate types in klayout.*core.pyi.
  • [ ] 5. How to update the stubs if klayout code changes? Manually for now.

thomaslima avatar Jul 25 '22 04:07 thomaslima

This looks great, thank you Thomas!

joamatab avatar Jul 25 '22 20:07 joamatab

Many thanks for this PR!

Very good you noticed and fixed these default value issues.

About the automatic update I can add a qmake target. I'd prefer the (patched) stubgen script to be part of KLayout's scripts, so it does not need to be installed separately. However, the license of mypy is MIT which not compatible with GPL, so I guess there is a problem with that.

But there is another solution:

Actually, KLayout Python objects support reflection through a private (i.e. not publicly documented) interface in the "tl" module for pymod or "pya" module for the application.

For example, this script will print all classes and methods that are registered:

# Use "tl" for pymod instead of "pya"
import pya

for c in pya.Class.each_class():
  base = "object"
  if c.base():
    base = c.base().name()
  print(c.module() + "." + c.name() + " (" + base + "):")
  for m in c.each_method():
    print("  " + m.name())

The private classes involved are:

  • Class: represents one registered class
  • Method: represents a method
  • MethodOverload represents a method variant
  • ArgType represents a method argument

You won't find these classes in the public documentation, but you'll see them using "help" (e.g. help(pya.Class)).

The information provided by these objects should be good enough to emulate stubgen. I feel this is fairly simple and finally implements a straightforward data path instead of parsing doc strings. I think this could be easily added to the build script as a step between compiling the C code and collecting the files for the pymod installation.

I suggest you'll proceed for now with the stubgen approach and I can assist later in generating compatible information from the built-in reflection API.

Best regards,

Matthias

klayoutmatthias avatar Jul 25 '22 21:07 klayoutmatthias

Thanks Matthias. I did not know about this functionality of the tl module. Live and learn, this software is so huge! There were a few more issues regarding default args I had identified but I followed the principle of minimal change.

Before I go further, I don't see the use of mypy is incompatible with klayout's GPL. I think it is compatible. But even if it weren't, we only need the output of the script, namely the .pyi stub files (see the new files in the commit). So I think it's perfectly reasonable to install mypy only during the automation that creates the wheels. mypy shouldn't become a dependency in my opinion.

Secondly, I'm not sure we want to automate the generation of the stub files. I think that they are not 100% correct and can be improved upon manually, though I haven't gone through them in detail yet. For example, we can manually annotate the db.Trans.M0 etc. as an enumerate instead of a generic Any. It would be nice to capture updates to the docstrings though.

Here's what I was able to draft from your example:

import pya  # initialize all modules
import klayout.tl as ktl

for c in ktl.Class.each_class():
  base = "object"
  if c.base():
    base = c.base().name()
  if 'Path' not in c.name():
    continue
  print(c.module() + "." + c.name() + " (" + base + "):")
  for m in c.each_method():
    args = ["self"] + [f"{a.name()}: {a.to_s()}" for a in m.each_argument()]
    print("  " + m.name() + "(" + ", ".join(args) +") -> " + m.ret_type().to_s())

This is the result:

db.Path (object):
  new|#from_dpath(self, dpath: const DPath &) -> new Path *
  to_dtype(self, dbu: double) -> DPath
  transformed(self, t: const ICplxTrans &) -> Path
  round_corners(self, radius: double, npoints: int) -> Path
  new(self) -> new Path *
  new|#new_pw(self, pts: const Point[] &, width: int) -> new Path *
  new|#new_pwx(self, pts: const Point[] &, width: int, bgn_ext: int, end_ext: int) -> new Path *
  new|#new_pwxr(self, pts: const Point[] &, width: int, bgn_ext: int, end_ext: int, round: bool) -> new Path *
  <(self, p: const Path &) -> bool
  \=\=(self, p: const Path &) -> bool
  !\=(self, p: const Path &) -> bool
  hash(self) -> unsigned long
  points=(self, p: const Point[] &) -> void
  each_point(self) -> Point
  num_points|#points(self) -> unsigned long
  width=(self, w: int) -> void
  width(self) -> int
  bgn_ext=(self, ext: int) -> void
  bgn_ext(self) -> int
  end_ext=(self, ext: int) -> void
  end_ext(self) -> int
  round=(self, round_ends_flag: bool) -> void
  is_round?(self) -> bool
  \*(self, f: double) -> Path
  move(self, p: const Vector &) -> Path &
  move(self, dx: int, dy: int) -> Path &
  moved(self, p: const Vector &) -> Path
  moved(self, dx: int, dy: int) -> Path
  transformed(self, t: const Trans &) -> Path
  transformed|#transformed_cplx(self, t: const CplxTrans &) -> DPath
  from_s(self, s: string) -> new Path *
  to_s(self) -> string
  simple_polygon(self) -> SimplePolygon
  polygon(self) -> Polygon
  perimeter(self) -> unsigned long long
  area(self) -> long long
  length(self) -> unsigned int
  bbox(self) -> Box
  _unmanage(self) -> void
  _manage(self) -> void
  _create|#create(self) -> void
  _destroy|#destroy(self) -> void
  dup(self) -> new Path *
  assign(self, other: const Path &) -> void
  _destroyed?|#destroyed?(self) -> bool
  _is_const_object?|#is_const_object?(self) -> bool
db.DPath (object):
  new|#from_ipath(self, path: const Path &) -> new DPath *
  to_itype(self, dbu: double) -> Path
  round_corners(self, radius: double, npoints: int, accuracy: double) -> DPath
  transformed(self, t: const VCplxTrans &) -> Path
  new(self) -> new DPath *
  new|#new_pw(self, pts: const DPoint[] &, width: double) -> new DPath *
  new|#new_pwx(self, pts: const DPoint[] &, width: double, bgn_ext: double, end_ext: double) -> new DPath *
  new|#new_pwxr(self, pts: const DPoint[] &, width: double, bgn_ext: double, end_ext: double, round: bool) -> new DPath *
  <(self, p: const DPath &) -> bool
  \=\=(self, p: const DPath &) -> bool
  !\=(self, p: const DPath &) -> bool
  hash(self) -> unsigned long
  points=(self, p: const DPoint[] &) -> void
  each_point(self) -> DPoint
  num_points|#points(self) -> unsigned long
  width=(self, w: double) -> void
  width(self) -> double
  bgn_ext=(self, ext: double) -> void
  bgn_ext(self) -> double
  end_ext=(self, ext: double) -> void
  end_ext(self) -> double
  round=(self, round_ends_flag: bool) -> void
  is_round?(self) -> bool
  \*(self, f: double) -> DPath
  move(self, p: const DVector &) -> DPath &
  move(self, dx: double, dy: double) -> DPath &
  moved(self, p: const DVector &) -> DPath
  moved(self, dx: double, dy: double) -> DPath
  transformed(self, t: const DTrans &) -> DPath
  transformed|#transformed_cplx(self, t: const DCplxTrans &) -> DPath
  from_s(self, s: string) -> new DPath *
  to_s(self) -> string
  simple_polygon(self) -> DSimplePolygon
  polygon(self) -> DPolygon
  perimeter(self) -> double
  area(self) -> double
  length(self) -> double
  bbox(self) -> DBox
  _unmanage(self) -> void
  _manage(self) -> void
  _create|#create(self) -> void
  _destroy|#destroy(self) -> void
  dup(self) -> new DPath *
  assign(self, other: const DPath &) -> void
  _destroyed?|#destroyed?(self) -> bool
  _is_const_object?|#is_const_object?(self) -> bool

Which is pretty nice. This method has the advantage of being completely agnostic to the docstring, it seems. Translating the argument types should be straightforward. One issue is that it fails to capture all the logic inside make_classes which renames some method names and also deals with synonyms. So we'd have to duplicate the logic outside in python as well. What do you think?

thomaslima avatar Jul 25 '22 22:07 thomaslima

Hi Thomas,

Very good, that is encouragingly simple.

Regarding the synonyms - I mainly use that to deprecate methods and to correct typos. Old versions still exist, but the are not shown in the help texts. For generation of the stubs I'd propose to simply iterate over the synonyms and produce a stub for each.

I have some more remarks or questions:

  • Do the stubs need to know if a method is a property setter or getter? In this case there are flags which can be evaluated
  • Some name translation happens during generation of Python bindings. For example "+" becomes "add" and "exec" becomes "exec_" (reserved word). "property=" becomes "set_property".

For details about name translation you can check this file: https://github.com/KLayout/klayout/blob/master/src/pya/pya/pyaModule.cc

Line 470++: list of reserved words (an underscore is appended) Line 511++: translation of names

Best regards,

Matthias

klayoutmatthias avatar Jul 28 '22 19:07 klayoutmatthias

Hi Matthias,

I am aware of the pyaModule.cc, but are you comfortable duplicating that code outside as well? It would duplicate the maintaining. Another idea: what if we create the stub file from pyaModule itself instead of using the TL module? Then it would share the same methods.

Regarding your remarks:

  1. This is our choice I think. I can say what mypy does. If a property name is readonly, it annotates like so:
@property
f'def {name}(self) -> {type}: ...'

If it is read and write, then it annotates as a normal variable"

f'{name}: {type}'

I noticed that the community doesn't like variables that are 'write-only'. Are there variables like that? If there are, I just propose to pretend it is a RW variable for type hinting purposes. BTW, if it's a class variable, we need to do

f'{name}: ClassVar[{inferred}] = ...'
  1. That's fine. So long as the stub matches the pyaModule's translation. The good news is that there is a way to check the validity of stubs with mypy's stubtest, though I haven't tested yet.

LG Thomas

thomaslima avatar Jul 28 '22 20:07 thomaslima

Hi Thomas,

Regarding write-only properties I am trying to avoid them as well, but maybe they slipped in. If they did, I'd suggest to turn them into methods like "set_property" and not make them "@property".

About code duplication: that is usually not a good idea but here it is about language specific details (which I guess won't change that quickly :) ) and sharing code across Python and C++ is notoriously difficult, so think accepting the duplication is just pragmatic in this case.

Best regards,

Matthias

klayoutmatthias avatar Jul 28 '22 20:07 klayoutmatthias

Hi Matthias,

I made some progress in translating the types (See commit). I have a question regarding properties. How do I use the tl.Class mechanism to figure out whether a method is a property or not? And to tell whether it is a setter/getter? Do we just go by name? I wasn't able to figure that out.

Also, what is a method variant? How is it different than a synonym?

Thanks, Thomas

thomaslima avatar Jul 30 '22 04:07 thomaslima

Hi Thomas,

this needs a little more explanation, I see.

First of all, I should say that I'd only provide stubs for the "primary method". Among the synonyms (MethodSynonym objects delivered by "each_overload") there is one matching the "primary_name" of the Method object. I'd ignore the other ones as they are usually deprecated versions.

Furthermore, I'd ignore signals as they do not have arguments and are neither setters nor getters.

Regarding the getter detection: there is a "is_getter" attribute in MethodSynonym, but that is rather a hint. KLayout's Python binding will generate getters also if there is a setter with the same name, the method accepts has no arguments and itself is not the setter. This boils down do an extended definition of "is_getter" derived like this:

def primary_synonym(m):
  for ms in m.each_overload():
    if ms.name() == m.primary_name():
      return ms
  raise("Primary synonym not found for method " + m.name())
  
... c is a class ...
  
  setters = set()
  
  for m in c.each_method():
  
    if m.is_signal():
      continue
  
    method_def = primary_synonym(m)
    if method_def.is_setter():
      setters.add(method_def.name())
    
  for m in c.each_method():

    num_args = len([ True for a in m.each_argument() ])
    method_def = primary_synonym(m)
    
    # extended definition of "getter" for Python
    is_getter = num_args == 0 and (method_def.is_getter() or (not method_def.is_setter() and method_def.name() in setters))
    
    print("  " + method_def.name() + " setter=" + str(method_def.is_setter()) + " getter=" + str(is_getter))

Regarding the variants: many methods exist with the same name, but different argument types.

How do the stubs handle that? Can there be multiple stubs with the same name or is this kind of polymorphism not allowed in mypy in general?

Best regards,

Matthias

klayoutmatthias avatar Aug 01 '22 21:08 klayoutmatthias

I haven't had a chance to test your suggestion, but I'd like to answer your question. If you analyze the dbcore.pyi file from the PR, you'll see loads of "overloads", or as you call, variants. For example:

class Box:
    ...
    @overload
    def transformed(self, t: ICplxTrans) -> Box:
        """
        transformed(t: ICplxTrans) -> Box
        @brief Transforms the box with the given complex transformation


        @param t The magnifying transformation to apply
        @return The transformed box (in this case an integer coordinate box)

        This method has been introduced in version 0.18.
        """
    @overload
    def transformed(self, t: Trans) -> Box:
        """
        transformed(t: Trans) -> Box
        @brief Returns the box transformed with the given simple transformation


        @param t The transformation to apply
        @return The transformed box
        """
    @overload
    def transformed(self, t: CplxTrans) -> DBox:
        """
        transformed(t: CplxTrans) -> DBox
        @brief Returns the box transformed with the given complex transformation


        @param t The magnifying transformation to apply
        @return The transformed box (a DBox now)
        """

thomaslima avatar Aug 07 '22 19:08 thomaslima

Hi @klayoutmatthias ,

I have almost completed the new stubgen program based on your suggestion (check the commit above). It got a bit out of hand because I needed to inspect class variables inside other classes. For example: CompoundRegionOperationNode.LogicalOp.

If you want to run it, run stubgen.py inside src/pymod.

There is a bug I'd like to ask you about. In the script you sent, the method_def.name() in the case of the *= operator in Point/DPoint/Vector/DVector is actually returning =. If you actually do m.name(), I believe it returns *\\=. I am assuming this is unintended behavior, but I didn't know how to fix it. Can you take a look?

Best, Thomas

thomaslima avatar Aug 09 '22 21:08 thomaslima

Thanks a lot and also thanks informing me about the "*=" problem.

That is a kind of bug - * in the method's annotated m.name() means "protected", so *= was read as "protected =". Hence the backslashes and the * being dropped.

I am currently preparing 0.27.11 and will include a small patch that fixes this problem.

I forgot about the child class traversal, I'm afraid. If you look closely at the implementation, you'll see that child classes are not only made available inside their parent class (e.g. "CompoundRegionOperationNode.LogicalOp"), but also on the top level as "CompoundRegionOperationNode_LogicalOp" (note the underscore). Please ignore this second version. It's a legacy concept from times when I did no understand how to generate child classes in Ruby or Python. I want to drop that in the future.

I'm also sorry about the messy code for the getter detection. That stems from times when Ruby was the only language and inside Ruby, getters are simply methods without arguments. So no need to identify getters. When I implemented Python, I should have first polished the is_getter attribute instead of hacking around with the information I had at hand. I want to change this in a way that is_getter reliably tells you the truth from the Python perspective. But that's a too big change to go into a stable 0.27 branch.

Best regards,

Matthias

klayoutmatthias avatar Aug 10 '22 19:08 klayoutmatthias

No worries! Thanks for clarifying. I did not find a way to iterate child classes in the tl.Class object. So I decided to collect a set of "unknown" classes that appeared while iterating through method arguments or returns. This was my workaround. Having an Class.each_class iterator would be the ideal solution.

Please let me know when you create a patch, so I can apply here and close the PR.

thomaslima avatar Aug 10 '22 19:08 thomaslima

Hi @klayoutmatthias

Thanks for the patch on the *= method. I ended up implementing two extra methods in tl.Class: each_child_class and parent, which then helped me to complete the stubgen.py script. See results in dbcore.pyi tlcore.pyi and rdbcore.pyi. I generated them based on version 0.27.11, but have not integrated with CI. I think they should be computed semi-manually for every release. Check Typehint Stubs Notes for some notes.

Note: you'll note that I took the liberty to change db.GenericDeviceExtractor.name to db.GenericDeviceExtractor.set_name, because the base class already had a name method (which is incompatible with a property), creating a bug in type checking.

I think it's ready for merging. It will be a great addition to 0.27.12.

Best, Thomas

thomaslima avatar Aug 20 '22 05:08 thomaslima

Hi Thomas,

thanks a lot for this patch and the substantial bug fixes, specifically the embarrassing typos :( Doesn't shed a good light on test coverage :( I really appreciate your efforts!

Regarding the GenericDeviceExtractor: I'd prefer having a read/write attribute "name". Reason is that I already published code using "name=". I think it's only missing the getter, so let me add it and use "name=" again.

There is a small concern on mine regarding 0.27: usually I try not to touch much as my promise is to keep this branch stable. I think your code does not impose any risk here, but I'd like to give it a small review before I merge. Specifically the set_name on GenericDeviceExtractor I'd like to fix.

Best regards and many thanks,

Matthias

klayoutmatthias avatar Aug 22 '22 07:08 klayoutmatthias

Hi Matthias,

I have cleaned up (reverted) the code in pyaModule.cc that was left unused during refactoring to make your review easier.

thanks a lot for this patch and the substantial bug fixes, specifically the embarrassing typos :( Doesn't shed a good light on test coverage :( I really appreciate your efforts!

Very few typos, in my opinion. Just a few inconsistencies that are only relevant for the python (maybe ruby?) GSI.

Regarding the GenericDeviceExtractor: I'd prefer having a read/write attribute "name". Reason is that I already published code using "name=". I think it's only missing the getter, so let me add it and use "name=" again.

I agree. But if you do that, then I think it's best to change base class's DeviceExtractorBase.name to a property.

There is a small concern on mine regarding 0.27: usually I try not to touch much as my promise is to keep this branch stable. I think your code does not impose any risk here, but I'd like to give it a small review before I merge. Specifically the set_name on GenericDeviceExtractor I'd like to fix.

Of course. I figured 0.27 is your stable branch and master is the unstable one which doesn't seem connected to 0.27. We don't seem to have a stable development branch.

Feel free to review and let me know if you want me to squash all commits to a single commit message so that we can reduce pollution in git history.

thomaslima avatar Aug 22 '22 14:08 thomaslima

Hi Thomas,

For GenericDeviceExtractor the problem was simply that "name=" was in the wrong class. I moved it and now it should be consistent. I have placed one commit into your branch - I don't know how to submit a PR on a PR, so I took the liberty to do so :)

Apart from that I do not have any issues merging the branch. I'd like to do a squash commit so it's easier to cherry-pick for master in a single commit. Green light from your side?

Thanks,

Matthias

klayoutmatthias avatar Aug 22 '22 20:08 klayoutmatthias

Hi Matthias,

For GenericDeviceExtractor the problem was simply that "name=" was in the wrong class. I moved it and now it should be consistent. I have placed one commit into your branch - I don't know how to submit a PR on a PR, so I took the liberty to do so :)

This was the intention! Thanks for letting me have branches in the main repo.

Apart from that I do not have any issues merging the branch. I'd like to do a squash commit so it's easier to cherry-pick for master in a single commit. Green light from your side?

I just tested your changes and recompiled the .pyi files. OK to merge. I recommend using the "Squash and merge" button so that I'm counted as a contributor.

image

Best, Thomas

thomaslima avatar Aug 22 '22 22:08 thomaslima

I just merged. Sorry for the delay - I was busy last week.

I'll try to merge into master and close the ticket then.

Again, thanks for your efforts!

Matthias

klayoutmatthias avatar Aug 27 '22 10:08 klayoutmatthias

Thanks for merging! I believe this closes #1092 .

thomaslima avatar Aug 29 '22 13:08 thomaslima