WIP Type Hinting of novelwriter.core
Summary: A Work in Progress branch to add type hints to the core package which are backwards compatible to python 3.7
- additional requirements for type-hinting included in requirements.txt and requirements-dev.txt
NOTE: The type hint style used follows python 3.10 standards however between the use of
from __future__ import annotations
and the typeing_extensions back-ports the code works on 3.7+
I have been using pyright and mypy from a python 3.7 and 3.10 venv to verify
Files that have type hints completed
- [x]
novelwriter/common.pycommon functions - [ ]
novelwriter/core/__init__.py - [ ]
novelwriter/core/document.py - [ ]
novelwriter/core/index.py - [x]
novelwriter/core/item.py - [ ]
novelwriter/core/options.py - [x]
novelwriter/core/project.py - [ ]
novelwriter/core/spellcheck.py - [x]
novelwriter/core/status.py - [ ]
novelwriter/core/tokenizer.py - [ ]
novelwriter/core/toodt.p - [ ]
novelwriter/core/tohtml.py - [ ]
novelwriter/core/tomd.py - [x]
novelwriter/core/tree.py
Related Issue(s):
Reviewer's Checklist:
- [ ] The header of all files contain a reference to the repository license
- [ ] The overall test coverage is increased or remains the same as before
- [ ] All tests are passing
- [ ] All flake8 checks are passing and the style guide is followed
- [ ] Documentation (as docstrings) is complete and understandable
- [ ] Only files that have been actively changed are committed
Note that I have a very high bar for adding new dependencies for running novelWriter. I've gone to great lengths to avoid it, including writing my own Open Document implementation. There are only two required dependencies (enchant is optional), and I am very reluctant to increase it. I would also prefer if the typing didn't add extra complexity and overhead to the code, like the @overload stuff in common.py. Couldn't you just use "Any" for the input value and? (Edit: Ah, I see why now. Still ...)
Let my play around with this at my end for a bit before you put a lot of work into this.
I certainly understand the impulse to avoid extra dependencies, I looked for a good way to avoid it that didn't require Any types (it's best to use Any in as few places as nessaceraly or you defeat the point of adding types) or a rewrite with dataclass. And now that I think about it, the way I've written this I'm pretty sure typing_extensions is only a dev dependency like lxml_stubs. I can probably move the typed dict classes under a if TYPE_CHECKING check.
EDIT: I've done exactly that, no more runtime typing_extensions dependency.
as for the @overload decerators they are completely wiped out at exacution time, over shadowed by the actual impl so they don't actually increase complexity at all (except for that incurred by reading, fortunately very few functions ever require them).
I can hold off though if you want to see if there is a preferable approach for you.
I certainly understand the impulse to avoid extra dependencies, I looked for a good way to avoid it that didn't require Any types (it's best to use Any in as few places as nessaceraly or you defeat the point of adding types) or a rewrite with
dataclass. And now that I think about it, the way I've written this I'm pretty suretyping_extensionsis only a dev dependency likelxml_stubs. I can probably move the typed dict classes under aif TYPE_CHECKINGcheck.EDIT: I've done exactly that, no more runtime
typing_extensionsdependency.
That's much better. Dev dependencies are not an issue.
as for the
@overloaddecerators they are completely wiped out at exacution time, over shadowed by the actual impl so they don't actually increase complexity at all (except for that incurred by reading, fortunately very few functions ever require them).
In the particular case of the check functions, the value argument is meant to be Any, because it is intended to handle any data input from a data file. It was written for XML, which is always string input, but it has also been used for JSON data, which can be other types. The default argument type hint should be the same as the return value, i.e. str | None for checkString, and similar for the others.
The try/except makes sure anything that cannot be converted is captured.
I can hold off though if you want to see if there is a preferable approach for you.
There are a couple of reasons why I would like to hold off a bit. One is that I really need to refactor the NWProject class, so the effort will be partially wasted on that class. It is a complicated and messy structure, and one of the oldest parts of the code.
Among the changes I want to make are: Moving the whole XML parsing/writing into a dedicated XML module. This will isolate the XML code to a single place rather than now being spread around into multiple classes. I also want to extract the backup feature, because it needs some improvement.
I eventually want to move to using JSON for all files, which means the current XML code will only be preserved for project conversion reasons. Currently, the project data is a mix of XML, JSON, and text. This change will be primarily for novelWriter 2.0, so it's a long term plan, but I want to structure the project classes with this in mind. I plan to have an option to save the project data in an archive, similar to how Open Document does it. For this, I also need to abstract away the I/O into two storage classes with a compatible interface.
Another reason to hold off a bit is that I want to structure the typing so that it's easier to maintain. At work we usually keep a typing module where we define the custom stuff. This might be useful here too. But I'd like to play around with this first to see what I end up with. I've made branches in the past playing with this, but not found something I'm happy with yet, so I've held off on it.
I added one more commit to move to using Any in the check function in common.py. I think this is actually the better way. let the functions narrow the type etc.d
I also reworked the type overloads to let mypy resolve the types better, pyright was able to do it already.
As for holding off to refactor NWProject I can see desire. The extra type information could add noise while you refactor. But I'm not sure that should prevent typing the rest of the modules. Adding types to a existing project should always be a gradual thing but it doesn't get easier as the project advances; more the inverse, extra code momentum. trying to find the perfect way to maintain typing may just present a wall that is never breached.
with project.py I could just revert add a # type: ignore to the top of the file to be removed after the refactor. other places where it may not be perfect can always be touched up later. But, getting the types in makes it easier to keep adding types as things get refactored and the project moves forward. typing isn't a goal so much as it is a tool in python.
That said, I've satisfied my itch to make an attempt. If you would like to see more work on this PR I'm happy to keep plugging away. But please, don't feel pressured to accept any of the work here. There is a reason I marked this as a draft after all
With the latest update to Pylance in VSCode, I've started using basic type checking at work. We've mostly used it for generating documentation and more useful intelisense, but the checking is handy too.
I also turned it on in my private installation, and there is a lot in novelWriter. As you've already pointed out. There are however a few things it gets wrong, that is controlled for, so it isn't perfect.
Pylance is very good at inferring type, so I think I'm going to primarily start adding type hints to class methods and utility functions. It already helps quite a lot.
I wanted to cherry pick some of your changes here, but I've also looked at the more challenging cases, and I think I may actually rewrite the code a bit instead. For instance:
- Verbose logging. I'm considering just dropping this level and make it all debug. It is only useful for development anyway, and I never run just
--debug, always--verbose. So I may as well merge them. - The
checkInt,checkBoolandcheckFloatfunctions that caused you a lot of trouble can be simplified to not returnNoneat all. No part of the code allowsNoneany more. It only did so early on. - The
checkStringfunction does use theNonereturn in some places, but it's simpler to just add acheckStringNonefunction for those cases anyway. They are all for item handles anyway, and I really should add a type class for handles. It would make a lot of the code simpler.
Edit: After playing around with this all evening, it seems that a lot is cleared up with adding TypeGuard entries around my custom config loader class. There are still a bunch of issues with the Qt library though. Especially for its C++ int enums, which are just ints on the Python side, but the methods that take them as arguments expect the actual enums. PyQt5 decorators are also not handled properly.
Yes, the PyQt5-stubs are imperfect and place some of the enum values where they are defined and not where they end up exposed by the module level imports. you can usually import the enum from it's full path to bypass that.
TypeGuards are wonderful. sound like you have a solid plan now.
While refactoring a lot of the core code for 2.0, I managed to clear up nearly all of pylance's typing complaints. It's actually doing a very good job at analysing the code now. I've added type hints only in a few places, and then mostly for dictionary values that are returned by functions. The rest, pylance figures out on its own.
For now I think I will not add detailed typing info to the source. I do like the type checking to discover potential bugs, but I don't like the typing syntax itself. It adds a lot of clutter to the code. I do use type checking in Python, but mostly only for libraries, where they also help with API documentation. That isn't an issue in novelWrityer.
There is also the issue with typing in Python still being in flux between the still active Python releases. I also want to eventually drop lxml, which is causing issues here. However, the Python standard xml package is not a full replacement for lxml for my use until Python 3.9, so it will be a while still. I need at least Python 3.7 out of the way first.
sounds like you've found a satisfactory solution then. I'm glad typing was able to help improve code quality.