funcparserlib
funcparserlib copied to clipboard
Improve typing support for _TupleParser
Right now, when working with 3 parsers combined together, the resulting type of the combined parser is Any
.
For instance,
from typing import reveal_type
from funcparserlib.parser import a
expr = a("x") + a("y") + a("z")
reveal_type(expr) # Parser[str, Any]
, and with 4 parsers, the resulting type is Tuple
:
expr2 = a("x") + a("y") + a("z") + a("w")
reveal_type(expr2) # _TupleParser[str, Tuple[Any, str]]
This is troublesome when working with mypy for type checking. For example, the second code snippet will lead to mypy error:
def parse(args: tuple[str, str, str, str]):
pass
expr2 = (a("x") + a("y") + a("z") + a("w")) >> parse
# Unsupported operand types for >> ("_TupleParser[str, tuple[Any, str]]" and "Callable[[tuple[str, str, str, str]], int]")
Ideally, for both examples, the resulting parsers should be _TupleParser
, and the resulting types should be recognized correctly, tuple[str, str, str]
and tuple[str, str, str, str]
respectively.
I've made some changes to the code to support the above cases by utilizing TypeVarTuple
(newly introduced for python 3.11, but we can use typing-extensions
to bring the functionality to earlier python versions). Would you be interested in this? I could make a PR for this.
@beekill95 Thanks for your idea! I managed to cover cases for 2-tuples with types, giving up on tuples of 3+ elements. The idea was to start small with types here. This issue for 4-tuples (and apparently on 6-tuples, 8-tuples, etc.) wasn't something I had unit tests for.
I'm open to improvements in typing for more complex cases, but I'd like to preserve compatibility with all supported versions of Python (3.8+ as for now). Also, I'd like for funcparserlib to keep having no package dependencies (dev-dependencies are OK though).
Are there any ways to improve typing for parsers of 3+ tuples without dropping compatibility with 3.8? Maybe via a dev-dependency on typing-extenstions
with imports guarded by typing.TYPE_CHECKING
?
I don't know how the typing would behave if we just install typing-extensions
as dev-dependency for users with python < 3.11. For users with mypy
already installed, then typing-extensions
would also be implicitly installed on their systems and the typing would work. However, for users without typing-extensions
explicitly/implicitly installed, I guess we could tell users to install typing-extensions
as dev-dependency as well so the typing works correctly. How do you feel about this?
Another option is to add only the relevant parts of TypeVarTuple
to the codebase: https://github.com/python/typing_extensions/blob/main/src/typing_extensions.py#L2257, but I don't know how viable it would be?
For unit testing, we could utilize typing_extensions.assert_type()
to write the tests.
@beekill95 I've experimented with TypeVarTuple
in several combinations:
- Both
typing
andtyping_extensions
forTypeVarTuple
- New
Tuple[*Ts]
vs backwards-compatibleTuple[Unpack[Ts]]
- Python 3.8, 3.9, 3.10, 3.11
- Both
try-catch ImportError
andif sys.version >= (3, 11)
for switching betweentyping
andtyping_extensions
- Mypy with
typing_extensions
, code inspections in PyCharm, code inspections in VS Code
I haven't found a combination that was good for most cases for Python < 3.11.
My best shot was this Patch A:
Index: tests/test_parsing.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/tests/test_parsing.py b/tests/test_parsing.py
--- a/tests/test_parsing.py (revision 18c0a99dcdb427e35226c74b7cc2617223c8e1fa)
+++ b/tests/test_parsing.py (date 1714172841079)
@@ -1,7 +1,17 @@
# -*- coding: utf-8 -*-
+import sys
import unittest
-from typing import Optional, Tuple
+from typing import Optional, Tuple, TYPE_CHECKING
+
+if TYPE_CHECKING:
+ if sys.version_info >= (3, 11):
+ from typing import TypeVarTuple, Unpack
+ else:
+ from typing_extensions import TypeVarTuple, Unpack
+
+ # noinspection PyTypeChecker
+ Ts = TypeVarTuple("Ts")
from funcparserlib.lexer import TokenSpec, make_tokenizer, LexerError, Token
from funcparserlib.parser import (
@@ -19,6 +29,19 @@
)
+def type_var_tuple_example(xs: "Tuple[Unpack[Ts]]") -> "Tuple[int, Unpack[Ts]]":
+ return (1,) + xs
+
+
+def use_type_var() -> None:
+ x: Tuple[int, str, int] = type_var_tuple_example(("foo", 1))
+ y: Tuple[int, str, int] = type_var_tuple_example(("foo", 1, 2))
+ print(x, y)
+
+ s, i = type_var_tuple_example(("foo",))
+ print(s + i)
+
+
class ParsingTest(unittest.TestCase):
def test_oneplus(self) -> None:
x = a("x")
Index: .pre-commit-config.yaml
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
--- a/.pre-commit-config.yaml (revision 18c0a99dcdb427e35226c74b7cc2617223c8e1fa)
+++ b/.pre-commit-config.yaml (date 1714089653084)
@@ -12,6 +12,8 @@
rev: "v1.7.0"
hooks:
- id: mypy
+ additional_dependencies:
+ - typing-extensions==4.11.0
- repo: local
hooks:
- id: unittest
But Patch A didn't work with code inspections in PyCharm. It produced false positive errors for functions that use typing.Unpack
in its return types. About 30% of Python developers use PyCharm, so it's better not to interfere with their editing experience, even though it's actually a bug in PyCharm's code inspections.
Another version was to run Mypy and take into account IDE inspections only for Python 3.11 and 3.12, this is Patch B:
Index: tests/test_parsing.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/tests/test_parsing.py b/tests/test_parsing.py
--- a/tests/test_parsing.py (revision 18c0a99dcdb427e35226c74b7cc2617223c8e1fa)
+++ b/tests/test_parsing.py (date 1714173330602)
@@ -1,7 +1,12 @@
# -*- coding: utf-8 -*-
import unittest
-from typing import Optional, Tuple
+from typing import Optional, Tuple, TYPE_CHECKING
+
+if TYPE_CHECKING:
+ from typing import TypeVarTuple
+
+ Ts = TypeVarTuple("Ts")
from funcparserlib.lexer import TokenSpec, make_tokenizer, LexerError, Token
from funcparserlib.parser import (
@@ -19,6 +24,19 @@
)
+def type_var_tuple_example(xs: "Tuple[*Ts]") -> "Tuple[int, *Ts]": # noqa: F722
+ return (1,) + xs
+
+
+def use_type_var() -> None:
+ x: Tuple[int, str, int] = type_var_tuple_example(("foo", 1))
+ y: Tuple[int, str, int] = type_var_tuple_example(("foo", 1, 2))
+ print(x, y)
+
+ s, i = type_var_tuple_example(("foo",))
+ print(s + i)
+
+
class ParsingTest(unittest.TestCase):
def test_oneplus(self) -> None:
x = a("x")
Index: .pre-commit-config.yaml
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
--- a/.pre-commit-config.yaml (revision 18c0a99dcdb427e35226c74b7cc2617223c8e1fa)
+++ b/.pre-commit-config.yaml (date 1714173262135)
@@ -12,6 +12,7 @@
rev: "v1.7.0"
hooks:
- id: mypy
+ language_version: "3.12"
- repo: local
hooks:
- id: unittest
I wonder how many people may still want to use Mypy or edit their code in IDEs with Python 3.10 or older. Patch B won't work for them and they won't be able to fix things by installing typing_extensions
.
At this moment I haven't come up with better options that I would be confident to introduce. I'm open to new options.
@beekill95 I'm wondering how your PR might look like... If it solves typing for tuple parsers of any length, it may convince me to drop support for 3.10 and earlier from the code editor perspective. It would still be possible to use funcparserlib with 3.8-3.10 (e.g. for anyone still using it as a dependency in production). Only actual code editing experience for 3.8-3.10 will become worse.
@vlasovskikh, I just created a draft PR #81. I've tested on my editors (VScode/Neovim + pyright lsp) with python 3.10, the type hinting seems correct for me:
from __future__ import annotations
from typing import reveal_type
from funcparserlib.parser import a
expr1 = a("x") + a("y") + (a("1") >> int) + (a("1.2") >> float)
reveal_type(expr1) # Type of "expr1" is "_TupleParser[str, str, str, int, float]"
reveal_type(expr1.__rshift__) # Type of "expr1.__rshift__" is "(f: (Tuple[str, str, int, float]) -> _C@__rshift__) -> Parser[str, _C@__rshift__]"
You're right about Pycharm, they're still implementing the support for TypeVarTuple
(https://youtrack.jetbrains.com/issue/PY-53105/Support-PEP-646-Variadic-Generics).
Let me know how you feel about this approach.
@beekill95 I've quickly looked through the PR and noticed a few possible problems with typing. Pre-commit checks have caught these problems too.
Based on your idea about TypeVarTuple
for parsers of 3+ tuples, I've experimented with TypeVarTuple
for a simpler example, that models parsers pretty well, despite its simplicity. Here's what I got. It passes Mypy checks as well as unit tests. It checks (almost) every combination of adding simple values, ignored values, and tuples of values for sizes 1 to 4. 5 and more should be similar to 3 and 4. We could adopt this example as the basis for updating the __add__
for parser.
Source code of the example:
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Generic, TypeVar, TypeVarTuple, assert_type, overload
from unittest import TestCase
T = TypeVar("T")
V = TypeVar("V")
Ts = TypeVarTuple("Ts")
Ks = TypeVarTuple("Ks")
@dataclass
class Ign:
value: None = field(default=None, init=False)
@overload
def __add__(self, other: Ign) -> Ign:
pass
@overload
def __add__(self, other: Val[T]) -> Val[T]:
pass
@overload
def __add__(self, other: Tpl[*Ts]) -> Tpl[*Ts]:
pass
def __add__(self, other: Val[T] | Tpl[*Ts] | Ign) -> Val[T] | Tpl[*Ts] | Ign:
if isinstance(other, Ign):
return self
else:
return other
@dataclass
class Val(Generic[T]):
value: T
def __neg__(self) -> Ign:
return Ign()
@overload
def __add__(self, other: Ign) -> Val[T]:
pass
@overload
def __add__(self, other: Val[V]) -> Tpl[T, V]:
pass
@overload
def __add__(self, other: Tpl[*Ts]) -> Tpl[T, tuple[*Ts]]:
pass
def __add__(
self, other: Val[V] | Tpl[*Ts] | Ign
) -> Tpl[T, V] | Tpl[T, tuple[*Ts]] | Val[T]:
if isinstance(other, Ign):
return self
else:
return Tpl((self.value, other.value)) # type: ignore[return-value]
@dataclass
class Tpl(Generic[*Ts]):
value: tuple[*Ts]
@overload
def __add__(self, other: Ign) -> Tpl[*Ts]:
pass
@overload
def __add__(self, other: Val[T]) -> Tpl[*Ts, T]:
pass
@overload
def __add__(self, other: Tpl[*Ks]) -> Tpl[*Ts, tuple[*Ks]]:
pass
def __add__(
self, other: Val[T] | Tpl[*Ks] | Ign
) -> Tpl[*Ts, T] | Tpl[*Ts] | Tpl[*Ts, tuple[*Ks]]:
if isinstance(other, Ign):
return self
else:
return Tpl(self.value + (other.value,)) # type: ignore[return-value]
class TypeVarTupleTest(TestCase):
def test_every_1(self) -> None:
assert_type(Val("a"), Val[str])
assert_type(-Val("a"), Ign)
self.assertEqual(Val("a").value, "a")
self.assertEqual((-Val("a")).value, None)
def test_every_2(self) -> None:
assert_type(-Val("a") + -Val("b"), Ign)
assert_type(Val("a") + -Val("b"), Val[str])
assert_type(-Val("a") + Val("b"), Val[str])
assert_type(Val("a") + Val("b"), Tpl[str, str])
self.assertEqual((-Val("a") + -Val("b")).value, None)
self.assertEqual((Val("a") + -Val("b")).value, "a")
self.assertEqual((-Val("a") + Val("b")).value, "b")
self.assertEqual((Val("a") + Val("b")).value, ("a", "b"))
def test_all_3_ignored_of_3(self) -> None:
assert_type(-Val("a") + -Val("b") + -Val("c"), Ign)
assert_type(-Val("a") + (-Val("b") + -Val("c")), Ign)
self.assertEqual((-Val("a") + -Val("b") + -Val("c")).value, None)
self.assertEqual((-Val("a") + (-Val("b") + -Val("c"))).value, None)
def test_every_2_ignored_of_3(self) -> None:
# Left-associative
assert_type(Val("a") + -Val("b") + -Val("c"), Val[str])
assert_type(-Val("a") + Val("b") + -Val("c"), Val[str])
assert_type(-Val("a") + -Val("b") + Val("c"), Val[str])
self.assertEqual((Val("a") + -Val("b") + -Val("c")).value, "a")
self.assertEqual((-Val("a") + Val("b") + -Val("c")).value, "b")
self.assertEqual((-Val("a") + -Val("b") + Val("c")).value, "c")
# Right-associative
assert_type(Val("a") + (-Val("b") + -Val("c")), Val[str])
assert_type(-Val("a") + (Val("b") + -Val("c")), Val[str])
assert_type(-Val("a") + (-Val("b") + Val("c")), Val[str])
self.assertEqual((Val("a") + (-Val("b") + -Val("c"))).value, "a")
self.assertEqual((-Val("a") + (Val("b") + -Val("c"))).value, "b")
self.assertEqual((-Val("a") + (-Val("b") + Val("c"))).value, "c")
def every_1_ignored_of_3(self) -> None:
# Left-associative
assert_type(Val("a") + Val("b") + -Val("c"), Tpl[str, str])
assert_type(Val("a") + -Val("b") + Val("c"), Tpl[str, str])
assert_type(-Val("a") + Val("b") + Val("c"), Tpl[str, str])
self.assertEqual((Val("a") + Val("b") + -Val("c")).value, ("a", "b"))
self.assertEqual((Val("a") + -Val("b") + Val("c")).value, ("a", "c"))
self.assertEqual((-Val("a") + Val("b") + Val("c")).value, ("b", "c"))
# Right-associative
assert_type(Val("a") + (Val("b") + -Val("c")), Tpl[str, str])
assert_type(Val("a") + (-Val("b") + Val("c")), Tpl[str, str])
assert_type(-Val("a") + (Val("b") + Val("c")), Tpl[str, str])
self.assertEqual((Val("a") + (Val("b") + -Val("c"))).value, ("a", "b"))
self.assertEqual((Val("a") + (-Val("b") + Val("c"))).value, ("a", "c"))
self.assertEqual((-Val("a") + (Val("b") + Val("c"))).value, ("b", "c"))
def test_all_3(self) -> None:
assert_type(Val("a") + Val("b") + Val("c"), Tpl[str, str, str])
assert_type(Val("a") + (Val("b") + Val("c")), Tpl[str, tuple[str, str]])
self.assertEqual((Val("a") + Val("b") + Val("c")).value, ("a", "b", "c"))
self.assertEqual((Val("a") + (Val("b") + Val("c"))).value, ("a", ("b", "c")))
def test_all_4(self) -> None:
assert_type(
Val("a") + Val("b") + Val("c") + Val("d"),
Tpl[str, str, str, str],
)
assert_type(
Val("a") + Val("b") + (Val("c") + Val("d")),
Tpl[str, str, tuple[str, str]],
)
assert_type(
Val("a") + (Val("b") + Val("c")) + Val("d"),
Tpl[str, tuple[str, str], str],
)
assert_type(
Val("a") + (Val("b") + Val("c") + Val("d")),
Tpl[str, tuple[str, str, str]],
)
self.assertEqual(
(Val("a") + Val("b") + Val("c") + Val("d")).value,
("a", "b", "c", "d"),
)
self.assertEqual(
(Val("a") + Val("b") + (Val("c") + Val("d"))).value,
("a", "b", ("c", "d")),
)
self.assertEqual(
(Val("a") + (Val("b") + Val("c")) + Val("d")).value,
("a", ("b", "c"), "d"),
)
self.assertEqual(
(Val("a") + (Val("b") + Val("c") + Val("d"))).value,
("a", ("b", "c", "d")),
)
@east825 ^ FYI this example is correct from the perspective of PEP 646. Both Mypy and Pylance are fine with it. However, in PyCharm, it results in IllegalArgumentException: Unbounded unpacked tuple type of a TypeVarTuple or another unpacked tuple type is now allowed
. Sorry to bother you with this mention, I hope you don't mind, feel free to ignore it :)
@vlasovskikh, I've fixed some of the pre-commit errors, however, it seems that this is not as easy as I thought. Here are the problems I'm facing:
- I cannot put the
typing_extensions
imports behindTYPE_CHECKING
as_TupleParser
inherits fromParser[A, Tuple[Unpack[_Ts]]]
, which means the class depends onUnpack
at runtime, not just when type checking. - Incompatible return value type between
Parser.__add__()
and_TupleParser.__add__()
. The reason being that when callingsuper().__add__()
, theself
becomesParser[A, tuple[*_Ts]]
and the return type is then_TupleParser[A, tuple[*_Ts], Any]
while the return type should be_TupleParser[A, *_Ts, B]
. To resolve this, I think I will have to move the logic of adding two parsers to_TupleParser.__add__()
and avoid callingsuper().__add__()
entirely.
@beekill95 Sorry for the delay in replying to your comment and your updated PR. I've looked at this problem shortly and haven't found a working solution yet. It looks like it won't be enough to make changes to type signatures. We may have to make _TupleParser
public and not a child of Parser
... I'm not sure about it yet. I'm afraid I have to keep it on pause from my side for now. I'll get back to it when I have more free time 😞 By the way, I still consider better typing as a possible direction, maybe for a major incompatible release with dropping compatibility with 3.10 and older.