patma icon indicating copy to clipboard operation
patma copied to clipboard

Is abc.Sequence the right parent for sequence matches?

Open dmoisset opened this issue 4 years ago • 6 comments

The PEP says that a sequence match:

  • cannot be any kind of string (str, bytes, bytearray)
  • must be an instance of collections.abc.Sequence

I got why the first rule is there, I'm not sure if using abcs in the semantics of language constructs is something done elsewhere in Python, and I'm not sure if it's the correct thing to do (I thought operations tended to go towards the pure duck-typing style of attempting the method calls).

I found the following side effects of this decision surprising (can be checked in the binder playground):

  • I can not pattern match sequences with array.array, even if it supports len(), indexing and slicing. It silently fails to match unless I register it.
  • I can try to pattern match sequences with collections.deque which is registered as an collections.abc.Sequence. But if I use a star arg in the pattern, the matching raises a TypeError: sequence index must be integer, not 'slice', because deque does not support slice indexing (actually I'm not sure if that's a bug in the deque for not providing slices, in the abc for registering deque, or in the matching code for assuming slicing is available)

The current implementation assumes that negative indexing works, which again I can't find (I'm looking here and here) is a guaranteed contract of sequences. Should we document this?

I haven't looked carefully in the mapping pattern, but it's possible that similar things may be happening with it.

dmoisset avatar Jul 04 '20 17:07 dmoisset

Prior discussion starting at https://github.com/gvanrossum/patma/issues/15#issuecomment-635047668.

brandtbucher avatar Jul 04 '20 18:07 brandtbucher

The alternative is probing the type using duck typing, which is even worse -- in particular, there is no good way to distinguish a mapping from a sequence since mappings have __len__, __contains__, and __getitem__. (The traditional hack is to check for keys but I find that much more distasteful than requiring collections.abc.Sequence.)

The simplest solution to array.array matching is to register it as a Sequence in Python 3.10.

I didn't know we used slicing and negative indexing, maybe we shouldn't, but I'll let Brandt respond.

gvanrossum avatar Jul 04 '20 18:07 gvanrossum

The simplest solution to array.array matching is to register it as a Sequence in Python 3.10.

Yeah, I think that is a separate issue that looks like an oversight (although maybe there is some reason why). Probably somebody should just open a new BPO issue.

I didn't know we used slicing and negative indexing, maybe we shouldn't, but I'll let Brandt respond.

I agree that the target should not be required to support slicing and negative indexing. I'll get something up tonight/tomorrow.

The current implementation was mostly driven by a desire to keep new instructions to a minimum and piggyback off of the existing VM's capabilities for the time being. However, this change will probably be cleaner and faster for common cases, since we will avoid using the stack to manage Python int and slice objects just to index into common types like list or tuple.

brandtbucher avatar Jul 04 '20 20:07 brandtbucher

array.array doesn't define __reversed__. Adding that should be enough to safely register it as a MutableSequence.

~I'll open a BPO.~ Ha, looks like you already agreed this was a good idea 3 years ago @gvanrossum:

https://bugs.python.org/issue29727

Free for anyone who wants to take it. Tag me for review if you want.

brandtbucher avatar Jul 05 '20 04:07 brandtbucher

The new implementation never uses slices or negative indices (it's cheap because we've already gotten the length of the sequence by the time we reach this point):

>>> class S(tuple):
...     def __getitem__(self, i):
...         print(f"Getting {i}...")
...         return super().__getitem__(i)
... 
>>> match S(range(5)):
...     case first, *middle, last:
...         print(f"Got: {first}, {middle}, {last}")
... 
Getting 0...
Getting 1...
Getting 2...
Getting 3...
Getting 4...
Got: 0, [1, 2, 3], 4

brandtbucher avatar Jul 05 '20 15:07 brandtbucher

array.array objects are now recognized as MutableSequence (and therefore Reversible) in Python3.10:

Python 3.10.0a0 (heads/bpo-29727-dirty:af8fd07361, Jul  5 2020, 19:35:13)
[GCC 10.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import array
>>> x = array.array("i", [1,2,3])
>>> import collections.abc
>>> isinstance(x, collections.abc.MutableSequence)
True
>>> isinstance(x, collections.abc.Reversible)
True

pablogsal avatar Jul 05 '20 21:07 pablogsal