matplotlib icon indicating copy to clipboard operation
matplotlib copied to clipboard

[Bug]: TypeError when plotting against list of datetime.date where 0th element of list is None

Open myUsernameIsNotMyPassword opened this issue 2 years ago • 10 comments

Bug summary

pyplot raises an error when a list of dates starts with None.

Code for reproduction

from datetime import *

from matplotlib import pyplot as plt


y = [6, 2, 8, 3, 1, 8, 5, 3, 0, 7]

x = [date.today() + timedelta(days=i) for i in range(10)]
x[5] = None
x[-1] = None

plt.plot(x, y) # works
plt.show()

x[0] = None

plt.plot(x, y) # TypeError
plt.show()

Actual outcome

Traceback (most recent call last):
  File "pyplot_test.py", line 17, in <module>
    plt.plot(x, y)
  File "<python_path>\lib\site-packages\matplotlib\pyplot.py", line 2769, in plot
    return gca().plot(
  File "<python_path>\lib\site-packages\matplotlib\axes\_axes.py", line 1634, in plot
    self.add_line(line)
  File "<python_path>\lib\site-packages\matplotlib\axes\_base.py", line 2288, in add_line
    self._update_line_limits(line)
  File "<python_path>\lib\site-packages\matplotlib\axes\_base.py", line 2311, in _update_line_limits
    path = line.get_path()
  File "<python_path>\lib\site-packages\matplotlib\lines.py", line 999, in get_path
    self.recache()
  File "<python_path>\lib\site-packages\matplotlib\lines.py", line 652, in recache
    x = _to_unmasked_float_array(xconv).ravel()
  File "<python_path>\lib\site-packages\matplotlib\cbook\__init__.py", line 1298, in _to_unmasked_float_array
    return np.asarray(x, float)
  File "<python_path>\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number, not 'datetime.date'

Expected outcome

plot data starting at index 1.

Additional information

No response

Operating system

windows 10

Matplotlib Version

3.5.2

Matplotlib Backend

TkAgg

Python version

3.8.2

Jupyter version

No response

Installation

pip

Unfortunately I think this is the expected behavior (I was actually a bit surprised it worked with Nones at all). It works because in the time handling code paths we delegate most of the work to numpy and it will cast None to NaT :

In [7]: np.asarray(x, dtype=np.datetime64)
Out[7]: 
array(['2022-08-07', '2022-08-08', '2022-08-09', '2022-08-10',
       '2022-08-11',        'NaT', '2022-08-13', '2022-08-14',
       '2022-08-15', '2022-08-16'], dtype='datetime64[D]')

In [8]: x
Out[8]: 
[datetime.date(2022, 8, 7),
 datetime.date(2022, 8, 8),
 datetime.date(2022, 8, 9),
 datetime.date(2022, 8, 10),
 datetime.date(2022, 8, 11),
 None,
 datetime.date(2022, 8, 13),
 datetime.date(2022, 8, 14),
 datetime.date(2022, 8, 15),
 datetime.date(2022, 8, 16)]

The reason this fails with None as the first element is that we do not correctly identify the units (and hence which handlers to use). For the automatic unit detection (we handle datetime through the unit system because time is a number that has time-like units attached to it) we look at the first element (because we assume that the input lists are homogeneous).

We could add a special case to that dispatch to keep walking down the sequence if a None is found. On one hand that would fix this issue and smooth out the rough edge identified here. On the other hand it makes that code even more complex.

I could also see a case for exploding a lot earlier if there is a None in the data, but that is just doubling down on a failure mode that depends on the location in the input of a bad value which is less than ideal.

In the short term, the user-side fix is to use np.array(..., dytpe=np.datetime64) before you pass the values to Matplotlib which ensure that the first element is always a type that will go down the correct unit dispatch.

tacaswell avatar Aug 07 '22 18:08 tacaswell

diff --git i/lib/matplotlib/cbook/__init__.py w/lib/matplotlib/cbook/__init__.py
index b9d5f5eb26..9cbf273259 100644
--- i/lib/matplotlib/cbook/__init__.py
+++ w/lib/matplotlib/cbook/__init__.py
@@ -1691,19 +1691,12 @@ def safe_first_element(obj):
     This is an type-independent way of obtaining the first element, supporting
     both index access and the iterator protocol.
     """
-    if isinstance(obj, collections.abc.Iterator):
-        # needed to accept `array.flat` as input.
-        # np.flatiter reports as an instance of collections.Iterator
-        # but can still be indexed via [].
-        # This has the side effect of re-setting the iterator, but
-        # that is acceptable.
-        try:
-            return obj[0]
-        except TypeError:
-            pass
-        raise RuntimeError("matplotlib does not support generators "
-                           "as input")
-    return next(iter(obj))
+    if isinstance(obj, np.flatiter):
+        return obj[0]
+    elif isinstance(obj, collections.abc.Iterator):
+        raise RuntimeError("matplotlib does not support generators as input")
+    else:
+        return next(val for val in obj if val is not None)
 
 
 def sanitize_sequence(data):

seems reasonable enough to me? (except that the function becomes a bit misnamed, but such is life). This assumes that the goal of the first cutout is indeed just to support flatiter (which seems to be the intent of #6712).

anntzer avatar Aug 07 '22 20:08 anntzer

None working elsewhere in the list seems to me an implementation accident rather than something we purposely designed. I'm not aware of us using None to mean no data elsewhere in the library (but I could be wrong) so I am not sure we should special case it here.

jklymak avatar Aug 07 '22 20:08 jklymak

I think this is an "in for a penny, in for a pound" situation. We are very permissive in terms of what we take as input and currently do take lists of datetimes with None anywhere but the first element. To be consistent we either need to lock down what we take as input more (which is not really feasible) or take this extra complexity.

I'm leaning towards "take the complexity" because that is our job.

tacaswell avatar Aug 07 '22 21:08 tacaswell

I'm labeling this as a good first issue because we have a plausbile looking patch and there is no (explicit) new public API.

Steps:

  1. convert the patch in https://github.com/matplotlib/matplotlib/issues/23580#issuecomment-1207477074 into a commit
  2. add a test (the OP's example is enough)

tacaswell avatar Aug 07 '22 21:08 tacaswell

Hi there, Can I work on this?

MikiPWata avatar Aug 07 '22 21:08 MikiPWata

@MikiPWata Yes, just be aware we do not consider an issue "claimed" until there is a PR and that there is not complete consensus among the devs on this yet.

tacaswell avatar Aug 07 '22 22:08 tacaswell

I checked that None is also ignored in bare lists of floats, so I guess this is OK. We probably should document this somehow though...

jklymak avatar Aug 07 '22 23:08 jklymak

@jklymak my PR is pretty much good to go(passed all checks), so would love to have a review. Thank you in advance!

MikiPWata avatar Aug 09 '22 01:08 MikiPWata

well, we finally had two people take up a "good first issue" at about the same time.

tacaswell avatar Aug 09 '22 02:08 tacaswell

Is there anything I can/should do to fix the test failures from CI?

MikiPWata avatar Aug 11 '22 02:08 MikiPWata

Finally passed all checks! I was wondering if I should keep using the original methods(safe_first_element()) or the _safe_non_none() for all tests that uses safe_first_element.

I am pretty sure that I should be using the new method for the test made for this issue.

MikiPWata avatar Aug 11 '22 19:08 MikiPWata

Closed by #23587.

timhoffm avatar Aug 11 '22 21:08 timhoffm