drake icon indicating copy to clipboard operation
drake copied to clipboard

python symbolic::Formula returns True for matrices (but not scalars)?

Open RussTedrake opened this issue 7 years ago • 52 comments

from pydrake.all import MathematicalProgram
prog = MathematicalProgram()
x = prog.NewContinuousVariables(2)
print(x)
print(x[0] == 0.)
print(x==[0.,0.])

I get the following output:

[x(0) x(1)]
(x(0) = 0)
[ True  True]

RussTedrake avatar Mar 09 '18 13:03 RussTedrake

It's because x == [0, 0] becomes [bool(x[0] == 0), bool(x[1] == 0)]. Python considers anything but the following items as True:

  • The Boolean value, False
  • Any numerical value equal to 0 (0, 0.0 but not 2 or -3.1)
  • The special value None
  • Any empty sequence or collection, including the empty string and the empty list

So, in the end, we have [True, True]. I'll check if there is a workaround for this.

soonho-tri avatar Mar 09 '18 14:03 soonho-tri

First thing I can do is to provide __nonzero__ for symbolic::Formula so that the above example throws an exception instead of giving a false value [True, True].

soonho-tri avatar Mar 09 '18 16:03 soonho-tri

I'm not sure I'm 100% understanding the bug report or subsequent discussion here. Russ, are you saying that the desired output should look like this?

[x(0) x(1)]
(x(0) = 0)
[(x(0) = 0) (x(1) = 0)]

jwnimmer-tri avatar Mar 09 '18 16:03 jwnimmer-tri

@jwnimmer-tri , I think that's what Russ wants.

soonho-tri avatar Mar 09 '18 16:03 soonho-tri

So we'll have:

  • python relational operators on Expression always return a Formula;
  • the __nonzero__ on Formula throws with a note telling users to call Evaluate explicitly.

I like this.

jwnimmer-tri avatar Mar 09 '18 16:03 jwnimmer-tri

python relational operators on Expression always return a Formula

FYI, this is what we already have as print(x[0] == 0.) returns x(0) == 0.

the __nonzero__ on Formula throws with a note telling users to call Evaluate explicitly.

I'll open a PR shortly. But the thing is that it will not give us what Russ wants for the above print(x==[0.,0.]) example. Instead of returning [(x(0) = 0) (x(1) = 0)], it will throw an exception as it can't evaluate x(0) = 0 without knowing the value of x(0).

I think there are no good solutions as long as we want to use both of == operator over numpy.ndarray. We need to give up one of the two.

  1. We might introduce a function Equal and make Equal([a, b], [0, 0]) return [a = 0, b = 0] as requested. Or;

  2. We can introduce a new class such as FormulaArray and provide __eq__ (and other relational operations) so that FormulaArray1 == FormulaArray2 works as requested.

I'm happy to learn if there is a better option here.

soonho-tri avatar Mar 09 '18 16:03 soonho-tri

@ericcousineau-tri says he might know a solution

RussTedrake avatar Mar 09 '18 19:03 RussTedrake

One thing is pybind11 only uses order in determining overloads - it starts with the first, and keeps on checking until it has a successful overload, thus the order in which you .def the overloads matters.

However, looking more at the discussion, it may seem to be something else; I'm guessing it's because NumPy's operator== is taking precedence over the return value (returning a logical array) rather than respecting the actual return type.

I'll see if there's a way around that.

EDIT: Can reproduce the issue per Soonho's comment in a simplified example: https://github.com/EricCousineau-TRI/repro/commit/c6398b55b1af2f7fb9fbd2e4050da5db5c3d2c63

EricCousineau-TRI avatar Mar 09 '18 20:03 EricCousineau-TRI

We might introduce a function Equal and make Equal([a, b], [0, 0]) return [a = 0, b = 0] as requested. Or;

I just found that numpy provides the function that I looked for: https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.logic.html#comparison

soonho-tri avatar Mar 09 '18 21:03 soonho-tri

I tested np.equal, and still get an output of logical values. The documentation for np.equal states:

out : ndarray or bool Output array of bools, or a single bool if x1 and x2 are scalars.

:(

There is a possibility of extending a custom ndarray to not return logicals for logical operations: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.subclassing.html#basics-subclassing

But that means that (a) we have to cast in pybind11 (not that big of a deal, I'd think) and (b) we'd have to take care that nd.array([<symbolic>...]) == nd.array([<symbolic>...]) does not occur.

EricCousineau-TRI avatar Mar 09 '18 21:03 EricCousineau-TRI

:-( Just found the same problem. It has dtype parameter. Do you think it might save us?

soonho-tri avatar Mar 09 '18 21:03 soonho-tri

Nay :( Seems to be constrained by ufunc:

>>> print(np.equal(av, bv, dtype=Custom))
Traceback (most recent call last):
  File "numpy_eq.py", line 28, in <module>
    print(np.equal(av, bv, dtype=Custom))
TypeError: No loop matching the specified signature and casting
was found for ufunc equal

It seems that your option (2) (perhaps have FormulaArray subclass ndarray), coupled with an error on __nonzero__, is the way to go?

EricCousineau-TRI avatar Mar 09 '18 21:03 EricCousineau-TRI

sigh... I'll dig it up this evening.

soonho-tri avatar Mar 09 '18 21:03 soonho-tri

One other option: From the results of https://github.com/RobotLocomotion/drake/issues/8116, if it is possible to write a custom dtype, and possibly register custom ufuncs, then perhaps there is a chance that we can register a ufunc for __eq__ for symbolic types.

That being said, I'm not sure if that will affect the behavior of np.equal if it's default dtype is bool.

EricCousineau-TRI avatar Mar 09 '18 21:03 EricCousineau-TRI

Will update this post with some other options:

  • https://docs.scipy.org/doc/numpy/reference/c-api.array.html#c.PyArray_SetNumericOps
  • https://docs.scipy.org/doc/numpy-1.13.0/user/basics.subclassing.html#basics-subclassing

Tracing up where I think ndarray.__eq__ comes from:

EricCousineau-TRI avatar Mar 09 '18 21:03 EricCousineau-TRI

Hup, it looks like there's a method numpy.set_numeric_ops (that has __doc__, but I can't find its online documentation), referenced in this post: https://stackoverflow.com/questions/45601964/source-code-location-and-explanation-of-vectorized-array-operations

We could use this to do something like:

np.set_numeric_ops(equal=drake_ufunc_equal)

and see if there's a way to teach the ufunc about how to handle different dtypes, so that we don't impact performance of other objects.

Example of a simple override (which would cause everything to slow down / suck):

>> eq = lambda a, b: a == b
>> generic_equal = np.frompyfunc(eq, 2, 1)
>> 
>> a = Custom('a')
>> b = Custom('b')
>> print(a == b)
eq('a', 'b')
>> 
>> av = np.array([a, b])
>> bv = np.array([b, a])
>> 
>> print(generic_equal(av, bv))
["eq('a', 'b')" "eq('b', 'a')"]
>> print(np.equal(av, bv))
[ True  True]
>> 
>> np.set_numeric_ops(equal=generic_equal)
>> print(np.equal(av, bv))
[ True  True]
>> print(av == bv)
["eq('a', 'b')" "eq('b', 'a')"]
>> np.equal = generic_equal
>> print(np.equal(av, bv))
["eq('a', 'b')" "eq('b', 'a')"]

EricCousineau-TRI avatar Mar 09 '18 21:03 EricCousineau-TRI

Looks nice! I'll try that in the coming PR.

soonho-tri avatar Mar 09 '18 22:03 soonho-tri

Hmm... Is it OK if we hold off on incorporating this just yet? I'd like to look a bit deeper and ensure that we don't incur any nasty side effects (e.g. killing speed for other libraries that aren't using Drake for image stuff, etc.), and see if there's an elegant solution to guarantee this.

What kind of timeline would you want this resolved on? (I've talked with @m-chaturvedi, and we'll start addressing #8116)

EricCousineau-TRI avatar Mar 09 '18 22:03 EricCousineau-TRI

What kind of timeline would you want this resolved on?

@RussTedrake can answer this (he marked this with high priority).

soonho-tri avatar Mar 09 '18 22:03 soonho-tri

it depends on the cost of resolving it... but I'm about to lecture about trajectory optimization, and this occurs very frequently when formulating trajectory optimizations.

RussTedrake avatar Mar 10 '18 00:03 RussTedrake

i also marked it as "high" priority because in it's current form, it's a bit of a landmine...

RussTedrake avatar Mar 10 '18 00:03 RussTedrake

Sounds good. I'll focus on the next few of days, because after having worked on it for a bit, I'm fairly confident that the solution for #8116 will most likely solve this too (if it goes like I planned it).

BTW: Posted StackOverflow as well: https://stackoverflow.com/questions/49205075/possible-to-have-logical-operations-e-g-ndarray-eq-not-return-bool-in

EricCousineau-TRI avatar Mar 10 '18 03:03 EricCousineau-TRI

Finally finished the spike test (with some generalizing / polishing), and it seems like user-defined dtypes in NumPy will solve both this issue and #8116: https://github.com/EricCousineau-TRI/repro/blob/1b1edfa/python/pybind11/dtype_stuff/test_basic.cc#L117-L128 https://github.com/EricCousineau-TRI/repro/blob/1b1edfa/python/pybind11/dtype_stuff/test_basic.py#L89-L93

Output from those lines in the Python script (out of sheer laziness, I had operator== just double the number):

>> av = np.array([Custom(1), Custom(2)])
>> mutate(av)
>> print(av)
[<Custom(101.0)> <Custom(102.0)>]
>> print(av == av)
[<Custom(202.0)> <Custom(204.0)>]

Required some shifty appeasements of both pybind11 and numpy. There is also the remaining issue about memory leaking here (which now that I have a working prototype, will respond there with some more info): https://github.com/numpy/numpy/issues/10721

While it sucks to leak memory, I think is the cleanest solution. In reviewing PyTorch, Tensorflow, etc., I realized that each of those might have some niceties, but would be a longer-term goal (using this article as a quick overview / guide: http://davidstutz.de/a-note-on-extending-tensorflow-pytorch-and-theano-for-deep-learning-on-custom-datastructures/ )

Will start polishing this to push into master.

EricCousineau-TRI avatar Mar 14 '18 05:03 EricCousineau-TRI

Right now I think the problem is that np.equal.types contains only XX->? loops, and you want an OO->O loop.

I think it would be pretty reasonable for numpy to add a OO->O loop to np.equal and the other comparison ops - with that in place, you could overload the == operator to just force this loop to be used.

eric-wieser avatar Mar 14 '18 06:03 eric-wieser

Hup, it looks like there's a method numpy.set_numeric_ops (that has doc, but I can't find its online documentation), referenced in this post

Note that just the other day an NEP was put forth suggesting this function be deprecated

eric-wieser avatar Mar 14 '18 06:03 eric-wieser

I think it would be pretty reasonable for numpy to add a OO->O loop to np.equal and the other comparison ops - with that in place, you could overload the == operator to just force this loop to be used.

Yup! Though I was a bit hesitant about forcing this into np from Drake, as it would change the behavior globally -- a main concern being, if OO->O was always the case, then would logical indexing with non-symbolic types (e.g. AutoDiffXd) still work the same? -- My guess is yes, minus the overhead of copying? (or whatever edge cases that don't handle implicit casting properly?)

I took the route of the user-defined dtypes because it let me take out two birds with one stone: by defining custom dypes, registering custom loops (in this case, SS->S for ==, AA->B where S is symbolic, A is autodiff, and B is bool) gets us there (1, 2), and then resolves #8116.

However, if that falls flat, we'll definitely use the additional ufunc type registration!

Note that just the other day an NEP was put forth suggesting this function be deprecated

Thanks for the heads up! I think we thankfully won't need it for now.

EricCousineau-TRI avatar Mar 14 '18 06:03 EricCousineau-TRI

Let me try and be a little more precise about the route I'm envisaging:

  1. numpy should implement the 'OO->O' loop for the comparison functions, but not expose it via the defaults - np.equal will never choose this loop without an explicit dtype parameter.

  2. AutoDiffXd overrides __array_ufunc__ to force this loop type when a comparison operator is used. This solves np.equal and == at the same time.

I'm not sure what you mean by logical indexing.

Custom dtypes are still a good design here, but if you already have the custom array subclasses, __array_ufunc__ would be a lot easier.

eric-wieser avatar Mar 14 '18 07:03 eric-wieser

numpy should implement the 'OO->O' loop for the comparison functions, but not expose it via the defaults - np.equal will never choose this loop without an explicit dtype parameter.

Gotcha! So then, for testing comparison on Expression, ~we'd need to be explicit?~ we'd just need to override __array_ufunc__ to use OO->O to get a Formula from expr1 == expr2, and for AutoDiffXd, we'd not need to override anything to get OO->B from ad1 == ad2?

EDIT: Realized my initial understand was a little less than optimal.

I'm not sure what you mean by logical indexing.

I was concerned about the following situation:

>>> import numpy as np
>>> np.version.version  # Ubuntu 16.04
'1.11.0'
>>> x = np.array([10., 20. 30.])
>>> b = np.array([True, False, True])
>>> x[b]
array([ 10.,  30.])
>>> x[b.astype(object)]
IndexError: arrays used as indices must be of integer (or boolean) type

But if we override AutoDiffXd.__array_ufunc__ to use AA->B (or OO->B), then that'd work!

EricCousineau-TRI avatar Mar 14 '18 13:03 EricCousineau-TRI

we'd just need to override array_ufunc to use OO->O to get a Formula from expr1 == expr2, and for AutoDiffXd, we'd not need to override anything to get OO->B fromad1 == ad2?

Exactly. If you want expr[expr != 0] to work, then obviously you'd also need to implement Expression.__getitem__, but that's probably an unpleasant can of worms.

eric-wieser avatar Mar 14 '18 15:03 eric-wieser

Partial fix in numpy/numpy#10745

eric-wieser avatar Mar 15 '18 01:03 eric-wieser