
fetching boolean attributes returns them as integers

Open · opened by ecobost · 2 comments

When I fetch an attribute defined as boolean, it is returned as np.int64. It would be nicer if it were returned as a boolean, which might avoid some bugs; booleans can still be collected into np.bool arrays. An integer array (even one holding only 0s and 1s) behaves unexpectedly if people forget to cast it to boolean: ~ performs bitwise inversion, and indexing with it selects elements by position instead of masking:

In [19]: mask = data.Scan.Unit.fetch('is_soma', limit=5)

In [20]: mask
Out[20]: array([0, 1, 1, 0, 0])

In [21]: ~mask
Out[21]: array([-1, -2, -2, -1, -1])

In [36]: np.array([0, 1, 2, 3, 4])[mask]
Out[36]: array([0, 1, 1, 0, 0])

versus

In [37]: mask2 = mask.astype('bool')

In [38]: mask2
Out[38]: array([False,  True,  True, False, False])

In [39]: ~mask2
Out[39]: array([ True, False, False,  True,  True])

In [43]: np.array([0, 1, 2, 3, 4])[mask2]
Out[43]: array([1, 2])

dj version: 0.12.4

ecobost · Mar 9, 2020, 10:03

The issue is that MySQL does not have a native boolean data type. BOOLEAN is an alias for TINYINT(1), so the distinction is lost once the table is declared, and values come back as integers.
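To illustrate the round trip without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. This is an analogy, not the MySQL or DataJoint code path: SQLite likewise has no separate boolean storage class and stores booleans as the integers 0 and 1. The table and column names are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: no MySQL server available)
conn = sqlite3.connect(":memory:")
# Declaring the column BOOLEAN does not create a real boolean type
conn.execute("CREATE TABLE unit (unit_id INTEGER PRIMARY KEY, is_soma BOOLEAN)")
conn.executemany(
    "INSERT INTO unit (unit_id, is_soma) VALUES (?, ?)",
    [(1, False), (2, True), (3, True)],
)
rows = conn.execute("SELECT is_soma FROM unit ORDER BY unit_id").fetchall()
conn.close()

print(rows)              # [(0,), (1,), (1,)]
print(type(rows[0][0]))  # <class 'int'>: the True/False distinction is gone
```

Any client reading such a column only sees integers, which is why the conversion back to booleans has to be done explicitly on the Python side.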

If this is important, we could designate boolean as a special DataJoint type and handle it explicitly, as we do for the uuid data type.
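One way such a special type could work, as a minimal sketch: encode booleans to 0/1 on insert and decode the fetched integer column back to booleans, analogous to the explicit uuid handling mentioned above. The class BooleanAttribute and its encode/decode methods are hypothetical and not part of the DataJoint API; the fetched column is simulated with a NumPy array, and a NOT NULL column is assumed.

```python
import numpy as np


class BooleanAttribute:
    """Hypothetical codec for a DataJoint-level `boolean` type (not an existing API).

    Values are stored in a MySQL TINYINT column as 0/1 and decoded back to
    booleans on fetch. Assumes a NOT NULL column.
    """

    sql_type = "tinyint"  # the storage type MySQL actually provides

    @staticmethod
    def encode(value):
        # Accept Python/NumPy bools and the integers 0 and 1; reject anything else
        if isinstance(value, (bool, np.bool_)):
            return int(value)
        if isinstance(value, (int, np.integer)) and value in (0, 1):
            return int(value)
        raise TypeError(f"not a boolean value: {value!r}")

    @staticmethod
    def decode(column):
        # Convert a fetched integer column (e.g. an int64 array) to a boolean array
        return np.asarray(column).astype(bool)


# Usage: encode on insert, decode on fetch
stored = [BooleanAttribute.encode(v) for v in (False, True, True, False, False)]
fetched = np.array(stored, dtype=np.int64)  # what fetch currently returns
decoded = BooleanAttribute.decode(fetched)
print(stored)   # [0, 1, 1, 0, 0]
print(decoded)  # [False  True  True False False]
```

Rejecting values other than 0/1 in encode keeps a non-boolean value from being silently collapsed to True.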

dimitri-yatsenko · Apr 9, 2020, 15:04

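By the way, if you mean logical negation, it's safer to use np.logical_not than ~ on NumPy arrays: np.logical_not returns a boolean array for numeric input, whereas ~ is bitwise inversion and turns an integer 0/1 array into -1/-2.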
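A short comparison on the same integer 0/1 array as in the original report (simulated here with a literal instead of a fetch):

```python
import numpy as np

mask = np.array([0, 1, 1, 0, 0], dtype=np.int64)  # integer 0/1 array, as fetched

print(~mask)                        # bitwise NOT on int64: [-1 -2 -2 -1 -1]
print(np.logical_not(mask))         # logical negation: [ True False False  True  True]
print(np.logical_not(mask).dtype)   # bool

# The logical negation works directly as a boolean mask
print(np.array([0, 1, 2, 3, 4])[np.logical_not(mask)])  # [0 3 4]
```

On a genuine boolean array the two agree, so np.logical_not is the choice that behaves correctly whether or not the fetched values were cast.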

dimitri-yatsenko · Apr 9, 2020, 15:04