`repr` and `str` for bool arrays should print `True` and `False` instead of `true` and `false`

Open dastrobu opened this issue 1 year ago • 5 comments

This is definitely low priority, but maybe worth fixing in the future.

__repr__:

If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment).

For bool arrays this is almost the case:

>>> a=mx.arange(2).astype(mx.bool_)
>>> repr(a)
'array([false, true], dtype=bool)'
>>> str(a)
'array([false, true], dtype=bool)'

Cf. NumPy:

>>> a=np.arange(2).astype(np.bool_)
>>> repr(a)
'array([False,  True])'
>>> str(a)
'[False  True]'

I understand that, since MLX is a C++ framework as well, the representation in C++ is `true` and `false`. So I guess the Python wrapper would need some special handling to represent booleans as proper Python literals.
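To make the goal concrete, here is a minimal pure-Python sketch, not MLX code. `BoolArray`, `array`, and `round_trips` are hypothetical names introduced only for illustration. It shows why the capitalization matters: the C++-style string cannot be evaluated as a Python expression, while the capitalized one can.

```python
# Hypothetical stand-in for a boolean array; not part of MLX.
class BoolArray:
    def __init__(self, values):
        self.values = [bool(v) for v in values]

    def cpp_style_repr(self):
        # Mimics the current output, which uses C++ literals.
        body = ", ".join("true" if v else "false" for v in self.values)
        return f"array([{body}], dtype=bool)"

    def __repr__(self):
        # Uses Python literals so the string is a valid expression.
        body = ", ".join(str(v) for v in self.values)
        return f"array([{body}], dtype=bool)"

    def __eq__(self, other):
        return isinstance(other, BoolArray) and self.values == other.values


def array(values, dtype=bool):
    # Hypothetical constructor mirroring the shape of the printed expression.
    return BoolArray(values)


def round_trips(text):
    # True if the text evaluates back to a BoolArray in this namespace.
    try:
        return isinstance(eval(text, {"array": array, "bool": bool}), BoolArray)
    except NameError:
        # `true` and `false` are not defined names in Python.
        return False


a = BoolArray([0, 1])
print(round_trips(a.cpp_style_repr()))  # False
print(round_trips(repr(a)))             # True
```

With real MLX the evaluation environment would additionally need `array` and the dtype names in scope, which is the "appropriate environment" caveat in the quoted `__repr__` guidance.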

dastrobu avatar Dec 31 '23 08:12 dastrobu

Possibly we could allow for some basic number formatting in the printing of arrays. The Python code could then set the appropriate format for `True` and `False`.

awni avatar Jan 02 '24 01:01 awni

I would like to work on this. I did some research, and it looks like the object is defined here: https://github.com/ml-explore/mlx/blob/295ce9db094ba6934e3882347b3a929accc2772c/mlx/array.h#L32-L37

I'm pretty sure that making changes there is not the right approach. Can I get a hint on how to solve this? That is, which Python file should I look into for the formatting? Or do I have to create a new one?

ManishAradwad avatar Jan 02 '24 15:01 ManishAradwad

@ManishAradwad, I suggest considering the file at https://github.com/ml-explore/mlx/blob/b1441d14013ea1f2cee0e3a799a25f1bcfedefe6/python/src/array.cpp#L688 as a potential starting point.

Rather than generating the default string representation, it might be more appropriate to create a formatted representation. However, I'm uncertain about the most C++ idiomatic approach for this.

dastrobu avatar Jan 02 '24 15:01 dastrobu

The array printing code is here.

It looks like NumPy uses some global state (see set_format_options) to manage the printing style. That may be a bit heavy-handed for a first pass.

But one option as a starting point is to have the printing routine take some kind of structure which manages the formatting options. The default is the current behavior. Following NumPy's example, that structure could have a formatter which holds a callable registered for each type. The Python printing can override the callable to capitalize the boolean type.
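A rough sketch of that idea, written in plain Python for brevity rather than in the C++ where the real printing code lives. `PrintOptions` and `format_array` are hypothetical names, not existing MLX APIs. The options are passed in explicitly instead of being read from global state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


def _default_formatters() -> Dict[type, Callable[[object], str]]:
    # The default mirrors the current C++-style output for booleans.
    return {
        bool: lambda v: "true" if v else "false",
        int: str,
        float: repr,
    }


@dataclass
class PrintOptions:
    # Hypothetical structure: one callable registered per element type.
    formatters: Dict[type, Callable[[object], str]] = field(
        default_factory=_default_formatters
    )


def format_array(values, dtype_name, options=None):
    # Hypothetical printing routine that receives the options as an argument.
    options = options or PrintOptions()
    body = ", ".join(options.formatters[type(v)](v) for v in values)
    return f"array([{body}], dtype={dtype_name})"


# The Python binding would override only the boolean formatter.
python_options = PrintOptions()
python_options.formatters[bool] = str

print(format_array([False, True], "bool"))
# array([false, true], dtype=bool)
print(format_array([False, True], "bool", python_options))
# array([False, True], dtype=bool)
```

Because each `PrintOptions` instance gets its own dictionary, overriding the boolean formatter for Python leaves the default behavior untouched for other callers.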

I'm open to other, maybe simpler, ideas as well. But it would be good to make it somewhat general (while keeping it simple).

awni avatar Jan 02 '24 16:01 awni

@awni I agree with your solution, but I have some doubts about this approach. I'll create a PR so that it's easier to discuss.

ManishAradwad avatar Jan 03 '24 18:01 ManishAradwad