python-tabulate
python-tabulate copied to clipboard
Bug fix for issue #318 | `intfmt`: error for strings which contain integer values
Bug fix for issue #318 | intfmt
: error for strings which contain integer values
Description.
Resolved the problem with the intfmt
argument.
Added tests for these changes.
Test results: 308 passed, 2 skipped in 1.10s
.
Thank you for identifying the issue and proposing a bug fix!
I was wondering that maybe we should rather convert val
to int
instead, similarly to what happens when valtype
is float
. That way, an int
in a string would still get formatted, like what happens for a float
. What do you think?
Also, I see you added a nice test for colored int
s. If we choose to format int
s passed as string, I think it would be nice to format them when the string has color too, same as happens for float
. I'd be willing to provide a new PR for that if we decide it's an improvement worth having.
I was wondering that maybe we should rather convert
val
toint
instead, similarly to what happens whenvaltype
isfloat
. That way, anint
in a string would still get formatted, like what happens for afloat
. What do you think?Also, I see you added a nice test for colored
int
s. If we choose to formatint
s passed as string, I think it would be nice to format them when the string has color too, same as happens forfloat
. I'd be willing to provide a new PR for that if we decide it's an improvement worth having.
In fact, it was one of my first solutions, but this breaks some unit tests. I made several changes before arriving at the proposed solution, but the most important thing is that after all these experiments, I discovered that tabulate
may need deeper refactoring.
Unit test
Here are the results of the unit tests with this change:
elif valtype is int:
# if isinstance(val, str):
# intfmt = ""
return format(int(val), intfmt)
Unit tests results:
==================================================================== FAILURES ====================================================================
_______________________________________________________ test_intfmt_with_string_as_integer _______________________________________________________
def test_intfmt_with_string_as_integer():
"Output: integer format"
result = tabulate([[82642], ["1500"], [2463]], intfmt=",", tablefmt="plain")
expected = "82,642\n 1500\n 2,463"
> assert_equal(expected, result)
test/test_output.py:2646:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
expected = '82,642\n 1500\n 2,463', result = '82,642\n 1,500\n 2,463' 10:14:37 [140/612]
def assert_equal(expected, result):
print("Expected:\n%s\n" % expected)
print("Got:\n%s\n" % result)
> assert expected == result
E AssertionError
test/common.py:8: AssertionError
-------------------------------------------------------------- Captured stdout call --------------------------------------------------------------
Expected:
82,642
1500
2,463
Got:
82,642
1,500
2,463
____________________________________________________________ test_intfmt_with_colors _____________________________________________________________
def test_intfmt_with_colors():
"Regression: Align ANSI-colored values as if they were colorless."
colortable = [
("abcd", 42, "\x1b[31m42\x1b[0m"),
("elfy", 1010, "\x1b[32m1010\x1b[0m"),
]
colorheaders = ("test", "\x1b[34mtest\x1b[0m", "test")
> formatted = tabulate(colortable, colorheaders, "grid", intfmt=",")
test/test_output.py:2664:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tabulate/__init__.py:2198: in tabulate
cols = [
tabulate/__init__.py:2199: in <listcomp> 10:14:37 [105/612]
[_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]
tabulate/__init__.py:2199: in <listcomp>
[_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
val = '\x1b[31m42\x1b[0m', valtype = <class 'int'>, floatfmt = 'g', intfmt = ',', missingval = '', has_invisible = True
def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):
"""Format a value according to its type.
Unicode is supported:
>>> hrow = ['\u0431\u0443\u043a\u0432\u0430', '\u0446\u0438\u0444\u0440\u0430'] ; \
tbl = [['\u0430\u0437', 2], ['\u0431\u0443\u043a\u0438', 4]] ; \
good_result = '\\u0431\\u0443\\u043a\\u0432\\u0430 \\u0446\\u0438\\u0444\\u0440\\u0430\\n------- -------\\n\\u0430\\u0437
2\\n\\u0431\\u0443\\u043a\\u0438 4' ; \
tabulate(tbl, headers=hrow) == good_result
True
""" # noqa
if val is None:
return missingval
if valtype is str:
return f"{val}"
elif valtype is int:
# if isinstance(val, str):
# intfmt = ""
> return format(int(val), intfmt)
E ValueError: invalid literal for int() with base 10: '\x1b[31m42\x1b[0m'
tabulate/__init__.py:1234: ValueError
________________________________________________________ test_alignment_of_colored_cells _________________________________________________________
def test_alignment_of_colored_cells(): 10:14:37 [70/612]
"Regression: Align ANSI-colored values as if they were colorless."
colortable = [
("test", 42, "\x1b[31m42\x1b[0m"),
("test", 101, "\x1b[32m101\x1b[0m"),
]
colorheaders = ("test", "\x1b[34mtest\x1b[0m", "test")
> formatted = tabulate(colortable, colorheaders, "grid")
test/test_regression.py:30:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tabulate/__init__.py:2198: in tabulate
cols = [
tabulate/__init__.py:2199: in <listcomp>
[_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]
tabulate/__init__.py:2199: in <listcomp>
[_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
val = '\x1b[31m42\x1b[0m', valtype = <class 'int'>, floatfmt = 'g', intfmt = '', missingval = '', has_invisible = True
def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):
"""Format a value according to its type.
Unicode is supported:
>>> hrow = ['\u0431\u0443\u043a\u0432\u0430', '\u0446\u0438\u0444\u0440\u0430'] ; \
tbl = [['\u0430\u0437', 2], ['\u0431\u0443\u043a\u0438', 4]] ; \
good_result = '\\u0431\\u0443\\u043a\\u0432\\u0430 \\u0446\\u0438\\u0444\\u0440\\u0430\\n------- -------\\n\\u0430\\u0437
2\\n\\u0431\\u0443\\u043a\\u0438 4' ; \
tabulate(tbl, headers=hrow) == good_result
True
""" # noqa
if val is None:
return missingval
if valtype is str:
return f"{val}"
elif valtype is int:
# if isinstance(val, str):
# intfmt = ""
> return format(int(val), intfmt)
E ValueError: invalid literal for int() with base 10: '\x1b[31m42\x1b[0m'
tabulate/__init__.py:1234: ValueError
________________________________________________________ test_ansi_color_bold_and_fgcolor ________________________________________________________
def test_ansi_color_bold_and_fgcolor():
"Regression: set ANSI color and bold face together (issue #65)"
table = [["1", "2", "3"], ["4", "\x1b[1;31m5\x1b[1;m", "6"], ["7", "8", "9"]]
> result = tabulate(table, tablefmt="grid")
test/test_regression.py:374:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tabulate/__init__.py:2198: in tabulate
cols = [
tabulate/__init__.py:2199: in <listcomp>
[_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]
tabulate/__init__.py:2199: in <listcomp>
[_format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
val = '\x1b[1;31m5\x1b[1;m', valtype = <class 'int'>, floatfmt = 'g', intfmt = '', missingval = '', has_invisible = True
def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):
"""Format a value according to its type.
Unicode is supported:
>>> hrow = ['\u0431\u0443\u043a\u0432\u0430', '\u0446\u0438\u0444\u0440\u0430'] ; \
tbl = [['\u0430\u0437', 2], ['\u0431\u0443\u043a\u0438', 4]] ; \
good_result = '\\u0431\\u0443\\u043a\\u0432\\u0430 \\u0446\\u0438\\u0444\\u0440\\u0430\\n------- -------\\n\\u0430\\u0437 2\\n\\u0431\\u0443\\u043a\\u0438 4' ; \
tabulate(tbl, headers=hrow) == good_result
True
""" # noqa
if val is None:
return missingval
if valtype is str:
return f"{val}"
elif valtype is int:
# if isinstance(val, str):
# intfmt = ""
> return format(int(val), intfmt)
E ValueError: invalid literal for int() with base 10: '\x1b[1;31m5\x1b[1;m'
tabulate/__init__.py:1234: ValueError
============================================================ short test summary info =============================================================
FAILED test/test_output.py::test_intfmt_with_string_as_integer - AssertionError
FAILED test/test_output.py::test_intfmt_with_colors - ValueError: invalid literal for int() with base 10: '\x1b[31m42\x1b[0m'
FAILED test/test_regression.py::test_alignment_of_colored_cells - ValueError: invalid literal for int() with base 10: '\x1b[31m42\x1b[0m'
FAILED test/test_regression.py::test_ansi_color_bold_and_fgcolor - ValueError: invalid literal for int() with base 10: '\x1b[1;31m5\x1b[1;m'
==================================================== 4 failed, 304 passed, 2 skipped in 2.06s ====================================================
Conclusions
Description
If we made this change, the strings which contain integers will be formatted. It is not bad, but I was thinking that if I want some number as a string is because maybe I would rather not format as an integer. It will be a hard decision for the owners and maintainers of this Python package.
In the test test_intfmt_with_string_as_integer
, the data [[82642], ["1500"], [2463]]
is transformed to 82,642 | 1,500 | 2,463
.
We need to explore the way to solve the colored numbers. I'll research in the code if there is something that I can reuse or create a new functionality to detect the colored number and format the integer.
Taking advantage of this talk, I would like to recommend that each data value be treated independently, since apparently if it encounters at least one floating number, it takes all numbers as floats, even if they are not.
@pytest.mark.skip(reason="It detects all values as floats but there are strings and integers.")
def test_intfmt_with_string_with_floats():
"Output: integer format"
result = tabulate([[82000.38], ["1500.47"], ["2463"], [92165]], intfmt=",", tablefmt="plain")
expected = "82000.4\n 1500.47\n 2,463\n92,165"
assert_equal(expected, result)
For this, data [[82000.38], ["1500.47"], ["2463"], [92165]]
all the numbers were assigned as a float.
val: 82000.38
val: <class 'float'>
valtype: <class 'float'>
val: 1500.47
val: <class 'str'>
valtype: <class 'float'>
val: 2463
val: <class 'str'>
valtype: <class 'float'>
val: 92165
val: <class 'int'>
valtype: <class 'float'>
Expected:
82000.4
1500.47
2,463
92,165
Got:
82000.4
1500.47
2463
92165
Furthermore, I noticed that the float 82000.38
is rounded to 82000.4
, but not the float 1500.47
. Is this an expected behavior?
test_intfmt_with_string_as_integer
tabulate([[82642], ["1500"], [2463]], intfmt=",", tablefmt="plain")
Expected: 82,642 1500 2,463
Got: 82,642 1,500 2,463
test_intfmt_with_colors
tabulate(colortable, colorheaders, "grid", intfmt=",")
ValueError: invalid literal for int() with base 10: \x1b[31m42\x1b[0m
--
test_alignment_of_colored_cells
colortable = [
("test", 42, "\x1b[31m42\x1b[0m"),
("test", 101, "\x1b[32m101\x1b[0m"),
]
colorheaders = ("test", "\x1b[34mtest\x1b[0m", "test")
formatted = tabulate(colortable, colorheaders, "grid")
ValueError: invalid literal for int() with base 10: \x1b[31m42\x1b[0m
test_ansi_color_bold_and_fgcolor
ValueError: invalid literal for int() with base 10: \x1b[1;31m5\x1b[1;m
Final note
I look forward to your reply or guide to start working on solutions. Otherwise, I could make code, but I really don't know if it aligns with where you want to go.
It would be the solution for the colored numbers, at least at Regular Expression level.
Regular expression: (\\x[0-9a-fA-F]+\[\d+m)([0-9\.]+)(\\.*)
For this example
\x1b[31m42\x1b[0m
\x1b[32m1010\x1b[0m
Match for these examples:
\x1b[31m 42 \x1b[0m
\x1b[32m 1010 \x1b[0m
In the above commit: https://github.com/astanin/python-tabulate/pull/319/commits/e45cac16676f0955b2fd05389d97b3e6f71fd66c; I added the validation and formatting when the number is colored.
Here other option to format the strings which contain integers. For example, in this data [[82642], ["1500"], [2463]]
the result should be 82,642 | 1,500 | 2,463
rather than 82,642 | 1500 | 2,463
.
The only change that needs to be performed is adding some elif
:
elif val.isdigit():
val = format(int(val), intfmt)
In this code:
if valtype is str:
return f"{val}"
elif valtype is int:
if isinstance(val, str):
val_striped = val.encode('unicode_escape').decode('utf-8')
colored = re.search(r'(\\[xX]+[0-9a-fA-F]+\[\d+[mM]+)([0-9.]+)(\\.*)$', val_striped)
if colored:
total_groups = len(colored.groups())
if total_groups == 3:
digits = colored.group(2)
if digits.isdigit():
val_new = colored.group(1) + format(int(digits), intfmt) + colored.group(3)
val = val_new.encode('utf-8').decode('unicode_escape')
elif val.isdigit():
val = format(int(val), intfmt)
intfmt = ""
return format(val, intfmt)
This is the result of the test test_intfmt_with_string_as_integer
.
result = tabulate([[82642], ["1500"], [2463]], intfmt=",", tablefmt="plain")
expected = "82,642\n 1500\n 2,463"
Expected:
82,642
1500
2,463
Got:
82,642
1,500
2,463
Note:
If you think this change is important and relevant to your package, please let me know, so I can add this change to the pull requests.