Discussion: data types of the columns

Open airvzxf opened this issue 10 months ago • 0 comments

Discussion

My question is whether “tabulate” should support mixed-type columns instead of always deciding on a single data type for the whole column. With a mixed-type column, “tabulate” would identify the data type of each cell and format it accordingly, rather than applying the overall column type to every cell.

Quick comment

On GitHub, you could enable the Discussions feature: https://github.com/features/discussions. With it, your community can open a discussion, and if a specific discussion turns out to be relevant, it can be moved to the issues section.

Not every user in the community uses Discussions, and not every repository or project has the feature enabled. Still, it appears to be useful for administration.

Context

I noticed that “tabulate” inspects every row of each column and automatically assigns a type to the column. That is great, but I am concerned about columns whose rows contain mixed types.

For mixed columns, I found that the type is chosen in this order:

  • If at least one cell in the column is a string, the whole column is treated as a string column.
  • Otherwise, if at least one float is detected, the column is treated as a float column.
  • Finally, if the column contains only integers, the column type is int.
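
The precedence above can be summarized in a minimal sketch. This is not tabulate's actual implementation: infer_column_type is a hypothetical helper that models only the three observed rules and ignores other cases (for example None, booleans, or strings that look like numbers).

```python
def infer_column_type(values):
    """Model the observed precedence str > float > int (hypothetical helper)."""
    cell_types = {type(v) for v in values}
    if str in cell_types:
        return str      # any string makes the whole column a string column
    if float in cell_types:
        return float    # no strings, but at least one float
    return int          # only integers remain


print(infer_column_type([82000.38, "abcd", 92165]))
print(infer_column_type([12013, 210, 15.24, 92165]))
print(infer_column_type([12013, 210, 92165]))
```

The three calls use the same columns as the evidence in this issue.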

Evidence

All results were obtained by adding debug lines at the top of the _format() function. They print each value, its actual type, and the valtype assigned to its column, so the two types can be compared.

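def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):
    # Debug lines added for this issue: show the value, its real type,
    # and the type that tabulate assigned to the whole column.
    print(f'    val: {val}')
    print(f'   type: {type(val)}')
    print(f'valtype: {valtype}')
    print()
    # ... the original body of _format() continues unchanged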

For the call tabulate([[82000.38], ["abcd"], [92165]], tablefmt="plain"), the valtype is <class 'str'>.

The result is below.

    val: 82000.38
   type: <class 'float'>
valtype: <class 'str'>

    val: abcd
   type: <class 'str'>
valtype: <class 'str'>

    val: 92165
   type: <class 'int'>
valtype: <class 'str'>

For the call tabulate([[12013], [210], [15.24], [92165]], tablefmt="plain"), the valtype is <class 'float'>.

The result is below.

    val: 12013
   type: <class 'int'>
valtype: <class 'float'>

    val: 210
   type: <class 'int'>
valtype: <class 'float'>

    val: 15.24
   type: <class 'float'>
valtype: <class 'float'>

    val: 92165
   type: <class 'int'>
valtype: <class 'float'>

For the call tabulate([[12013], [210], [92165]], tablefmt="plain"), the valtype is <class 'int'>.

The result is below.

    val: 12013
   type: <class 'int'>
valtype: <class 'int'>

    val: 210
   type: <class 'int'>
valtype: <class 'int'>

    val: 92165
   type: <class 'int'>
valtype: <class 'int'>

Expectation

Based on the above, this is the output I would expect from the _format() function.

Solution 1

For the call tabulate([[82000.38], ["abcd"], [92165]], tablefmt="plain"), the valtype should be Mixed or something similar.

The expected result is below.

    val: 82000.38
   type: <class 'float'>
valtype: <class 'Mixed'>

    val: abcd
   type: <class 'str'>
valtype: <class 'Mixed'>

    val: 92165
   type: <class 'int'>
valtype: <class 'Mixed'>

Then, inside _format(), we can check whether the valtype is Mixed and, if so, use the real type of val to decide how to format the cell.
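
A minimal sketch of that idea, under stated assumptions: Mixed is a hypothetical marker class and format_cell is a simplified stand-in for _format(), not tabulate's real code. The floatfmt and intfmt defaults are picked for illustration only.

```python
class Mixed:
    """Hypothetical marker type for a column whose cells have different types."""


def format_cell(val, valtype, floatfmt="g", intfmt=""):
    """Format one cell; for a Mixed column, fall back to the cell's own type."""
    effective = type(val) if valtype is Mixed else valtype
    if effective is int:
        return format(val, intfmt)
    if effective is float:
        return format(float(val), floatfmt)
    return str(val)


column = [82000.38, "abcd", 92165]
print([format_cell(v, Mixed) for v in column])
```

With a non-Mixed valtype the function behaves like the current column-wide logic, so existing tables would not change unless a column is actually flagged as mixed.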

Solution 2

Always ignore the valtype and use the type of val, unless the user passed a parameter that explicitly specifies the type of the column. Something like this: tabulate([[82000.38], ["abcd"], [92165]], coltypes=(int,), tablefmt="plain"), which would treat all the cells in the column as integers.
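
As a rough sketch of how this could behave: effective_type and format_column below are hypothetical helpers, and coltype stands in for one entry of the proposed coltypes parameter, which does not exist in tabulate today. Keeping a cell as text when it cannot be coerced (for example "abcd" as an integer) is an assumption of this sketch, not something the proposal settles.

```python
def effective_type(val, coltype=None):
    """Return the type used for a cell: the override if given, else its own type."""
    return coltype if coltype is not None else type(val)


def format_column(values, coltype=None):
    """Convert every cell to text, coercing only when an override is given."""
    formatted = []
    for val in values:
        target = effective_type(val, coltype)
        try:
            formatted.append(str(target(val)))
        except (TypeError, ValueError):
            # Assumption: cells that cannot be coerced are kept as plain text.
            formatted.append(str(val))
    return formatted


column = [82000.38, "abcd", 92165]
print(format_column(column))               # each cell keeps its own type
print(format_column(column, coltype=int))  # the override is applied per cell
```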

Final note

I arrived at this package because I was using Pandas, specifically the “to_markdown” function. It might be a good idea to invite the Pandas maintainers to this discussion to get additional feedback.

By the way, Pandas wraps a limited version of tabulate for the to_markdown function. Outside the scope of this discussion, it would be nice if Pandas exposed the full set of tabulate parameters and functionality.

airvzxf, Apr 14 '24 21:04