StyleFrame icon indicating copy to clipboard operation
StyleFrame copied to clipboard

Auto column width and row height with multi-line cell values

Open nachtkinder opened this issue 4 years ago • 7 comments

Hi,

I'm having an issue with the auto determined column width and row height when using StyleFrame with a pandas DataFrame that contains multi-line values.

import pandas as pd
from styleframe import StyleFrame

df = pd.DataFrame({"a": ("Line 1\nLine 2\nLine 3",)})
sf = StyleFrame(df)
sf.to_excel("test.xlsx", best_fit=list(df.columns), index=False).save()

What I expect test.xlsx to look like is:

   a   |  b
-------|------
Line 1 |Line 1
Line 2 |Line 2
Line 3 |Line 3
-------|------

What it actually looks like is:

                   a                   |                   b
---------------------------------------|--------------------------------------
                 Line 2                |                 Line 2
---------------------------------------|--------------------------------------

"Line 1" and "Line 3" are present in each cell but the row is not rendered with sufficient height for them to be visible.

This behaviour is the same regardless of any wrap_text or width properties set on the columns.

The README.md states:

The StyleFrame object will auto-adjust the columns width and the rows height but they can be changed manually

The API documentation for StyleFrame.to_excel states the formula used to determine a column's width as:

(len(longest_value_in_column) + A_FACTOR) * P_FACTOR

...which results in column widths a little bigger than the largest combined length of all lines plus the newline characters between them in a single value.

Is there a way to instruct StyleFrame to treat multi-line value differently such that:

a. Columns that contain multi-line values are not much wider than the longest line in a value contained within them

and

b. Rows that contain multi-line values have sufficient height to display all lines of any cell contained within

Versions

Python 3.7.5 StyleFrame 3.0.5 pandas 1.0.4 openpyxl 2.6.3

Described behaviour also present with pandas 1.1.3 and openpyxl 3.0.5

nachtkinder avatar Oct 16 '20 15:10 nachtkinder

Hi.

a. Columns that contain multi-line values are not much wider than the longest line in a value contained within them

This is quite easily fixable, see the commit in which I referenced this issue. This will split each row in each column on \n and use the length of the longest line in each row as the length for the calculation.

b. Rows that contain multi-line values have sufficient height to display all lines of any cell contained within

This is a bit tricker because I suspect this is going to be more font-size sensitive.

For now, you can use the following workaround before calling .to_excel:

df = pd.DataFrame({"a": ("Line 1\nLine 2\nLine 3",)})
sf = StyleFrame(df)
sf.set_column_width_dict({column: (max(map(len, sf.data_df[column].astype(str).str.split('\n'))) + sf.A_FACTOR) * sf.P_FACTOR
                          for column in sf.data_df.columns})
sf.set_row_height_dict({index + 2: (max(sf.data_df[column].astype(str).str.count('\n') for column in sf.data_df.columns) + sf.A_FACTOR) * sf.P_FACTOR * 2
                        for index, _ in enumerate(sf.data_df.index)})
sf.to_excel("test.xlsx", best_fit=list(df.columns), index=False).save()

DeepSpace2 avatar Oct 16 '20 17:10 DeepSpace2

I don't think that set_row_height_dict works. Pandas doesn't like applying max to Series like that now. Also something about that arithmetic seems off to me.

fiendish avatar Jun 24 '21 20:06 fiendish

I have no idea what A_FACTOR and P_FACTOR are supposed to be, but I think rather for height you would want to multiply the number of lines (count("\n")+1) by a padded character height. Width ought to be similar as well counting line characters instead and also account for the column header widths. I don't see how the posted code could accomplish either of those.

fiendish avatar Jun 24 '21 20:06 fiendish

@fiendish

multiply the number of lines (count("\n")+1) by a padded character height

That's essentially what A_FACTOR and P_FACTOR are:

best_fit will attempt to calculate the correct column-width based on the longest value in each provided column. However this isn’t guaranteed to work for all fonts (works best with monospaced fonts). The formula used to calculate a column’s width is equivalent to (len(longest_value_in_column) + A_FACTOR) * P_FACTOR. The default values for A_FACTOR and P_FACTOR are 13 and 1.3 respectively, and can be modified before calling StyleFrame.to_excel by directly modifying StyleFrame.A_FACTOR and StyleFrame.P_FACTOR.

from the docs (although this refers to their usage when calculating columns' width, the same applies when using their values in the suggestion solution).

The above suggested solution does what it says on the tin :)

This

df = pd.DataFrame({"a": ("Line 1\nLine 2\nLine 3",)})
sf = StyleFrame(df)
sf.to_excel("test.xlsx", best_fit=list(df.columns), index=False).save()

generates

image

however

df = pd.DataFrame({"a": ("Line 1\nLine 2\nLine 3",)})
sf = StyleFrame(df)
sf.set_row_height_dict({index + 2: (max(sf.data_df[column].astype(str).str.count('\n').max() for column in sf.data_df.columns) + sf.A_FACTOR) * sf.P_FACTOR * 2
                        for index, _ in enumerate(sf.data_df.index)})
sf.to_excel("test.xlsx", best_fit=list(df.columns), index=False).save()

generates

image

DeepSpace2 avatar Jun 24 '21 22:06 DeepSpace2

Hi I got an error when processing my data. I guess when the table has multiple rows, this code may not work. ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Maybe it is because the max operation in sf.set_row_height_dict() line. But I don't know.

Another problem is in apply_style_by_indexes, when no grid or index satisfy the provided condition, there is a key error. KeyError: "None of [Index([.......dtype='object')] are in the [columns]"

coolbeam avatar Sep 21 '21 16:09 coolbeam

@coolbeam Indeed, sf.data_df[column].astype(str).str.count('\n') should be sf.data_df[column].astype(str).str.count('\n').max(). I will edit the original comment.

As for your second point, I think that should be a separate issue as I'm not sure this is related to this. Anyway, please provide a reproducible example as I'm not sure I follow.

DeepSpace2 avatar Sep 21 '21 19:09 DeepSpace2

Maybe related #131

buhtz avatar Jul 14 '22 07:07 buhtz