
Take "ordered" into account for columns declared as Categorical

Open kochuyt opened this issue 5 years ago • 1 comments

Background of request

In statistics a distinction is made between categorical data which has no inherent order (aka nominal scales) and categorical data which does have order (aka ordinal scales). Among other places, this distinction is used when creating histograms of the frequency of occurrence of categories.

  • For nominal data, the categories on the x-axis are typically ordered Pareto style, with the most frequently occurring category coming first
  • For ordinal data, the categories on the x-axis follow the defined order of the categories, which makes it possible to visually explore properties such as the skewness of the occurrence frequencies

Although pandas has just one dtype category, it is able to make the 'nominal' versus 'ordinal' distinction through the use of the ordered property of this dtype.
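To illustrate the distinction, here is a minimal sketch using the pandas CategoricalDtype API. The example data and the variable names (sizes, nominal, ordinal) are made up for illustration only.

```python
import pandas as pd

# Illustrative data; the values and names are assumptions for this sketch.
sizes = pd.Series(["medium", "small", "large", "medium"])

# Nominal: plain category dtype, no inherent order (ordered=False by default).
nominal = sizes.astype("category")

# Ordinal: same dtype family, but with an explicit category order.
ordinal_dtype = pd.CategoricalDtype(
    categories=["small", "medium", "large"], ordered=True
)
ordinal = sizes.astype(ordinal_dtype)

print(nominal.cat.ordered)           # ordered flag of the nominal column
print(ordinal.cat.ordered)           # ordered flag of the ordinal column
print(list(ordinal.cat.categories))  # categories in their declared order
```

Only the ordered variant supports order-based operations such as min() and max(), which is the same information a report could use to lay out the histogram.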

As far as I can see pandas-profiling is taking into account the dtype of a column (variable). Indeed, for variables of dtype category the textual descriptions of the categories, and not the underlying codes, are displayed in the profiling report.

But, as far as I can see, the ordered property is not taken into account for such variables. Occurrence frequency histograms always use Pareto-style ordering of categories, which makes it difficult to see quickly whether the category occurrences are distributed as expected.

Proposed feature

For variables with dtype category and ordered=True, order the categories in the frequency histogram in line with the category order as it was explicitly defined.
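A rough sketch of the requested behaviour, written against plain pandas rather than the pandas-profiling internals. The function name category_frequencies is hypothetical and not part of any library; it only shows the ordering rule being asked for.

```python
import pandas as pd


def category_frequencies(series: pd.Series) -> pd.Series:
    """Hypothetical helper: frequency counts in the order a histogram would use."""
    # Default behaviour: Pareto style, most frequent category first.
    counts = series.value_counts()
    # For ordered categoricals, follow the declared category order instead.
    if isinstance(series.dtype, pd.CategoricalDtype) and series.dtype.ordered:
        counts = counts.sort_index()
    return counts


# Illustrative data; values and names are assumptions for this sketch.
raw = ["low", "high", "high", "medium", "high", "low"]
rating_dtype = pd.CategoricalDtype(["low", "medium", "high"], ordered=True)

ordinal_counts = category_frequencies(pd.Series(raw, dtype=rating_dtype))
plain_counts = category_frequencies(pd.Series(raw))

print(ordinal_counts)  # follows low < medium < high
print(plain_counts)    # sorted by frequency, as today
```

Columns that are not declared as ordered categoricals go through the unchanged value_counts path, so their ordering stays as it is now.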

To be clear: for variables which are not declared explicitly as dtype category the current approach should not change.

Many thanks in advance for considering this request

kochuyt, Oct 01 '20 17:10

With visions integrated, this should be easy to support. Anyone interested is invited to pick this up!

sbrugman, Mar 28 '21 11:03