
Warn users for category levels/factors with infrequent usage

Open • Aylr opened this issue 7 years ago • 2 comments

Example R Output

Warning messages:
1: In private$loadData() :
  Each of the following categorical variables has levels that occur 3 times or fewer:
-  MaritalStatusDSC : 3 levels
-  ReligionDSC : 23 levels
-  LanguageDSC : 37 levels
-  RaceGroupNM : 1 levels
Consider grouping these together with other levels.
You can view the levels of a column using the "table" command.

2: In private$loadData() :
  The following categorical variable levels were not used in training the model:
-  ReligionDSC : c("BU", "SD", "SOUTHERN BAPTIST")
-  LanguageDSC : c("CATALAN", "CROATIAN", "IGBO", "SERBIAN", "SERBO-CROATIAN", "UNKN")
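For discussion, here is a minimal Python sketch of the two checks behind the R warnings above. It is not existing healthcareai-py code: the function names, the `threshold` parameter, and the plain dict-of-lists input are all assumptions made for illustration (a real implementation would probably work on a pandas DataFrame instead). The second check assumes that "not used in training" means levels present in the full data but absent from the training subset.

```python
import warnings
from collections import Counter


def find_rare_levels(columns, threshold=3):
    """Map each column name to the number of its levels that occur
    `threshold` times or fewer. Columns with no rare levels are omitted.

    `columns` is a hypothetical {column_name: list_of_values} mapping.
    """
    rare = {}
    for name, values in columns.items():
        counts = Counter(v for v in values if v is not None)
        n_rare = sum(1 for count in counts.values() if count <= threshold)
        if n_rare:
            rare[name] = n_rare
    return rare


def find_levels_missing_from_training(all_columns, train_columns):
    """Map each column name to the sorted levels that appear in the full
    data but never appear in the training subset (an assumed reading of
    the second R warning)."""
    missing = {}
    for name, values in all_columns.items():
        seen_in_training = set(train_columns.get(name, ()))
        levels = {v for v in values if v is not None}
        unused = sorted(levels - seen_in_training, key=str)
        if unused:
            missing[name] = unused
    return missing


def warn_about_levels(all_columns, train_columns, threshold=3):
    """Emit one UserWarning per check, loosely mirroring the R messages."""
    rare = find_rare_levels(all_columns, threshold)
    if rare:
        lines = [f"-  {name} : {n} levels" for name, n in rare.items()]
        warnings.warn(
            "Each of the following categorical variables has levels that "
            f"occur {threshold} times or fewer:\n" + "\n".join(lines)
            + "\nConsider grouping these together with other levels.",
            UserWarning,
        )
    missing = find_levels_missing_from_training(all_columns, train_columns)
    if missing:
        lines = [f"-  {name} : {levels}" for name, levels in missing.items()]
        warnings.warn(
            "The following categorical variable levels were not used in "
            "training the model:\n" + "\n".join(lines),
            UserWarning,
        )
    return rare, missing
```

Keeping the detection functions separate from the one that emits warnings makes the counts easy to test on their own, and would let other checks reuse them.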

Aylr, Oct 31 '17 16:10

This relates to #384, and I wonder if there is some code-sharing potential there.

Aylr, Jan 22 '18 16:01

@Aylr Hi, I'm new to the project and looking for a good first issue to work on. Is this issue available?

jeremykohn, Apr 30 '19 06:04