mljar-supervised
Report generation fails when training variables contain Chinese characters
While generating the model training report on a dataset that contains Chinese characters, the following error is raised:
File "C:\Users\kar11081\AppData\Local\test\conda\envs\dl-2022-04-29-2\Lib\site-packages\supervised\automl.py", line 415, in report
return self._report(width, height)
File "C:\Users\kar11081\AppData\Local\test\conda\envs\dl-2022-04-29-2\Lib\site-packages\supervised\base_automl.py", line 2180, in _report
+ self._md_to_html(
File "C:\Users\kar11081\AppData\Local\test\conda\envs\dl-2022-04-29-2\Lib\site-packages\supervised\base_automl.py", line 2085, in _md_to_html
content = fin.read()
File "C:\Users\kar11081\AppData\Local\test\conda\envs\dl-2022-04-29-2\Lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 438: character maps to <undefined>
The run also emits many warnings similar to these:
Glyph 20061 missing from current font.
matplotlib\backends\backend_agg.py:240: RuntimeWarning:
Glyph 32922 missing from current font.
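The "Glyph ... missing from current font" warnings look separate from the crash: they usually mean matplotlib's active font has no CJK glyphs. A minimal sketch of one possible workaround follows, assuming a CJK-capable font is installed on the machine. The candidate font names and the helper `pick_cjk_font` are illustrative assumptions and not part of mljar-supervised. Whether these settings reach the plots that mljar-supervised draws internally is also an assumption.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, the one named in the warning
from matplotlib import font_manager, rcParams

# Assumed candidates: common CJK-capable fonts on Windows, Linux and macOS.
CANDIDATES = ["Microsoft YaHei", "SimHei", "Noto Sans CJK SC", "PingFang SC"]


def pick_cjk_font(candidates=CANDIDATES):
    """Return the first candidate font known to matplotlib, or None."""
    installed = {entry.name for entry in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None


font = pick_cjk_font()
if font is not None:
    # Put the CJK font first so it is preferred for sans-serif text.
    rcParams["font.sans-serif"] = [font] + list(rcParams["font.sans-serif"])
    # Avoid a missing-glyph warning for the minus sign with some CJK fonts.
    rcParams["axes.unicode_minus"] = False
```

If `pick_cjk_font()` returns `None`, no suitable font was found and the warnings will persist until one is installed.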
@KarthikDutt thank you for reporting the issue. Could you please provide a code example that crashes report generation? I will do my best to fix it ASAP.
@pplonski Thanks for your response. Here's a code snippet that can help reproduce the error.
from supervised.automl import AutoML as base_AutoML
import pandas as pd
data = pd.read_csv(r'data_for_issue_recreation.csv')
model_obj = base_AutoML(ml_task='multiclass_classification')
model_obj.fit(data, data['HAD_VIL_CNAME'])
model_obj.report()
The data can be accessed from here
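For anyone trying to understand the traceback: it ends in `encodings\cp1252.py`, which suggests the report file is opened without an explicit encoding, so Windows falls back to cp1252. The sketch below reproduces that failure mode in isolation, assuming the file on disk is UTF-8. It does not use mljar-supervised at all. The file name and contents are hypothetical stand-ins, and the two characters are simply the code points from the glyph warnings.

```python
import os
import tempfile

# Code points taken from the "Glyph 20061" and "Glyph 32922" warnings.
text = "# Feature " + chr(20061) + chr(32922) + "\n"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a Markdown file read during report generation.
    path = os.path.join(tmp, "README.md")
    with open(path, "w", encoding="utf-8") as fout:
        fout.write(text)

    # Decoding UTF-8 bytes as cp1252 hits a byte that cp1252 leaves undefined.
    try:
        with open(path, encoding="cp1252") as fin:
            fin.read()
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Passing the encoding explicitly reads the same file without error.
    with open(path, encoding="utf-8") as fin:
        content = fin.read()
```

If this is indeed the cause, passing `encoding="utf-8"` to the `open()` call in `_md_to_html` would be the likely fix, though that is a guess from the traceback and not a confirmed patch.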