Improvement of automatic text detection

Open andreygetmanov opened this issue 2 years ago • 1 comment

Automatic text detection is now more effective, accurate, and robust.

  1. Every column that may contain text is now checked against the size of its tf-idf vocabulary. If the vocabulary size exceeds a threshold, the column is treated as containing useful text information.
  2. Columns with links are removed automatically: they don't contain useful information and sometimes cause FEDOT to fail.
  3. Additional unit tests and extended tests (based on the AutoML benchmark) will also be added.
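
The first two checks can be sketched roughly as follows. This is a minimal standard-library illustration, not the code from this PR: the function names, both thresholds, and the URL pattern are hypothetical, and the tokenizer only approximates the default token pattern of scikit-learn's `TfidfVectorizer` (lowercased tokens of two or more word characters).

```python
import re

# Approximates the default tokenization of scikit-learn's TfidfVectorizer:
# lowercased tokens made of two or more word characters.
TOKEN_RE = re.compile(r"\b\w\w+\b")
# Hypothetical, deliberately simple URL pattern (assumption, not FEDOT's).
URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)

# Hypothetical thresholds chosen for illustration only.
MIN_VOCAB_SIZE = 20
MIN_LINK_FRACTION = 0.8


def vocabulary_size(values):
    """Count the distinct tokens found across all values of a column."""
    vocab = set()
    for value in values:
        vocab.update(TOKEN_RE.findall(str(value).lower()))
    return len(vocab)


def is_link_column(values):
    """Treat a column as links if most non-empty values look like URLs."""
    non_empty = [str(v).strip() for v in values if str(v).strip()]
    if not non_empty:
        return False
    links = sum(1 for v in non_empty if URL_RE.match(v))
    return links / len(non_empty) >= MIN_LINK_FRACTION


def classify_column(values):
    """Return 'link', 'text', or 'other' for a candidate text column."""
    if is_link_column(values):
        return "link"   # would be dropped before modelling
    if vocabulary_size(values) > MIN_VOCAB_SIZE:
        return "text"   # vocabulary is rich enough to be useful text
    return "other"      # e.g. a low-cardinality categorical column


if __name__ == "__main__":
    print(classify_column(["red", "green", "blue", "red"]))
    print(classify_column(["https://example.com/a", "www.example.net"]))
```

A vocabulary-size check of this kind separates free text from short categorical strings, which produce only a handful of distinct tokens however many rows there are. A sensible threshold depends on the dataset.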

andreygetmanov avatar Sep 21 '22 13:09 andreygetmanov

Codecov Report

Merging #903 (86ae96f) into master (c5f050d) will decrease coverage by 0.08%. The diff coverage is 86.95%.

@@            Coverage Diff             @@
##           master     #903      +/-   ##
==========================================
- Coverage   87.86%   87.77%   -0.09%     
==========================================
  Files         206      206              
  Lines       13786    13726      -60     
==========================================
- Hits        12113    12048      -65     
- Misses       1673     1678       +5     
Impacted Files Coverage Δ
fedot/core/pipelines/tuning/search_space.py 100.00% <ø> (ø)
...implementations/data_operations/text_pretrained.py 56.14% <42.85%> (-4.73%) :arrow_down:
fedot/core/data/data_detection.py 96.70% <97.22%> (+1.31%) :arrow_up:
fedot/core/composer/metrics.py 97.22% <100.00%> (-0.04%) :arrow_down:
fedot/core/constants.py 100.00% <100.00%> (ø)
fedot/core/data/data.py 86.77% <100.00%> (-0.11%) :arrow_down:
fedot/core/data/multi_modal.py 87.62% <100.00%> (+2.05%) :arrow_up:
fedot/preprocessing/data_types.py 94.25% <100.00%> (+0.03%) :arrow_up:
...edot/core/repository/graph_operation_repository.py 66.66% <0.00%> (-8.34%) :arrow_down:
fedot/explainability/explainer_template.py 75.00% <0.00%> (-5.00%) :arrow_down:
... and 59 more

codecov[bot] avatar Sep 21 '22 14:09 codecov[bot]