RDT
Add attribute to transformers to check if they should be tested for quality
Problem Description
By definition, some transformers do nothing to help expose or preserve relationships in the data (e.g. the LabelEncodingTransformer). This means they are very likely to fail quality tests.
As of now, the thresholds for quality tests have been lowered to account for this. It would be better if the tests could simply skip any transformers that are known not to be good for quality.
Proposal
- Add an attribute and method to each transformer to determine if it is designed to be good for quality or not.
- Raise the thresholds in the tests/quality/test_quality.py file.
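A rough sketch of what the first bullet could look like, to make the idea concrete. Everything here is an assumption for illustration: the attribute name `IS_QUALITY_TESTED`, the method `is_quality_tested`, the helper `select_for_quality_tests`, and the stand-in classes are hypothetical and are not RDT's actual API.

```python
class BaseTransformer:
    """Stand-in base class (hypothetical, not the real RDT base class)."""

    # Hypothetical flag: True means the transformer is designed to
    # preserve or expose relationships and should be quality tested.
    IS_QUALITY_TESTED = True

    @classmethod
    def is_quality_tested(cls):
        """Return whether the quality tests should cover this transformer."""
        return cls.IS_QUALITY_TESTED


class LabelEncodingTransformer(BaseTransformer):
    """Stand-in for a transformer that is not designed for quality."""

    IS_QUALITY_TESTED = False  # opt out of the quality tests


class ExampleQualityTransformer(BaseTransformer):
    """Hypothetical transformer that keeps the default and stays tested."""


def select_for_quality_tests(transformers):
    """Keep only the transformers that opt in to quality testing."""
    return [t for t in transformers if t.is_quality_tested()]


selected = select_for_quality_tests(
    [LabelEncodingTransformer, ExampleQualityTransformer]
)
print([t.__name__ for t in selected])
```

With a filter like this in place, the quality tests would only iterate over the selected transformers, so the thresholds would no longer need to accommodate transformers that opt out.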
I agree this should be revisited at some point. I just had to lower the threshold even further to allow the CategoricalTransformer to pass.