evalml
Consider adding a warning or error if users use partial dependence fast mode in unrecommended ways
Partial dependence fast mode was initially added with some limitations, namely that it will not produce correct results if any of the components in the pipeline relies on multiple columns when making transformations. Because of this, it is also not recommended that pipelines containing user-created custom components be used with fast mode.
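To make the limitation concrete, here is a minimal sketch in plain Python that does not use evalml at all. The `Table` type and both transform functions are hypothetical stand-ins for a DataFrame and for pipeline components. The first transform treats every column independently; the second one reads across columns, so its output for a feature changes when that feature is passed in alone.

```python
from statistics import mean

# Hypothetical stand-in for a DataFrame: column name -> list of values.
Table = dict[str, list[float]]


def center_columns(X: Table) -> Table:
    """Per-column transform: each output column depends only on its own input."""
    return {name: [v - mean(col) for v in col] for name, col in X.items()}


def row_normalize(X: Table) -> Table:
    """Multi-column transform: each value is divided by the total of its row."""
    n_rows = len(next(iter(X.values())))
    totals = [sum(col[i] for col in X.values()) for i in range(n_rows)]
    return {name: [v / t for v, t in zip(col, totals)] for name, col in X.items()}


X = {"a": [1.0, 2.0, 3.0], "b": [3.0, 2.0, 1.0]}
only_a = {"a": X["a"]}

# Compare the output for "a" when it is transformed alone vs. with the whole X.
center_same = center_columns(only_a)["a"] == center_columns(X)["a"]
normalize_same = row_normalize(only_a)["a"] == row_normalize(X)["a"]

print(center_same)     # True: safe to transform the feature in isolation
print(normalize_same)  # False: the result depends on the other columns
```

A user-created custom component could behave like `row_normalize`, which is why using one with fast mode is not recommended.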
There is nothing in place to actually stop this, though. We should consider adding a warning or error to let users know when they are using fast mode inadvisably. Here are several ideas for how to do this:
- Check whether all the components in the pipeline are native to evalml - raise a warning or error if they aren't
- Add a test that checks, for every transforming component, whether it produces the same results with a single column as with the whole X.
  a. This will be difficult because I don't know if we have a great way of automatically setting the inputs to components so that they are useful for detecting whether a component relies on multiple columns. For example, you couldn't get this information by passing a single numeric column into the one-hot encoder, since it does nothing to a numeric column.
- Add a check within fast mode itself for whether the specified feature produces the same results in the pipeline by itself as in the whole dataset.
  a. I think this is very doable - we would get the transformed X_t for the original pipeline and then use the cloned pipeline for that feature to transform all but final of `X_t[feature]` and confirm that the outputs are exactly the same.
  b. This will add additional overhead to fast mode that will cause a (hopefully slight) decrease in computational performance. We should do performance testing to make sure this isn't much worse.
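The third idea could look roughly like the sketch below. It is only an illustration of the comparison, not evalml code: `check_fast_mode_safe`, the `Table` type, and the two example transforms are all hypothetical, and `transform` stands in for whatever runs every pipeline step except the final estimator. The exact-equality comparison follows the wording above; a real implementation might prefer a tolerance.

```python
import warnings
from typing import Callable

# Hypothetical stand-in for a DataFrame: column name -> list of values.
Table = dict[str, list[float]]


def check_fast_mode_safe(
    transform: Callable[[Table], Table], X: Table, feature: str
) -> bool:
    """Warn if transforming `feature` alone differs from transforming all of X."""
    full = transform(X)                        # transformed whole dataset
    alone = transform({feature: X[feature]})   # transformed single feature
    # Every column produced from the lone feature must match the full run exactly.
    safe = all(
        name in full and full[name] == values for name, values in alone.items()
    )
    if not safe:
        warnings.warn(
            f"Feature {feature!r} is transformed differently in isolation; "
            "fast mode results may be incorrect."
        )
    return safe


def double(X: Table) -> Table:
    """Example per-column transform."""
    return {name: [2 * v for v in col] for name, col in X.items()}


def subtract_row_mean(X: Table) -> Table:
    """Example multi-column transform: subtracts each row's mean across columns."""
    n_rows = len(next(iter(X.values())))
    means = [sum(col[i] for col in X.values()) / len(X) for i in range(n_rows)]
    return {name: [v - m for v, m in zip(col, means)] for name, col in X.items()}


X = {"a": [1.0, 2.0], "b": [3.0, 5.0]}
double_is_safe = check_fast_mode_safe(double, X, "a")
```

Here `double_is_safe` is `True`, while calling the check with `subtract_row_mean` returns `False` and emits the warning. The extra cost is one additional transform of the full dataset per call, which is the overhead that would need performance testing.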