Using `data` for providing new data (as df) in `print()` or `plot()` yields errors
In previous versions of FFTrees, it was possible to supply a data frame as `data` when plotting (and perhaps printing) `FFTrees` objects. Thus, the `data` argument in `print()` and `plot()` served two functions: it either switched between the existing "train" and "test" data or accepted a new data frame (of test data).
In the latter case, the existing `FFTrees` object was applied to the new df as test data.
This currently fails (as `data` is not handled properly for dfs).
Details
We could fix this, of course, but I hesitate to restore the previous functionality, for the following reasons:
- Co-opting `data` to accept both a simple string and an entire df is a bit ugly.
More precisely, what remains confusing and unclear is:
- Does an implicit application of new data (implicit because it takes place within a function call) replace the previous test data in the `FFTrees` object?
Actually, fftrees_apply() returns a modified FFTrees object. However, as long as this result is not re-assigned to the original source object, the change in the test dataset and its performance details do not appear in the original object. Thus, the user would print or plot results that are not stored in the corresponding FFTrees object.
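The re-assignment point can be illustrated without FFTrees at all. The following is a minimal sketch in base R, assuming a plain list as a stand-in for an `FFTrees` object; `toy_apply()` and `toy_print()` are hypothetical helpers that only mimic the roles of `fftrees_apply()` and a print method, not the package's actual code:

```r
# Toy stand-in for an FFTrees object: a plain list with a 'test' slot.
# (Hypothetical structure, not the real FFTrees class.)
toy_model <- list(test = data.frame(id = 1:3))

# Stand-in for fftrees_apply(): returns a modified copy of x.
toy_apply <- function(x, newdata) {
  x$test <- newdata
  x
}

# Stand-in for a print/plot method that applies new data internally.
toy_print <- function(x, newdata) {
  x <- toy_apply(x, newdata)  # changes only the local copy of x
  cat("Rows of test data shown:", nrow(x$test), "\n")
  invisible(x)
}

new_df <- data.frame(id = 1:10)

# Without re-assignment: the output describes 10 rows,
# but the caller's object still holds the old 3-row test data.
toy_print(toy_model, new_df)
rows_without_reassign <- nrow(toy_model$test)

# With re-assignment: the change is stored in the object.
toy_model <- toy_print(toy_model, new_df)
rows_with_reassign <- nrow(toy_model$test)
```

Because R passes arguments with copy-on-modify semantics, the first call shows results for data that never reach `toy_model`, which is the mismatch described above.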
Suggestion
- Keep the `data` argument simple (a string that must be either "train" or "test").
- Add a `newdata` argument (as in `predict()` and `fftrees_apply()`) to provide new test data (as df) and pass it to `fftrees_apply()`.
- Add a note when using this function that the original `FFTrees` object is not changed unless it is being re-assigned.
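A rough sketch of how the three suggestions could fit together, written against a toy S3 class rather than the real package. The class `toy_fft`, the helper `toy_apply()` (a stand-in for `fftrees_apply()`), and the choice to emit the note via `message()` are all assumptions made for illustration:

```r
# Hypothetical sketch of the proposed interface on a toy S3 class.

# Stand-in for fftrees_apply(): returns a copy of x with new test data.
toy_apply <- function(x, newdata) {
  x$data$test <- newdata
  x
}

print.toy_fft <- function(x, data = "train", newdata = NULL, ...) {
  # 'data' stays simple: a single string, either "train" or "test".
  # A data frame passed here is rejected by match.arg().
  data <- match.arg(data, choices = c("train", "test"))

  if (!is.null(newdata)) {
    stopifnot(is.data.frame(newdata))
    x <- toy_apply(x, newdata)  # modifies the local copy only
    data <- "test"
    message("Note: 'newdata' was applied to a local copy of x; ",
            "re-assign the result to keep the change.")
  }

  cat("Showing", data, "data with", nrow(x$data[[data]]), "cases\n")
  invisible(x)
}

# Usage with a toy object:
x <- structure(
  list(data = list(train = data.frame(id = 1:5),
                   test  = data.frame(id = 1:2))),
  class = "toy_fft"
)

print(x, data = "test")                        # existing test data
y <- print(x, newdata = data.frame(id = 1:8))  # x itself is unchanged
```

Separating the two arguments means `data` can be validated as a plain string, while `newdata` follows the convention users already know from `predict()`.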
PR #95 has addressed the bug, but still uses the data argument for both string and data frame inputs.
Additionally, the issue of locally vs. globally changing an FFTrees object x is not resolved. Presently, the global object x remains unchanged when plotting or printing x for new data (as a data frame).