FreeDiscovery
FreeDiscovery copied to clipboard
_LSIWrapper: Access to ds_p outside of fit_transform
Hi,
Would you consider a pull request that allowed access to the ds_p field of fit_transform() as a member variable of _LSIWrapper? I have a use case where I want to do further processing of the transformed data. I realize I could access this using _load_features, but this means loading it again from disk, which seems like a waste.
This will, of course, increase the memory consumption of _LSIWrapper for other cases where the transformed data is not needed. Another option is to have fit_transform() return ds_p. This would fit more into the pattern of scikit-learn for fit_transform().
Thanks for opening this issue @robguinness
This will, of course, increase the memory consumption of _LSIWrapper for other cases where the transformed data is not needed. Another option is to have fit_transform() return ds_p. This would fit more into the pattern of scikit-learn for fit_transform().
Yes, because datasets can be quite large, including the LSI transformed data, I think it's better to avoid storing it as attribute unless we absolutely need to.
Returning ds_p in fit_transform() sounds like a good idea indeed, that way you could store it as needed for your applications. PR would be very welcome!
Thanks. I'll try to write a PR soon. I'm on another project at the moment, but I hope to get back to this soon.