FreeDiscovery icon indicating copy to clipboard operation
FreeDiscovery copied to clipboard

_LSIWrapper: Access to ds_p outside of fit_transform

Open robguinness opened this issue 7 years ago • 2 comments

Hi,

Would you consider a pull request that allowed access to the ds_p field of fit_transform() as a member variable of _LSIWrapper? I have a use case where I want to do further processing of the transformed data. I realize I could access this using _load_features, but this means loading it again from disk, which seems like a waste.

This will, of course, increase the memory consumption of _LSIWrapper for other cases where the transformed data is not needed. Another option is to have fit_transform() return ds_p. This would fit more into the pattern of scikit-learn for fit_transform().

robguinness avatar Oct 05 '18 12:10 robguinness

Thanks for opening this issue @robguinness

This will, of course, increase the memory consumption of _LSIWrapper for other cases where the transformed data is not needed. Another option is to have fit_transform() return ds_p. This would fit more into the pattern of scikit-learn for fit_transform().

Yes, because datasets can be quite large, including the LSI transformed data, I think it's better to avoid storing it as attribute unless we absolutely need to.

Returning ds_p in fit_transform() sounds like a good idea indeed, that way you could store it as needed for your applications. PR would be very welcome!

rth avatar Oct 11 '18 18:10 rth

Thanks. I'll try to write a PR soon. I'm on another project at the moment, but I hope to get back to this soon.

robguinness avatar Oct 17 '18 08:10 robguinness