DataCleaner
Don't load all reference data values into memory
Currently, the reference data is always loaded into memory, whether it is needed or not. This could be an issue for users with very large dictionaries.
DictionaryConnection already returns an iterator, so it should be a simple matter of writing a custom iterator implementation. For SynonymCatalogConnection, however, we'll probably need to create a new method that does the same (and deprecate the old one), then move all our own usages over.
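To make the custom iterator idea concrete, here is a minimal sketch of an iterator that reads values one at a time instead of holding them all in memory. `LazyValueIterator` is a hypothetical name, not an existing DataCleaner class, and it assumes the values come from a line-based source with one value per line; how it would plug into DictionaryConnection is left open.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical class for illustration only; not part of DataCleaner's API.
// Assumes a line-based source with one reference data value per line.
final class LazyValueIterator implements Iterator<String>, AutoCloseable {
    private final BufferedReader reader;
    private String next;   // value read ahead by hasNext(), or null
    private boolean done;  // true once the source is exhausted or closed

    LazyValueIterator(Reader source) {
        this.reader = new BufferedReader(source);
    }

    @Override
    public boolean hasNext() {
        if (next != null) {
            return true;
        }
        if (done) {
            return false;
        }
        try {
            // Read a single value; nothing beyond the reader's buffer is kept.
            next = reader.readLine();
            if (next == null) {
                done = true;
                reader.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String value = next;
        next = null;
        return value;
    }

    @Override
    public void close() {
        done = true;
        next = null;
        try {
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Callers that stop iterating early would need to call `close()` themselves, which is one reason moving existing usages over may take some care.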
A word of caution for this item: it was previously built so that the values would not be loaded into memory, and the consequence was that lookups were far too slow. So there is a definite trade-off here.
If anything, I'd suggest adding a configuration flag in the reference data's configuration. My impression is that huge reference data sets are rare and that small/medium size reference data (say, less than 10,000 entries) is much more common.
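A rough sketch of what such a flag could control, to show the trade-off. Everything here is an assumption for illustration: `ReferenceDataLookup` and the `loadIntoMemory` parameter are hypothetical names, not existing DataCleaner configuration or API.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical class for illustration only; not part of DataCleaner's API.
final class ReferenceDataLookup {
    private final Supplier<Iterator<String>> source;
    private final Set<String> cached; // null when values are not held in memory

    // loadIntoMemory is the hypothetical configuration flag.
    ReferenceDataLookup(Supplier<Iterator<String>> source, boolean loadIntoMemory) {
        this.source = source;
        if (loadIntoMemory) {
            // Eager mode: one pass up front, then constant-time lookups.
            Set<String> values = new HashSet<>();
            source.get().forEachRemaining(values::add);
            this.cached = values;
        } else {
            this.cached = null;
        }
    }

    boolean containsValue(String value) {
        if (cached != null) {
            return cached.contains(value);
        }
        // Streaming mode: low memory use, but every lookup scans the source.
        Iterator<String> it = source.get();
        while (it.hasNext()) {
            if (it.next().equals(value)) {
                return true;
            }
        }
        return false;
    }

    boolean isLoadedIntoMemory() {
        return cached != null;
    }
}
```

With the flag defaulting to in-memory, small and medium reference data would keep fast lookups, and only users with huge data sets would opt into the slower streaming mode.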