ckanext-datastorer
Add a configurable sample size (window) for tables
messytables uses a default sample of 1000 rows for guessing headers and column types (see e.g. https://github.com/okfn/messytables/blob/master/messytables/commas.py#L123).
This may not be adequate in certain cases (e.g. too many rows, or a column whose values are not evenly distributed), so I added a simple config option, ckanext.datastorer.sample_size, to set the size of the sample explicitly. Since all processing happens in the background, I think waiting a bit longer is an acceptable cost for more reliable guesses (if you provide a bigger sample size).
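For illustration only, here is a minimal sketch of how such an option could be read, assuming the configuration is available as a dict-like object. The helper name `get_sample_size` and the fallback behaviour for invalid values are assumptions for this example and are not taken from the diff itself; only the option name and the default of 1000 come from the description above.

```python
# Hypothetical helper: not the code in this pull request.
DEFAULT_SAMPLE_SIZE = 1000  # default sample size cited in the description


def get_sample_size(config, key="ckanext.datastorer.sample_size"):
    """Return the configured sample size as a positive int.

    Falls back to DEFAULT_SAMPLE_SIZE when the option is missing,
    not an integer, or not positive (an assumed policy).
    """
    raw = config.get(key)
    if raw is None:
        return DEFAULT_SAMPLE_SIZE
    try:
        size = int(raw)
    except (TypeError, ValueError):
        return DEFAULT_SAMPLE_SIZE
    return size if size > 0 else DEFAULT_SAMPLE_SIZE


# Example: values as they might appear after parsing an .ini file
settings = {"ckanext.datastorer.sample_size": "5000"}
sample_size = get_sample_size(settings)
```

The resulting value would then be handed to messytables as the sample window; that call is left out here because the description does not show it.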
+1 very helpful pull