ckanext-datastorer

Add a configurable sample size (window) for tables

Open drmalex07 opened this issue 9 years ago • 1 comments

As can easily be seen in the source (e.g. https://github.com/okfn/messytables/blob/master/messytables/commas.py#L123), messytables uses a default sample of 1000 rows when guessing headers and column types.

This may not be adequate in certain cases (e.g. too many rows, or a column whose values are not evenly distributed), so I added a simple config option, ckanext.datastorer.sample_size, to set the sample size explicitly. Since all processing happens in the background, I think it is an acceptable cost to wait a bit longer in order to get more reliable guesses (if you provide a bigger sample size).
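To illustrate why the sample size matters, here is a minimal, self-contained sketch. It is not the code from the pull request and it does not call messytables: `get_sample_size` and `guess_column_type` are hypothetical helpers, the config is a plain dict standing in for the CKAN config, and the type guesser is a deliberately naive toy. Only the option name ckanext.datastorer.sample_size and the default of 1000 come from the description above.

```python
# Default sample size, as stated in the issue for messytables.
DEFAULT_SAMPLE_SIZE = 1000

# Option name from the issue; the lookup logic below is an assumption.
CONFIG_KEY = "ckanext.datastorer.sample_size"


def get_sample_size(config):
    """Return the configured sample size, or the default if unset/invalid.

    `config` is assumed to be a dict-like object with string values,
    as read from an .ini file.
    """
    raw = config.get(CONFIG_KEY)
    if raw is None or str(raw).strip() == "":
        return DEFAULT_SAMPLE_SIZE
    try:
        size = int(raw)
    except (TypeError, ValueError):
        return DEFAULT_SAMPLE_SIZE
    return size if size > 0 else DEFAULT_SAMPLE_SIZE


def guess_column_type(values, sample_size):
    """Toy guesser: 'integer' if every sampled value parses as int, else 'string'.

    Only the first `sample_size` values are inspected, which mimics
    guessing from a fixed-size window at the top of the table.
    """
    for value in values[:sample_size]:
        try:
            int(value)
        except ValueError:
            return "string"
    return "integer"


# A column whose first non-numeric value appears after row 1000.
column = [str(i) for i in range(1500)] + ["n/a"] + [str(i) for i in range(500)]

# Default window: the 'n/a' value is never seen.
small = guess_column_type(column, get_sample_size({}))

# Larger window set via the config option: the 'n/a' value is seen.
large = guess_column_type(column, get_sample_size({CONFIG_KEY: "5000"}))

print(small, large)
```

With the default window the toy guesser only sees numeric values and settles on an integer column, while the larger window reaches the stray value and falls back to a string column. A real setup would pass the configured size on to messytables instead of using a toy guesser.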

drmalex07, Jun 24 '15 20:06

+1 very helpful pull

moqri, Oct 04 '16 19:10