Gertjan van den Burg

48 comments by Gertjan van den Burg

Thanks for opening an issue on this and creating a PR @ben-bitdotio! The header detection code could definitely be improved, but I've been waiting until I have a dataset to...

Hi @RahulSinghYYC, thanks for your question. This depends a bit on whether you're reading the file as a list of lists or as a dataframe. If you're using the ``read_table``...

Hi @RahulSinghYYC, CleverCSV doesn't currently have support for detecting the table area automatically. I know there is some research on this problem (see, e.g. [hypoparsr](https://github.com/tdoehmen/hypoparsr) and [Pytheas](https://github.com/cchristodoulaki/Pytheas/)), but there are...

Thanks for offering a suggestion @lcnittl, very nice of you to help! :+1: Just to offer another work-around: one of the main approaches that CleverCSV takes in detecting the dialect...

Hi @jlumbroso, thanks for the detailed bug report! You're describing an issue that I've been thinking about for a while, but never had the time to seriously investigate, so I'm...

Hi @jlumbroso, this took a bit longer than expected, but I've now added a comparison study to the repo (see [here](https://github.com/alan-turing-institute/CleverCSV/tree/comparison/comparison)). This experiment evaluates the accuracy and runtime of dialect...

Hi @jlumbroso, thanks for your response. Needless to say, I'm not that fast at responding myself either, for which I apologize. That doesn't mean, however, that I haven't thought about...

For completeness, when I run ``clevercsv_grow`` with the fix proposed in the previous message (i.e., ``current_line_count = min(max_line_count, current_line_count * 2)``), there is no difference in accuracy up to 1000...
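
The capped doubling in that fix can be illustrated with a small sketch. Only the update expression comes from the comment above; the generator name `line_count_schedule` and its `start` parameter are hypothetical and are not taken from the comparison code, which may organize the loop differently.

```python
def line_count_schedule(start, max_line_count):
    """Yield the number of lines to read at each step.

    Hypothetical helper for illustration: the count doubles on every
    step and is capped at max_line_count, so the final step always
    reads exactly max_line_count lines and the loop terminates.
    """
    if start < 1 or max_line_count < 1:
        raise ValueError("start and max_line_count must be positive")

    current_line_count = min(start, max_line_count)
    while True:
        yield current_line_count
        if current_line_count >= max_line_count:
            break
        # Update expression quoted in the comment: double, but never
        # exceed the maximum.
        current_line_count = min(max_line_count, current_line_count * 2)


if __name__ == "__main__":
    for n in line_count_schedule(100, 1000):
        print(n)
```

Without the `min(...)` cap, a doubling schedule that starts at 100 would jump from 800 to 1600 and skip the 1000-line case; with the cap, the last step lands on the maximum itself.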

Hi @tooptoop4, thanks for reporting this issue. Are you able to share the file? I may be able to fix the problem.

Results can be a bit noisy if you don't read the entire file. Theoretically, it might be the case that the "key" to reading the file correctly is only the...