Gertjan van den Burg
Thanks for opening an issue on this and creating a PR @ben-bitdotio! The header detection code could definitely be improved, but I've been waiting until I have a dataset to...
Hi @RahulSinghYYC, thanks for your question. This depends a bit on whether you're reading the file as a list of lists or as a dataframe. If you're using the ``read_table``...
Hi @RahulSinghYYC, CleverCSV doesn't currently have support for detecting the table area automatically. I know there is some research on this problem (see, e.g. [hypoparsr](https://github.com/tdoehmen/hypoparsr) and [Pytheas](https://github.com/cchristodoulaki/Pytheas/)), but there are...
Thanks for offering a suggestion @lcnittl, very nice of you to help! :+1: Just to offer another work-around: one of the main approaches that CleverCSV takes in detecting the dialect...
Hi @jlumbroso, thanks for the detailed bug report! You're describing an issue that I've been thinking about for a while, but never had the time to seriously investigate, so I'm...
Hi @jlumbroso, This took a bit longer than expected, but I've now added a comparison study to the repo (see [here](https://github.com/alan-turing-institute/CleverCSV/tree/comparison/comparison)). This experiment evaluates the accuracy and runtime of dialect...
Hi @jlumbroso, Thanks for your response. Needless to say, I'm not that fast at responding myself either, for which I apologize. That doesn't mean, however, that I haven't thought about...
For completeness, when I run ``clevercsv_grow`` with the fix proposed in the previous message (i.e., ``current_line_count = min(max_line_count, current_line_count * 2)``), there is no difference in accuracy up to 1000...
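To make the capped doubling step quoted above concrete, here is a minimal sketch of a growing-sample loop. It is an illustration under stated assumptions, not the actual ``clevercsv_grow`` code from the comparison: the function names ``growing_line_counts`` and ``detect_growing``, the ``start`` value, and the ``detect`` callback are all hypothetical. Only the update ``current_line_count = min(max_line_count, current_line_count * 2)`` comes from the comment.

```python
def growing_line_counts(start=100, max_line_count=1000):
    """Yield sample sizes that double each step, capped at max_line_count.

    Hypothetical helper: only the capped doubling update below is taken
    from the comment; the name and the default values are assumptions.
    """
    current_line_count = min(start, max_line_count)
    while True:
        yield current_line_count
        if current_line_count >= max_line_count:
            break
        # Capped doubling: never ask for more than max_line_count lines.
        current_line_count = min(max_line_count, current_line_count * 2)


def detect_growing(lines, detect, start=100, max_line_count=1000):
    """Run `detect` on growing prefixes of `lines` until it succeeds.

    `detect` is a hypothetical callback that takes a text sample and
    returns a dialect, or None when the sample was not enough.
    """
    for n in growing_line_counts(start, max_line_count):
        dialect = detect("".join(lines[:n]))
        # Stop on success, or once the whole file has been tried.
        if dialect is not None or n >= len(lines):
            return dialect
    return None
```

Without the ``min(...)`` cap, a plain doubling from 100 would jump from 800 to 1600 and overshoot a limit of 1000; with the cap, the last sample size is exactly the limit.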
Hi @tooptoop4, thanks for reporting this issue. Are you able to share the file? I may be able to fix the problem.
Results can be a bit noisy if you don't read the entire file. Theoretically, it might be the case that the "key" to reading the file correctly is only the...