
Header Detection Improvement

Open, ben-bitdotio opened this issue 2 years ago, 1 comment

Summary: Added type resolution between float and int so that a column mixing the two (e.g. 1.2 and 1) is no longer treated as having incompatible types.
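A minimal sketch of what "resolution between float and int" means, assuming cells are classified per column the way Python's csv.Sniffer does it. The helpers cell_type, resolve and column_type are hypothetical illustrations, not CleverCSV's actual code:

```python
from functools import reduce

# Hypothetical helpers illustrating the int/float fix; not CleverCSV's code.


def cell_type(value):
    """Classify one cell as int, float, or str (anything non-numeric)."""
    for candidate in (int, float):
        try:
            candidate(value)  # int(" 1") and float(" 1.2") both tolerate spaces
            return candidate
        except ValueError:
            pass
    return str


def resolve(type_a, type_b):
    """Combine two cell types, returning None if they are incompatible.

    Identical types resolve to themselves; int and float resolve to float.
    """
    if type_a == type_b:
        return type_a
    if {type_a, type_b} == {int, float}:
        return float
    return None


def column_type(values):
    """Fold resolve() over a column; None means the column is inconsistent."""
    types = [cell_type(v) for v in values]
    return reduce(lambda acc, t: None if acc is None else resolve(acc, t), types)
```

With this rule, the third column of the example below (" 1.2", " 1.2", " 1") resolves to float; without the int/float branch in resolve() the same column would come out as None and be ignored.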

Tests: Verified that Detector.has_header() correctly predicts that the following file has a header:

col1,col2,col3
hello,"hello world", 1.2
world,"hello world", 1.2
test,"hello world 您", 1
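To see why the fix flips the prediction for this file, here is a self-contained sketch of the column-voting heuristic from Python's csv.Sniffer.has_header, with a hypothetical resolve_numeric switch standing in for the int/float fix. It is modelled on the standard library, not on CleverCSV's implementation, and it assumes the sample is already comma-separated with double-quote quoting:

```python
import csv
import io


def _cell_type(value):
    """Classify a cell as int, float, or (for non-numeric text) its length."""
    for candidate in (int, float):
        try:
            candidate(value)
            return candidate
        except ValueError:
            pass
    return len(value)


def has_header(text, resolve_numeric=True, max_rows=20):
    """Guess whether the first row of `text` is a header (sketch only)."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        return False
    header, body = rows[0], rows[1 : max_rows + 1]

    # None means "no type seen yet" for that column.
    column_types = {col: None for col in range(len(header))}
    for row in body:
        if len(row) != len(header):
            continue  # skip ragged rows, as csv.Sniffer does
        for col in list(column_types):
            this_type = _cell_type(row[col])
            known = column_types[col]
            if known is None or this_type == known:
                column_types[col] = this_type
            elif resolve_numeric and {this_type, known} == {int, float}:
                column_types[col] = float  # the fix: int and float are compatible
            else:
                del column_types[col]  # inconsistent column casts no vote

    # Each remaining column votes on whether the first row looks different.
    votes = 0
    for col, col_type in column_types.items():
        if col_type is None:
            continue
        if isinstance(col_type, type):  # numeric column
            try:
                col_type(header[col])
            except ValueError:
                votes += 1  # non-numeric first cell: looks like a header
            else:
                votes -= 1
        else:  # text column whose cells all have the same length
            votes += 1 if len(header[col]) != col_type else -1
    return votes > 0
```

On the sample above, col1 and col2 drop out because their cell lengths vary, so the decision rests entirely on col3. With resolve_numeric=False the mix of 1.2 and 1 removes col3 as well, no column votes and the result is False; with the fix col3 is a float column, "col3" fails to parse as a float, and the single vote makes the result True.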

Update: I will be unable to contribute to this discussion under this account after today. It appears that I'm unable to modify the assignees list, but @ellie-bitio should be able to follow up if necessary.

ben-bitdotio, Aug 17 '21 20:08

Thanks for opening an issue on this and creating a PR @ben-bitdotio! The header detection code could definitely be improved, but I've been waiting until I have a dataset to evaluate the accuracy of different algorithms. This fix seems pretty harmless though, so I think we can merge it for now.

Would you be able to add a unit test to tests/test_unit/test_detect.py that fails without your fix but passes with your fix? That would be a nice confirmation that it works as expected (the example you give above could work as a test case). Thank you!

(cc-ing @ellie-bitio as suggested)

GjjvdBurg, Aug 20 '21 15:08