CSV parsing does not take into account non raw_strings (it fails if ',', '[' or ']' are inside quotes)
Hi !
I wanted to use glaze to parse the currencies from this CSV with this structure:
struct CurrencyCSV {
vector<string> Entity;
vector<string> Currency;
vector<string> AlphabeticCode;
vector<string> NumericCode;
vector<string> MinorUnit;
vector<string> WithdrawalDate;
};
However, the parser interprets , (and [ ]) if they are inside brackets in a value. For instance, these lines fail:
"MOLDOVA, REPUBLIC OF",Russian Ruble,RUR,810,,1993-12
"FALKLAND ISLANDS (THE) [MALVINAS]",Falkland Islands Pound,FKP,238,2,
I think that this behavior could be expected if .raw_string is true, but it should work if we set .raw_string to false.
WDYT ?
If I have the time, I will try to make a PR.
Actively working on this issue here: #1446
I think the string parsing has been fixed. It also supports escaped quotes per the CSV specification, which were needed to parse this document. However, there is a bug where it parses the WithdrawalDate into the MinorUnit. I have to work on some other stuff at the moment. If you want to try to fix this from the csv_currency branch, that would be great! Otherwise, I'll get back to this when I'm able.
I think the string parsing has been fixed. It also supports escaped quotes per the CSV specification, which were needed to parse this document. However, there is a bug where it parses the WithdrawalDate into the MinorUnit. I have to work on some other stuff at the moment. If you want to try to fix this from the
csv_concurrencybranch, that would be great! Otherwise, I'll get back to this when I'm able.
I think I found the bug, and fixed it in this commit. Feel free to take it into your branch. It's because we are skipping the trailing ',' in both the from<CSV for string_t and in the main loop of line 616, whereas for other type parsing we don't skip the commas, so I decided to just remove it from the string_t and it seems to work.
However, I did not see any test for rowwise parsing and I cannot make my commented test pass (see csv_test.cpp:606-607). Am I calling it incorrectly ?
Thanks, I got your fix on the branch so that it parses column wise correctly. I'll look at your commented test now.
I updated the rowwise test on the csv_currency branch. The first issue was that it was still reading in a column wise CSV file. Now it write out the data in rowwise format and then tries to read that back in. The outstanding issue is that our string writing for CSV does not escape quotes.
I'm going to merge the current fixes for column wise support. But, I'm going to keep this issue alive until rowwise support and escaped writing has been added. I just want to get these changes merged as a first step.
I could add you as a contributor to Glaze if you'd like to make branches directly in Glaze for these CSV fixes.
I updated the rowwise test on the
csv_currencybranch. The first issue was that it was still reading in a column wise CSV file. Now it write out the data in rowwise format and then tries to read that back in. The outstanding issue is that our string writing for CSV does not escape quotes.
oh, my bad! I thought it only changed the way the csv file was parsed internally, I did not know the existence of rowwise CSV actually! Anyway, it would be great to have an example in the README and in the unit test.
I could add you as a contributor to Glaze if you'd like to make branches directly in Glaze for these CSV fixes.
Thanks for the proposition, why not, I really enjoy glaze so if you need help for some items I would be glad to help when I have the time.
Cool, I sent you an invite. I find it easier to make pull requests and work from within a repository. No pressure to contribute, but any help is appreciated.
The CSV code needs some work, particularly with writing strings with escaped quotes and handling CSV files without headers.
Support was added in: #1786