rapidyaml
Slow parsing of very long flow-style lines
When parsing a large 34 MB file, the parser seems stuck in an infinite loop (I waited 5 minutes). This file is almost the same as the one from #288, except that it has 500 flow-style sequences of 10,000 floats each instead of block-style sequences. The block-style file takes less than a second to load. I'm using floats with 17 digits of precision (they are doubles) and used ryml::_WIP_STYLE_FLOW_SL for generating the file.
Steps to reproduce:
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <ryml.hpp>
#include <ryml_std.hpp> // to_csubstr() for std::string

// Read the whole file into memory in one go.
std::string loadFileToString(const std::string& path)
{
    // Open at the end (std::ios::ate) so tellg() gives the file size.
    std::ifstream ifs(path.c_str(), std::ios::in | std::ios::binary | std::ios::ate);
    std::ifstream::pos_type size = ifs.tellg();
    ifs.seekg(0, std::ios::beg);
    std::vector<char> bytes(size);
    ifs.read(bytes.data(), size);
    std::cout << "Read bytes finished" << std::endl;
    return std::string(bytes.data(), size);
}

int main()
{
    std::string path = "your_path";
    std::string buf = loadFileToString(path);
    ryml::Tree t;
    ryml::NodeRef n = t.rootref();
    // not tested with parse_in_place
    ryml::parse_in_arena("flow.yaml", ryml::to_csubstr(buf), n);
}
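For anyone without the original file, an input of similar shape can be produced with the standard library alone. This is only a sketch: the real file's layout is not shown in this report, so the top-level list of flow sequences, the values, and the helper name `make_flow_yaml` are all assumptions.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical generator (not part of rapidyaml, and not the reporter's code).
// Emits `num_sequences` lines, each a single-line flow sequence of
// `items_per_sequence` doubles printed with 17 significant digits.
std::string make_flow_yaml(std::size_t num_sequences, std::size_t items_per_sequence)
{
    std::string out;
    char buf[32]; // large enough for any double formatted with "%.17g"
    for (std::size_t s = 0; s < num_sequences; ++s) {
        out += "- [";
        for (std::size_t i = 0; i < items_per_sequence; ++i) {
            // Arbitrary values; most of them need all 17 digits.
            double value = static_cast<double>(s) + static_cast<double>(i) / 3.0;
            std::snprintf(buf, sizeof(buf), "%.17g", value);
            if (i) out += ", ";
            out += buf;
        }
        out += "]\n";
    }
    return out;
}
```

Calling `make_flow_yaml(500, 10000)` gives 500 long single-line sequences, which is the shape described above, though not necessarily the same size on disk.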
Wow, this one reveals something that was overlooked. Thanks for reporting.
There is an implicit assumption throughout the code that a YAML line is smallish. Because of subtle token-scanning intricacies (e.g. searching for # before searching for ,), in some places (not many, but some) there is a scan to the end of the line, which causes quadratic complexity for single-line flow-style containers.
For example, here: the check for , should be done before this, but is currently being done after.
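To make the cost concrete, here is a toy model of the two scan orders. It is not rapidyaml's scanner: the function names and the `ScanStats` struct are made up for illustration, the character counts are approximate, and `#` is treated as a comment start anywhere, which real YAML does not do.

```cpp
#include <cstddef>
#include <string>

// Toy model only; none of this is rapidyaml code.
struct ScanStats {
    std::size_t tokens = 0;    // comma-separated items found
    std::size_t inspected = 0; // approximate number of characters examined
};

// For every item, search for '#' in the whole rest of the line,
// and only afterwards search for the ',' that ends the item.
ScanStats scan_comment_first(const std::string& line)
{
    ScanStats s;
    std::size_t pos = 0;
    while (pos < line.size()) {
        std::size_t hash = line.find('#', pos);
        std::size_t limit = (hash == std::string::npos) ? line.size() : hash;
        s.inspected += limit - pos;          // walked to '#' or to end of line
        if (limit == pos) break;             // the rest of the line is a comment
        std::size_t comma = line.find(',', pos);
        std::size_t end = (comma == std::string::npos || comma > limit) ? limit : comma;
        s.inspected += end - pos;            // walked to the end of this item
        ++s.tokens;
        if (end == limit) break;             // reached the comment or end of line
        pos = end + 1;
    }
    return s;
}

// For every item, stop at whichever of ',' or '#' comes first.
ScanStats scan_separator_first(const std::string& line)
{
    ScanStats s;
    std::size_t pos = 0;
    while (pos < line.size()) {
        std::size_t stop = line.find_first_of(",#", pos);
        std::size_t end = (stop == std::string::npos) ? line.size() : stop;
        s.inspected += end - pos;            // never looks past the current item
        if (end == pos && stop != std::string::npos && line[stop] == '#') break;
        ++s.tokens;
        if (stop == std::string::npos || line[stop] == '#') break;
        pos = end + 1;
    }
    return s;
}
```

On a line of n items, the first variant examines on the order of n times the line length, while the second examines each character roughly once. With 10,000 items per line, that gap is what turns a sub-second parse into minutes.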
I will have to spend some time on this.
In the meantime, until there is a fix addressing the places where this assumption crept in, you will improve your parse speed substantially if you stick to block style in these cases, or if you break the lines (which I know may be a problem if you're using ryml to create these files, as IIRC it still has no facility to break lines when emitting).
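Since the emitter cannot break lines, one stopgap is to post-process the emitted text before writing it out. The sketch below is a hypothetical helper, not a ryml facility. It assumes the flow sequences hold only plain numbers, so every comma is an item separator; a quoted string containing a comma would be split incorrectly. The `indent` argument is there because continuation lines of a flow sequence nested in a block structure need to be indented deeper than the parent key.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of rapidyaml.
// Inserts a line break (followed by `indent`) after every
// `items_per_line`-th comma on a line. Assumes every ',' in `yaml`
// separates items, i.e. no quoted scalars containing commas.
// Passing items_per_line == 0 leaves the text unchanged.
std::string break_after_commas(const std::string& yaml,
                               std::size_t items_per_line,
                               const std::string& indent)
{
    std::string out;
    out.reserve(yaml.size() + yaml.size() / 8);
    std::size_t items_on_line = 0;
    for (char c : yaml) {
        out += c;
        if (c == '\n') {
            items_on_line = 0;   // an existing line break resets the count
        } else if (c == ',' && ++items_on_line == items_per_line) {
            out += '\n';
            out += indent;
            items_on_line = 0;
        }
    }
    return out;
}
```

For example, `break_after_commas("[1,2,3,4,5]", 2, "  ")` returns `"[1,2,\n  3,4,\n  5]"`. Keeping a few dozen items per line should keep each end-of-line scan short while leaving the document in flow style.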