Much slower than MD4C
Hello! Thanks for this library. I was wondering why I got such a difference in performance for the same text:
Maddy took 5304 milliseconds
Qt took 5 milliseconds
Maddy code:
std::stringstream markdownInput("some text...");
m_markdownParser->Parse(markdownInput);
Qt code:
QString markdownInput("some text...");
QTextDocument textDoc;
textDoc.setMarkdown(markdownInput);
textDoc.toHtml();
EDIT: By mistake I set it as a feature request.
When it comes to performance tests there are certain things that play into results, for example:
- Operating System
- Apps currently running on the system (any other running processes that can slow down a test)
- How many times did you run the tests?
So currently it is difficult to know the exact reasons for your results.
Besides that, maddy's regex-based approach might currently slow down Markdown processing. In version 2 I plan to remove the use of regex and go with another approach, which will hopefully speed maddy up (and which I will, of course, benchmark). Until then, maddy might not be the fastest solution.
I'm working on version 2 every now and then, but cannot commit to a release date yet due to real life and maddy being a side project.
Of course - if somebody finds a way to speed things up a little in the meantime - I'm always happy for contributions.
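To reduce the noise from the factors listed above (other processes, a single run), a timing harness could repeat the measurement and report the median instead of one sample. The following is only a minimal sketch: `medianMilliseconds` is a hypothetical helper name, not part of maddy, Qt, or MD4C.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption for illustration, not a maddy API):
// calls `work` `runs` times and returns the median wall-clock duration in
// milliseconds (the upper median when `runs` is even).
template <typename Work>
double medianMilliseconds(Work work, std::size_t runs)
{
  if (runs == 0) {
    return 0.0;
  }

  std::vector<double> samples;
  samples.reserve(runs);

  for (std::size_t i = 0; i < runs; ++i) {
    // steady_clock is monotonic, so system clock adjustments do not skew it.
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
      std::chrono::duration<double, std::milli>(stop - start).count());
  }

  // The median is less sensitive to a single slow outlier than the mean.
  std::sort(samples.begin(), samples.end());
  return samples[samples.size() / 2];
}
```

When timing a parser that reads from a `std::stringstream`, the stream would have to be recreated inside the callable passed as `work`, since a stream that has already been read to the end yields no input on later runs.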
Excuse my late reply. Here's a reproducible test with the first chapter of Moby Dick in Markdown: https://gist.github.com/nuttyartist/cb0053ccda823ac98a7ce58f296269cc
I got fairly consistent results:
During Debug mode:
Maddy took 84380 milliseconds
MD4C took 0 milliseconds
During Release mode:
Maddy took 17552 milliseconds
MD4C took 0 milliseconds
EDIT: I edited the title after realizing Qt is using MD4C underneath.
I ran into the performance issue too, and for me it makes maddy almost unusable. After some profiling and testing I found that the culprits are the following parsers:
- EMPHASIZED_PARSER
- ITALIC_PARSER
- STRIKETHROUGH_PARSER
- STRONG_PARSER
What they have in common is a long regexp that seems to take a long time to evaluate. I don't know if this breaks anything, but I replaced them with the following loops:
EmphasizedParser
void
Parse(std::string& line) override
{
  const std::string pattern = "_";
  const std::string newPattern = "em";
  const std::size_t patlen = pattern.size();

  // Wrap each "_..._" pair in <em>...</em> until no complete pair is left.
  for (;;) {
    const auto pos1 = line.find(pattern);
    if (pos1 == std::string::npos) {
      break;
    }
    const auto pos2 = line.find(pattern, pos1 + patlen);
    if (pos2 == std::string::npos) {
      break;
    }
    const std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
    line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
  }
}
ItalicParser
void
Parse(std::string& line) override
{
  const std::string pattern = "*";
  const std::string newPattern = "i";
  const std::size_t patlen = pattern.size();

  // Wrap each "*...*" pair in <i>...</i> until no complete pair is left.
  for (;;) {
    const auto pos1 = line.find(pattern);
    if (pos1 == std::string::npos) {
      break;
    }
    const auto pos2 = line.find(pattern, pos1 + patlen);
    if (pos2 == std::string::npos) {
      break;
    }
    const std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
    line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
  }
}
StrikeThroughParser
void
Parse(std::string& line) override
{
  const std::string pattern = "~~";
  const std::string newPattern = "s";
  const std::size_t patlen = pattern.size();

  // Wrap each "~~...~~" pair in <s>...</s> until no complete pair is left.
  for (;;) {
    const auto pos1 = line.find(pattern);
    if (pos1 == std::string::npos) {
      break;
    }
    const auto pos2 = line.find(pattern, pos1 + patlen);
    if (pos2 == std::string::npos) {
      break;
    }
    const std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
    line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
  }
}
StrongParser
void
Parse(std::string& line) override
{
  std::string pattern = "**";
  const std::string newPattern = "strong";

  // First pass: wrap each "**...**" pair in <strong>...</strong>.
  for (;;) {
    const std::size_t patlen = pattern.size();
    const auto pos1 = line.find(pattern);
    if (pos1 == std::string::npos) {
      break;
    }
    const auto pos2 = line.find(pattern, pos1 + patlen);
    if (pos2 == std::string::npos) {
      break;
    }
    const std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
    line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
  }

  // Second pass: same replacement for the "__...__" spelling.
  pattern = "__";
  for (;;) {
    const std::size_t patlen = pattern.size();
    const auto pos1 = line.find(pattern);
    if (pos1 == std::string::npos) {
      break;
    }
    const auto pos2 = line.find(pattern, pos1 + patlen);
    if (pos2 == std::string::npos) {
      break;
    }
    const std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
    line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
  }
}
I didn't measure how much faster this is, but my application went from being very laggy when parsing Markdown files to having no lag that I can notice at all.
This is just a quick fix and I don't have time at the moment to clean it up and test it more, otherwise I would make a pull request. Just sharing it hoping that it is useful.
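Since the loops differ only in the delimiter and the tag name, one possible cleanup would be to move them into a single free function that each parser calls. This is a sketch under assumptions, not tested against maddy's test suite: `replaceDelimited` is a hypothetical name, and unlike the loops above it resumes searching after the inserted tag instead of from the start of the line.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of maddy): wraps every complete
// delimiter...delimiter pair in `line` with <tag>...</tag>, left to right.
// An unmatched trailing delimiter is left untouched.
void replaceDelimited(std::string& line,
                      const std::string& delimiter,
                      const std::string& tag)
{
  if (delimiter.empty()) {
    return;  // an empty delimiter would match everywhere
  }

  const std::string open = "<" + tag + ">";
  const std::string close = "</" + tag + ">";
  std::size_t searchFrom = 0;

  for (;;) {
    const std::size_t first = line.find(delimiter, searchFrom);
    if (first == std::string::npos) {
      break;
    }
    const std::size_t contentStart = first + delimiter.size();
    const std::size_t second = line.find(delimiter, contentStart);
    if (second == std::string::npos) {
      break;
    }

    const std::string replacement =
      open + line.substr(contentStart, second - contentStart) + close;
    line.replace(first, second + delimiter.size() - first, replacement);

    // Continue after the text just inserted so it is not scanned again.
    searchFrom = first + replacement.size();
  }
}
```

With this, the body of the strikethrough parser would reduce to `replaceDelimited(line, "~~", "s");`. The order of the parsers would still matter: running the `*` replacement before the `**` one would turn `**bold**` into empty `<i></i>` pairs, which is also true of the loops above.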
Thank you @vedderb, this is awesome, multiple orders of magnitude faster. Do you have a fork with these changes?