redcarpet
redcarpet copied to clipboard
Assertion failed: `(md->work_bufs[BUFFER_SPAN].size == 0)` - Thread safety related?
First off, thanks for an amazing gem!
I did come across a glitch... I think.
I was testing the rendering system under high stress.
I ran the system in multi threaded mode (running on 16 threads).
While testing, I disabled my own cache management to stress test the rendering engine (rather then the caching system which was storing the rendered string).
I repeatedly asked for the same file to be rendered. 12 threads were doing the asking, 16 threads were doing the rendering, it was all running through Iodine's async IO using HTTP requests, about 200 concurrent requests at a time.
Every time I ran the test, my server would crash with the error:
Assertion failed: (md->work_bufs[BUFFER_SPAN].size == 0), function sd_markdown_render, file markdown.c, line 2898
I'm not complaining about the server crashing, I think it's the right thing to do when experiencing data corruption, but I am worried about the underline cause and the risk of production time failures.
I'm assuming this is multi-threading related, since I didn't get the error in single thread mode.
The following is just my guess (my 2¢) and I hope this helps track down the issue:
At first look, the C code doesn't seem to exit the Global VM Lock, so I thought I might be wrong, however, after a quick read through the code (that's a lot of code and very nicely written, wow), I think the issue is with the markdown struct storing stateful rendering data.
I believe that md->refs
and md->work_bufs
are susceptible to corruption. It also seems to me that these could be separated from the markdown engine into a separate "state struct", so that multi-threaded rendering with the same markdown render engine wouldn't cause data corruption.
I'm thankful for your time and hope for a fix or a workaround.
We were able to reproduce this error with this spec:
it 'renders wonky content' do
str = %Q[---------------------------------------------------------------------------\nTypeError Traceback (most recent call last)\nCell In[1], line 8\n 5 img = Image.open(\"/mnt/data/file-oT44n0LEldQZ2ymgGHzrwz8Q\")\n 7 # Use pytesseract to do OCR on the image\n----> 8 text = pytesseract.image_to_string(img)\n 10 text\n\nFile ~/.local/lib/python3.11/site-packages/pytesseract/pytesseract.py:413, in image_to_string(image, lang, config, nice, output_type, timeout)\n 408 \"\"\"\n 409 Returns the result of a Tesseract OCR run on the provided image to string\n 410 \"\"\"\n 411 args = [image, 'txt', lang, config, nice, timeout]\n--> 413 return {\n 414 Output.BYTES: lambda: run_and_get_output(*(args + [True])),\n 415 Output.DICT: lambda: {'text': run_and_get_output(*args)},\n 416 Output.STRING: lambda: run_and_get_output(*args),\n 417 }[output_type]()\n\nFile ~/.local/lib/python3.11/site-packages/pytesseract/pytesseract.py:416, in image_to_string.<locals>.<lambda>()\n 408 \"\"\"\n 409 Returns the result of a Tesseract OCR run on the provided image to string\n 410 \"\"\"\n 411 args = [image, 'txt', lang, config, nice, timeout]\n 413 return {\n 414 Output.BYTES: lambda: run_and_get_output(*(args + [True])),\n 415 Output.DICT: lambda: {'text': run_and_get_output(*args)},\n--> 416 Output.STRING: lambda: run_and_get_output(*args),\n 417 }[output_type]()\n\nFile ~/.local/lib/python3.11/site-packages/pytesseract/pytesseract.py:273, in run_and_get_output(image, extension, lang, config, nice, timeout, return_bytes)\n 263 def run_and_get_output(\n 264 image,\n 265 extension='',\n (...)\n 270 return_bytes=False,\n 271 ):\n--> 273 with save(image) as (temp_name, input_filename):\n 274 kwargs = {\n 275 'input_filename': input_filename,\n 276 'output_filename_base': temp_name,\n (...)\n 281 'timeout': timeout,\n 282 }\n 284 run_tesseract(**kwargs)\n\nFile /usr/local/lib/python3.11/contextlib.py:137, in _GeneratorContextManager.__enter__(self)\n 135 del self.args, self.kwds, self.func\n 136 try:\n--> 137 return next(self.gen)\n 138 except StopIteration:\n 139 raise RuntimeError(\"generator didn't yield\") from None\n\nFile ~/.local/lib/python3.11/site-packages/pytesseract/pytesseract.py:194, in save(image)\n 192 yield f.name, realpath(normpath(normcase(image)))\n 193 return\n--> 194 image, extension = prepare(image)\n 195 input_file_name = f.name + extsep + extension\n 196 image.save(input_file_name, format=image.format)\n\nFile ~/.local/lib/python3.11/site-packages/pytesseract/pytesseract.py:175, in prepare(image)\n 173 extension = 'PNG' if not image.format else image.format\n 174 if extension not in SUPPORTED_FORMATS:\n--> 175 raise TypeError('Unsupported image format/type')\n 177 if 'A' in image.getbands():\n 178 # discard and replace the alpha channel with white background\n 179 background = Image.new(RGB_MODE, image.size, (255, 255, 255))\n\nTypeError: Unsupported image format/type\n]
expect { helper.render_markdown(str) }.not_to raise_error
end
This spec itself passes, but the next helper.render_markdown
call will crash. It looks like this text messes up with some Redcarpet internal state. We were caching the Redcarpet renderer in a class constant and tried moving it to Thread.current
, and using a mutex. None worked.
IMHO, looking at the new data, this issue may be caused by a belated non-zero memory failure rather than a race condition.
This often happens when tests run for the first time and either the stack memory or the malloc
memory is all zeros (the initial state offered by the OS, especially in containers). In such cases a whole family of bugs can't be detected, such as variable initialization bugs.
Later on, as malloc
starts to return junk data (which usually happens only after the free list starts to get utilized), or as the stack unfolds and the next cycle uses "dirty" memory, non-initialization issues and str*
issues (i.e., strlen
) start to raise their heads as weird bugs show up.
Looking a bit at the code, could be related to this:
https://github.com/vmg/redcarpet/blob/3e3f0b522fbe9283ba450334b5cec7a439dc0955/ext/redcarpet/markdown.c#L2774-L2778
A temporary test to see if this could fix anything would be to add:
md = malloc(sizeof(struct sd_markdown));
if (!md)
return NULL;
*md = (struct sd_markdown){0}; /* added line to test if this helps */
memcpy(&md->cb, callbacks, sizeof(struct sd_callbacks));
Or, replace with:
md = malloc(sizeof(struct sd_markdown));
if (!md)
return NULL;
*md = (struct sd_markdown){.cb = callbacks}; /* copies data and zeros out the rest */