# Heap-buffer-overflow in `json.encoder` indentation cache via re-entrant `__mul__`

### What happened?
A crafted `indent` object overrides `__mul__`, so when `create_indent_cache` calls `PySequence_Repeat` it can drop the encoder's last reference and free the encoder mid-call. The freed memory is then reallocated for a 1-element indent cache, but `update_indent_cache` still indexes it as if it were the original, larger cache, causing a heap-buffer-overflow during JSON encoding.
### Proof of Concept

```python
import json

_holder = []

class EvilIndent(str):
    def __mul__(self, count):
        _holder.pop()
        return " "

indent = EvilIndent(" ")
encoder = json.encoder.c_make_encoder(
    None,
    lambda obj: obj,
    lambda obj: obj,
    indent,
    ":",
    ", ",
    False,
    False,
    False,
)
_holder.append(encoder)
encoder([None], 1)
```
### Affected Versions

| Python Version | Status | Exit Code |
|---|---|---|
| Python 3.9.24+ (heads/3.9:111bbc15b26, Oct 28 2025, 16:51:20) | OK | 0 |
| Python 3.10.19+ (heads/3.10:014261980b1, Oct 28 2025, 16:52:08) [Clang 18.1.3 (1ubuntu1)] | OK | 0 |
| Python 3.11.14+ (heads/3.11:88f3f5b5f11, Oct 28 2025, 16:53:08) [Clang 18.1.3 (1ubuntu1)] | OK | 0 |
| Python 3.12.12+ (heads/3.12:8cb2092bd8c, Oct 28 2025, 16:54:14) [Clang 18.1.3 (1ubuntu1)] | OK | 0 |
| Python 3.13.9+ (heads/3.13:9c8eade20c6, Oct 28 2025, 16:55:18) [Clang 18.1.3 (1ubuntu1)] | OK | 0 |
| Python 3.14.0+ (heads/3.14:2e216728038, Oct 28 2025, 16:56:16) [Clang 18.1.3 (1ubuntu1)] | ASAN | 1 |
| Python 3.15.0a1+ (heads/main:f5394c257ce, Oct 28 2025, 16:57:16) [Clang 18.1.3 (1ubuntu1)] | ASAN | 1 |
### Vulnerable Code

```c
static PyObject *
create_indent_cache(PyEncoderObject *s, Py_ssize_t indent_level)
{
    PyObject *newline_indent = PyUnicode_FromOrdinal('\n');
    if (newline_indent != NULL && indent_level) {
        PyUnicode_AppendAndDel(&newline_indent,
            PySequence_Repeat(s->indent, indent_level)); // triggers __mul__, which can free s
    }
    if (newline_indent == NULL) {
        return NULL;
    }
    // this allocation occupies the freed buffer
    PyObject *indent_cache = PyList_New(1);
    if (indent_cache == NULL) {
        Py_DECREF(newline_indent);
        return NULL;
    }
    PyList_SET_ITEM(indent_cache, 0, newline_indent);
    return indent_cache;
}

static int
update_indent_cache(PyEncoderObject *s,
                    Py_ssize_t indent_level, PyObject *indent_cache)
{
    assert(indent_level * 2 == PyList_GET_SIZE(indent_cache) + 1);
    assert(indent_level > 0);
    // reads the freed buffer, now occupied by the 1-element indent_cache
    PyObject *newline_indent = PyList_GET_ITEM(indent_cache, (indent_level - 1)*2);
    newline_indent = PyUnicode_Concat(newline_indent, s->indent);
    if (newline_indent == NULL) {
        return -1;
    }
    /* ... */
```
### Sanitizer Output

```
=================================================================
==1643271==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5020000051a0 at pc 0x7e13a53054e7 bp 0x7ffe02157440 sp 0x7ffe02157438
READ of size 8 at 0x5020000051a0 thread T0
    #0 0x7e13a53054e6 in update_indent_cache /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/./Modules/_json.c:1411:32
    #1 0x7e13a53054e6 in get_item_separator /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/./Modules/_json.c:1440:13
    #2 0x7e13a5302a6b in encoder_listencode_list /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/./Modules/_json.c:1982:21
    #3 0x7e13a52fea23 in encoder_call /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/./Modules/_json.c:1483:9
    #4 0x5d8596341b81 in _PyObject_MakeTpCall /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Objects/call.c:242:18
    #5 0x5d8596941c62 in _PyEval_EvalFrameDefault /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/generated_cases.c.h:1620:35
    #6 0x5d8596910bf4 in _PyEval_Vector /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/ceval.c:2005:12
    #7 0x5d8596910bf4 in PyEval_EvalCode /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/ceval.c:888:21
    #8 0x5d8596bd54d4 in run_mod /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:1459:19
    #9 0x5d8596bcf02d in pyrun_file /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:1293:15
    #10 0x5d8596bcc2d3 in _PyRun_SimpleFileObject /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:521:13
    #11 0x5d8596bcb89e in _PyRun_AnyFileObject /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:81:15
    #12 0x5d8596c90b13 in pymain_run_file_obj /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:410:15
    #13 0x5d8596c90b13 in pymain_run_file /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:429:15
    #14 0x5d8596c8dbcb in pymain_run_python /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:691:21
    #15 0x5d8596c8dbcb in Py_RunMain /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:772:5
    #16 0x5d8596c8f7fb in pymain_main /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:802:12
    #17 0x5d8596c8faa2 in Py_BytesMain /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:826:12
    #18 0x7e13a742a1c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #19 0x7e13a742a28a in __libc_start_main csu/../csu/libc-start.c:360:3
    #20 0x5d859604c114 in _start (/home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/python+0x6b0114) (BuildId: 0aee20a59f1c25de22733bd0e5f8259ab04406c4)

0x5020000051a0 is located 8 bytes after 8-byte region [0x502000005190,0x502000005198)
allocated by thread T0 here:
    #0 0x5d85960e714d in calloc (/home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/python+0x74b14d) (BuildId: 0aee20a59f1c25de22733bd0e5f8259ab04406c4)
    #1 0x5d8596424d76 in PyList_New /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Objects/listobject.c:262:37
    #2 0x7e13a52fe965 in create_indent_cache /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/./Modules/_json.c:1393:30
    #3 0x7e13a52fe965 in encoder_call /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/./Modules/_json.c:1477:24
    #4 0x5d8596341b81 in _PyObject_MakeTpCall /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Objects/call.c:242:18
    #5 0x5d8596941c62 in _PyEval_EvalFrameDefault /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/generated_cases.c.h:1620:35
    #6 0x5d8596910bf4 in _PyEval_Vector /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/ceval.c:2005:12
    #7 0x5d8596910bf4 in PyEval_EvalCode /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/ceval.c:888:21
    #8 0x5d8596bd54d4 in run_mod /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:1459:19
    #9 0x5d8596bcf02d in pyrun_file /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:1293:15
    #10 0x5d8596bcc2d3 in _PyRun_SimpleFileObject /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:521:13
    #11 0x5d8596bcb89e in _PyRun_AnyFileObject /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:81:15
    #12 0x5d8596c90b13 in pymain_run_file_obj /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:410:15
    #13 0x5d8596c90b13 in pymain_run_file /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:429:15
    #14 0x5d8596c8dbcb in pymain_run_python /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:691:21
    #15 0x5d8596c8dbcb in Py_RunMain /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:772:5
    #16 0x5d8596c8f7fb in pymain_main /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:802:12
    #17 0x5d8596c8faa2 in Py_BytesMain /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:826:12
    #18 0x7e13a742a1c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #19 0x7e13a742a28a in __libc_start_main csu/../csu/libc-start.c:360:3
    #20 0x5d859604c114 in _start (/home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/python+0x6b0114) (BuildId: 0aee20a59f1c25de22733bd0e5f8259ab04406c4)

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/./Modules/_json.c:1411:32 in update_indent_cache
Shadow bytes around the buggy address:
  0x502000004f00: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
  0x502000004f80: fa fa fd fa fa fa fd fa fa fa fd fd fa fa fd fa
  0x502000005000: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
  0x502000005080: fa fa fd fd fa fa fd fa fa fa fd fd fa fa fd fa
  0x502000005100: fa fa fd fa fa fa fd fa fa fa 02 fa fa fa 00 fa
=>0x502000005180: fa fa 00 fa[fa]fa fa fa fa fa fa fa fa fa fa fa
  0x502000005200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000005280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000005300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000005380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000005400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1643271==ABORTING
```
### Linked PRs

- gh-143246
Commit 6b2a19681eb5c132caa73a268724c7d502794867 is the commit that first introduced the error.
I can confirm the issue, but the mechanism is different from what is described. Here is a reproducer without a custom string:
```python
import json

indent = ' '
encoder = json.encoder.c_make_encoder(
    None,
    default=lambda obj: obj,
    encoder=lambda obj: obj,
    indent=indent,
    key_separator=":",
    item_separator=", ",
    skipkeys=False,
    allow_nan=False,
    sort_keys=False,
)
print('start')
encoder([None], 1)
print('end')
```
The issue is the second argument to `encoder`: it should be 0; otherwise the cache is used before it has been initialized.
The encoder is created using `c_make_encoder`. I suspect `c_make_encoder` is intended to be a private method (it is part of `_json`, and I cannot find public documentation for it), but it is imported under a public name. So one could argue this is not a bug, but misuse of an internal interface.

Fixing the issue is not hard, so maybe that is the best option here. Alternatively, we could leave it as is and (optionally) change the import

```python
from _json import make_encoder as c_make_encoder
```

to

```python
from _json import make_encoder as _c_make_encoder
```
@serhiy-storchaka Do you have an opinion on how to resolve this?
Doesn't that make this a duplicate of #140750?
It seems there are two related bugs in the encoder (introduced in gh-95382):

- **Uninitialized cache access:** When `c_make_encoder` is called with `_current_indent_level > 0`, the cache is created with only 1 element, but `update_indent_cache` expects the cache to be built incrementally starting from level 0. This causes out-of-bounds memory access.
- **Re-entrant `__mul__` vulnerability:** `PySequence_Repeat(s->indent, indent_level)` in `create_indent_cache` can execute arbitrary Python code through a custom `__mul__` method, potentially causing a use-after-free.
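As a minimal illustration of the second point: sequence repetition on a `str` subclass dispatches to the overridden `__mul__`, which is exactly the re-entrancy hook that `PySequence_Repeat` hands to attacker-controlled code (class and variable names below are illustrative):

```python
class SideEffectIndent(str):
    """A str subclass whose repetition runs arbitrary Python code."""
    def __mul__(self, count):
        # Anything can happen here, e.g. dropping the last reference
        # to the encoder that is mid-way through create_indent_cache.
        print(f"__mul__ called with count={count}")
        return " " * count

indent = SideEffectIndent(" ")
result = indent * 3   # same dispatch path PySequence_Repeat takes
print(repr(result))   # prints: '   '
```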
I have a little branch with a potential solution: two validation checks in `_json.c`:

- **Type check in `encoder_new`:** require `indent` to be `str` or `None` (reject non-string objects like integers or lists that would fail later anyway).
- **Indent level check in `encoder_call`:** require `_current_indent_level == 0` when `indent` is set. This:
  - prevents the uninitialized cache access (the cache is designed to start at level 0);
  - eliminates the `__mul__` attack vector, since `PySequence_Repeat` is only called when `indent_level != 0`.
```c
static PyObject *
encoder_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    ...
    if (indent != Py_None && !PyUnicode_Check(indent)) {
        PyErr_Format(PyExc_TypeError,
                     "make_encoder() argument 4 must be str or None, "
                     "not %.200s", Py_TYPE(indent)->tp_name);
        return NULL;
    }
    ...
}
```
But I just want to confirm that this is a reasonable solution and that I'm welcome to submit the PR.
@caverac A PR is welcome. Maybe we need to replace the unicode check with a conversion to unicode. (I would have to look at the details for this.)
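For reference, the documented `json.dumps` behaviour that a conversion-based check would presumably need to match: a non-negative integer `indent` means that many spaces per level, equivalent to passing the corresponding string:

```python
import json

# An integer indent is expanded to that many spaces per level.
print(json.dumps([1], indent=2))                            # prints: [\n  1\n]
print(json.dumps([1], indent=2) == json.dumps([1], indent='  '))  # prints: True
```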
Will this lower performance? The root cause was a change aimed at improving performance.
@yihong0618 I left a comment on the PR regarding potential performance implications. My take on it: I don't think performance is an issue, since the performance-critical recursive encoding path (`encoder_listencode_list`, `encoder_listencode_dict`) is completely unchanged. But I would love to hear your thoughts on it.
The simplest fix is to reset the current indent level to zero after initializing the indent cache.

We could consider removing the `_current_indent_level` parameter, but that is a separate issue. The C implementation mimics the Python implementation, and the Python implementation needs such a parameter. But these are all internal details.