cpython icon indicating copy to clipboard operation
cpython copied to clipboard

Heap-buffer-overflow in `json.encoder` indentation cache via re-entrant `__mul__`

Open jackfromeast opened this issue 3 weeks ago • 7 comments

What happened?

A crafted indent object overrides __mul__, so when create_indent_cache calls PySequence_Repeat it can drop the encoder’s last reference and free the encoder mid‑call. The freed memory is then reallocated for a 1‑element indent cache, but update_indent_cache still indexes it as if it were the original larger cache, causing a heap‑buffer‑overflow during JSON encoding.

Proof of Concept:

import json

_holder = []

class EvilIndent(str):
    def __mul__(self, count):
        _holder.pop()
        return " "

indent = EvilIndent(" ")
encoder = json.encoder.c_make_encoder(
    None,                    
    lambda obj: obj,       
    lambda obj: obj,         
    indent,
    ":",
    ", ",
    False,                   
    False,                   
    False,                   
)
_holder.append(encoder)

encoder([None], 1) 
Affected Versions
Python Version Status Exit Code
Python 3.9.24+ (heads/3.9:111bbc15b26, Oct 28 2025, 16:51:20) OK 0
Python 3.10.19+ (heads/3.10:014261980b1, Oct 28 2025, 16:52:08) [Clang 18.1.3 (1ubuntu1)] OK 0
Python 3.11.14+ (heads/3.11:88f3f5b5f11, Oct 28 2025, 16:53:08) [Clang 18.1.3 (1ubuntu1)] OK 0
Python 3.12.12+ (heads/3.12:8cb2092bd8c, Oct 28 2025, 16:54:14) [Clang 18.1.3 (1ubuntu1)] OK 0
Python 3.13.9+ (heads/3.13:9c8eade20c6, Oct 28 2025, 16:55:18) [Clang 18.1.3 (1ubuntu1)] OK 0
Python 3.14.0+ (heads/3.14:2e216728038, Oct 28 2025, 16:56:16) [Clang 18.1.3 (1ubuntu1)] ASAN 1
Python 3.15.0a1+ (heads/main:f5394c257ce, Oct 28 2025, 16:57:16) [Clang 18.1.3 (1ubuntu1)] ASAN 1
Vulnerable Code
static PyObject *
create_indent_cache(PyEncoderObject *s, Py_ssize_t indent_level)
{
    PyObject *newline_indent = PyUnicode_FromOrdinal('\n');
    if (newline_indent != NULL && indent_level) {
        PyUnicode_AppendAndDel(&newline_indent,
                               PySequence_Repeat(s->indent, indent_level)); // Trigger __mul__ method and free the s
    }
    if (newline_indent == NULL) {
        return NULL;
    }

    // Occupy our freed buffer
    PyObject *indent_cache = PyList_New(1);
    if (indent_cache == NULL) {
        Py_DECREF(newline_indent);
        return NULL;
    }
    PyList_SET_ITEM(indent_cache, 0, newline_indent);
    return indent_cache;
}

static int
update_indent_cache(PyEncoderObject *s,
                    Py_ssize_t indent_level, PyObject *indent_cache)
{
    assert(indent_level * 2 == PyList_GET_SIZE(indent_cache) + 1);
    assert(indent_level > 0);
    // Access the freed buffer / occupied by indent_cache right now
    PyObject *newline_indent = PyList_GET_ITEM(indent_cache, (indent_level - 1)*2);
    newline_indent = PyUnicode_Concat(newline_indent, s->indent);
    if (newline_indent == NULL) {
        return -1;
    }

    /* ... */
Sanitizer Output
=================================================================
==1643271==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5020000051a0 at pc 0x7e13a53054e7 bp 0x7ffe02157440 sp 0x7ffe02157438
READ of size 8 at 0x5020000051a0 thread T0
    #0 0x7e13a53054e6 in update_indent_cache /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/./Modules/_json.c:1411:32
    #1 0x7e13a53054e6 in get_item_separator /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/./Modules/_json.c:1440:13
    #2 0x7e13a5302a6b in encoder_listencode_list /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/./Modules/_json.c:1982:21
    #3 0x7e13a52fea23 in encoder_call /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/./Modules/_json.c:1483:9
    #4 0x5d8596341b81 in _PyObject_MakeTpCall /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Objects/call.c:242:18
    #5 0x5d8596941c62 in _PyEval_EvalFrameDefault /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/generated_cases.c.h:1620:35
    #6 0x5d8596910bf4 in _PyEval_Vector /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/ceval.c:2005:12
    #7 0x5d8596910bf4 in PyEval_EvalCode /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/ceval.c:888:21
    #8 0x5d8596bd54d4 in run_mod /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:1459:19
    #9 0x5d8596bcf02d in pyrun_file /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:1293:15
    #10 0x5d8596bcc2d3 in _PyRun_SimpleFileObject /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:521:13
    #11 0x5d8596bcb89e in _PyRun_AnyFileObject /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:81:15
    #12 0x5d8596c90b13 in pymain_run_file_obj /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:410:15
    #13 0x5d8596c90b13 in pymain_run_file /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:429:15
    #14 0x5d8596c8dbcb in pymain_run_python /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:691:21
    #15 0x5d8596c8dbcb in Py_RunMain /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:772:5
    #16 0x5d8596c8f7fb in pymain_main /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:802:12
    #17 0x5d8596c8faa2 in Py_BytesMain /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:826:12
    #18 0x7e13a742a1c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #19 0x7e13a742a28a in __libc_start_main csu/../csu/libc-start.c:360:3
    #20 0x5d859604c114 in _start (/home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/python+0x6b0114) (BuildId: 0aee20a59f1c25de22733bd0e5f8259ab04406c4)

0x5020000051a0 is located 8 bytes after 8-byte region [0x502000005190,0x502000005198)
allocated by thread T0 here:
    #0 0x5d85960e714d in calloc (/home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/python+0x74b14d) (BuildId: 0aee20a59f1c25de22733bd0e5f8259ab04406c4)
    #1 0x5d8596424d76 in PyList_New /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Objects/listobject.c:262:37
    #2 0x7e13a52fe965 in create_indent_cache /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/./Modules/_json.c:1393:30
    #3 0x7e13a52fe965 in encoder_call /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/./Modules/_json.c:1477:24
    #4 0x5d8596341b81 in _PyObject_MakeTpCall /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Objects/call.c:242:18
    #5 0x5d8596941c62 in _PyEval_EvalFrameDefault /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/generated_cases.c.h:1620:35
    #6 0x5d8596910bf4 in _PyEval_Vector /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/ceval.c:2005:12
    #7 0x5d8596910bf4 in PyEval_EvalCode /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/ceval.c:888:21
    #8 0x5d8596bd54d4 in run_mod /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:1459:19
    #9 0x5d8596bcf02d in pyrun_file /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:1293:15
    #10 0x5d8596bcc2d3 in _PyRun_SimpleFileObject /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:521:13
    #11 0x5d8596bcb89e in _PyRun_AnyFileObject /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Python/pythonrun.c:81:15
    #12 0x5d8596c90b13 in pymain_run_file_obj /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:410:15
    #13 0x5d8596c90b13 in pymain_run_file /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:429:15
    #14 0x5d8596c8dbcb in pymain_run_python /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:691:21
    #15 0x5d8596c8dbcb in Py_RunMain /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:772:5
    #16 0x5d8596c8f7fb in pymain_main /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:802:12
    #17 0x5d8596c8faa2 in Py_BytesMain /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/Modules/main.c:826:12
    #18 0x7e13a742a1c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #19 0x7e13a742a28a in __libc_start_main csu/../csu/libc-start.c:360:3
    #20 0x5d859604c114 in _start (/home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/python+0x6b0114) (BuildId: 0aee20a59f1c25de22733bd0e5f8259ab04406c4)

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/jackfromeast/Desktop/entropy/tasks/reproducexx/targets/cpython-main/./Modules/_json.c:1411:32 in update_indent_cache
Shadow bytes around the buggy address:
  0x502000004f00: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
  0x502000004f80: fa fa fd fa fa fa fd fa fa fa fd fd fa fa fd fa
  0x502000005000: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
  0x502000005080: fa fa fd fd fa fa fd fa fa fa fd fd fa fa fd fa
  0x502000005100: fa fa fd fa fa fa fd fa fa fa 02 fa fa fa 00 fa
=>0x502000005180: fa fa 00 fa[fa]fa fa fa fa fa fa fa fa fa fa fa
  0x502000005200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000005280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000005300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000005380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000005400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1643271==ABORTING

Linked PRs

  • gh-143246

jackfromeast avatar Dec 26 '25 21:12 jackfromeast

6b2a19681eb5c132caa73a268724c7d502794867 this commit first bring the error

yihong0618 avatar Dec 27 '25 01:12 yihong0618

I can confirm the issue, but the mechanism is a different as described. Here is a reproducer without a custom string:

import json

indent = '  '
encoder = json.encoder.c_make_encoder(
    None,                    
    default=lambda obj: obj,       
    encoder= lambda obj: obj,         
    indent = indent,
    key_separator = ":",
   item_separator= ", ",
  skipkeys=  False,                   
 allow_nan=   False,                   
    sort_keys=False,                   
)

print('start')
encoder([None], 1) 
print('end')

The issue is the second argument to encoder: it should be 0, otherwise the cache is being used while not initialized.

The encoder is created using c_make_encoder. I suspect the c_make_encoder is intended to be a private method (it is part of _json, I cannot find public documentation on it), but it is imported as a public method. So one could argue this is not a bug, but bad usage of an internal interface.

Fixing the issue is not hard, so maybe that is the best option here. Alternatively, we could leave it as is and (optionally) change the import

from _json import make_encoder as c_make_encoder

to

from _json import make_encoder as _c_make_encoder

@serhiy-storchaka Do you have an opinion on how to resolve this?

eendebakpt avatar Dec 27 '25 19:12 eendebakpt

Doesn't that make this a duplicate of #140750?

nineteendo avatar Dec 28 '25 10:12 nineteendo

It seems there are two related bugs in the encoder (introduced in gh-95382):

  1. Uninitialized cache access: When c_make_encoder is called with _current_indent_level > 0, the cache is created with only 1 element, but update_indent_cache expects the cache to be incrementally built starting from level 0. This causes out-of-bounds memory access.
  2. Re-entrant __mul__ vulnerability: PySequence_Repeat(s->indent, indent_level) in create_indent_cache can execute arbitrary Python code through a custom __mul__ method, potentially causing use-after-free.

I have a little branch with a potential solution: two validation checks in _json.c:

  1. Type check in encoder_new: Require indent to be str or None (reject non-string objects like integers or lists that would fail later anyway).
  2. Indent level check in encoder_call: Require _current_indent_level == 0 when indent is set. This:
  • Prevents the uninitialized cache access (the cache is designed to start at level 0)
  • Eliminates the __mul__ attack vector since PySequence_Repeat is only called when indent_level != 0
static PyObject *
encoder_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    ...
    if (indent != Py_None && !PyUnicode_Check(indent)) {
        PyErr_Format(PyExc_TypeError,
                     "make_encoder() argument 4 must be str or None, "
                     "not %.200s", Py_TYPE(indent)->tp_name);
        return NULL;
    }
    ...
}

But, just want to confirm whether it is a reasonable solution/I'm allowed to submit the PR.

caverac avatar Dec 28 '25 10:12 caverac

@caverac A PR is welcome. Maybe we need to replace the unicode check with a conversion to unicode. (I would have to look at the details for this)

eendebakpt avatar Dec 28 '25 11:12 eendebakpt

It seems there are two related bugs in the encoder (introduced in gh-95382):

  1. Uninitialized cache access: When c_make_encoder is called with _current_indent_level > 0, the cache is created with only 1 element, but update_indent_cache expects the cache to be incrementally built starting from level 0. This causes out-of-bounds memory access.
  2. Re-entrant __mul__ vulnerability: PySequence_Repeat(s->indent, indent_level) in create_indent_cache can execute arbitrary Python code through a custom __mul__ method, potentially causing use-after-free.

I have a little branch with a potential solution: two validation checks in _json.c:

  1. Type check in encoder_new: Require indent to be str or None (reject non-string objects like integers or lists that would fail later anyway).
  2. Indent level check in encoder_call: Require _current_indent_level == 0 when indent is set. This:
  • Prevents the uninitialized cache access (the cache is designed to start at level 0)
  • Eliminates the __mul__ attack vector since PySequence_Repeat is only called when indent_level != 0
static PyObject *
encoder_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    ...
    if (indent != Py_None && !PyUnicode_Check(indent)) {
        PyErr_Format(PyExc_TypeError,
                     "make_encoder() argument 4 must be str or None, "
                     "not %.200s", Py_TYPE(indent)->tp_name);
        return NULL;
    }
    ...
}

But, just want to confirm whether it is a reasonable solution/I'm allowed to submit the PR.

will this make the performance lower? Since the root cause is aim to better the performance

yihong0618 avatar Dec 28 '25 11:12 yihong0618

@yihong0618 I let a comment on the PR regarding potential performance implications. My take on it is: I don't think performance is an issue since the performance-critical recursive encoding path (encoder_listencode_list, encoder_listencode_dict) is completely unchanged. But I would love to hear your thoughts on it.

caverac avatar Dec 28 '25 13:12 caverac

The simplest fix is to reset the current indent level to zero after initializing the indent cache.

We can consider removing the _current_indent_level parameter, but this is other issue. The C implementation mimicks the Python implementation, and the Python implementation needs such parameter. But they all are internal details.

serhiy-storchaka avatar Jan 09 '26 16:01 serhiy-storchaka