py-lmdb Fails to build with Python 3.13 due to removal of PyObject

Affected Operating Systems

All

Affected py-lmdb Version

All

py-lmdb Installation Method

Irrelevant

Using bundled or distribution-provided LMDB library?

Irrelevant

Distribution name and LMDB library version

Irrelevant

Machine "free -m" output

Irrelevant

Other important machine info

Irrelevant

Describe Your Problem

py-lmdb does not build against Python 3.13 because it uses PyObject_AsReadBuffer, which was removed in Python 3.13. The recommended replacement is PyObject_GetBuffer and PyBuffer_Release.

Errors/exceptions Encountered

    lmdb/cpython.c: In function ‘val_from_buffer’:
    lmdb/cpython.c:586:12: error: implicit declaration of function ‘PyObject_AsReadBuffer’; did you mean ‘PyObject_GetBuffer’? [-Wimplicit-function-declaration]
      586 |     return PyObject_AsReadBuffer(buf,
          |            ^~~~~~~~~~~~~~~~~~~~~
          |            PyObject_GetBuffer

Describe What You Expected To Happen

Successful build.

Describe What Happened Instead

Failed build.

Jun 14 '24 19:06 AdamWill

So I can nearly fix this, but I'm struggling with the lifecycle of the view buffer PyObject_GetBuffer uses. This patch works, but obviously leaks those views:

diff --git a/lmdb/cpython.c b/lmdb/cpython.c
index ef30447..78838b9 100644
--- a/lmdb/cpython.c
+++ b/lmdb/cpython.c
@@ -583,9 +583,15 @@ val_from_buffer(MDB_val *val, PyObject *buf)
         type_error("Won't implicitly convert Unicode to bytes; use .encode()");
         return -1;
     }
-    return PyObject_AsReadBuffer(buf,
-        (const void **) &val->mv_data,
-        (Py_ssize_t *) &val->mv_size);
+    Py_buffer view;
+    int ret;
+    ret = PyObject_GetBuffer(buf, &view, PyBUF_SIMPLE);
+    if(ret == 0) {
+        val->mv_data = view.buf;
+        val->mv_size = view.len;
+    }
+    //PyBuffer_Release (&view);
+    return ret;
 }
 
 /* ------------------- */

If you uncomment the PyBuffer_Release, it crashes, because we aren't actually reading the data here, just setting a pointer to it.

I can't figure out a good way to handle this, because there are a lot of things that call val_from_buffer and they tend not to read the result right away, there's a lot of nesting going on. e.g. parse_args calls parse_arg which calls val_from_buffer, and dozens of things call parse_args and use the data at varying points afterwards...so what? Do we have every single one of those pass in a view and free it when they're done, and pass that view all the way down to val_from_buffer? That feels very icky, but I don't know what would be better. Note I am not any kind of C expert at all, maybe this is obvious to someone else.

Jun 14 '24 20:06 AdamWill

Thanks for the heads up about 3.13. I haven't looked at that yet.

I'm afraid val_from_buffer and its callers will have to be restructured. The first approach I'd try is for the new val_from_buffer to takes a Py_buffer ** parameter that the created buffer is passed back into. Then for every caller of val_from_buffer there'd need to be a val_from_buffer_free that calls PyBuffer_Release on it if it isn't null.

This is a hard API break, and it probably means that the version that implements this will be the first version that doesn't support Python 2.7.

You're welcome to submit a PR that implements this. If not, I'll work on it after I have my current 3.12 branch working.

Jun 15 '24 15:06 jnwatson

Thanks!

That's more or less what I was talking about at the end of my message, I think. But - if I'm understanding things correctly - it's difficult because it doesn't seem like we're always done using the data even within the scope of the immediate caller of val_from_buffer. Like in the parse_arg case - parse_arg calls val_from_buffer, but it doesn't really use the result itself, the thing that uses the result is two or more links further up the chain. So you wind up having to pass the Py_buffer an awfully long way to get it from the place it will ultimately need to be freed all the way to val_from_buffer.

That seemed like a lot of work that would be very easy to get wrong for me, since I don't know the codebase and I'm not at all a C hacker, so I passed on trying to write a PR, sorry!

Jun 15 '24 15:06 AdamWill

NP at all. I appreciate the report.

Jun 15 '24 15:06 jnwatson

This makes the tests pass with 3.13:

--- a/lmdb/cpython.c
+++ b/lmdb/cpython.c
@@ -583,8 +583,8 @@ val_from_buffer(MDB_val *val, PyObject *buf)
         type_error("Won't implicitly convert Unicode to bytes; use .encode()");
         return -1;
     }
-    return PyObject_AsReadBuffer(buf,
-        (const void **) &val->mv_data,
+    return PyBytes_AsStringAndSize(buf,
+        (char **) &val->mv_data,
         (Py_ssize_t *) &val->mv_size);
 }

Obviously, if you still support Python 2.7, it needs some PY_VERSION_HEX >= 0x03000000 (or similar) guards.

Sep 04 '24 16:09 hroncok

OTOH so does this:

--- a/lmdb/cpython.c
+++ b/lmdb/cpython.c
@@ -583,9 +583,8 @@ val_from_buffer(MDB_val *val, PyObject *buf)
         type_error("Won't implicitly convert Unicode to bytes; use .encode()");
         return -1;
     }
-    return PyObject_AsReadBuffer(buf,
-        (const void **) &val->mv_data,
-        (Py_ssize_t *) &val->mv_size);
+    PyErr_SetString(PyExc_RuntimeError, "Oh no!");
+    return -1;
 }

So perhaps the scenario is not tested :/

Sep 04 '24 16:09 hroncok

Taking the patch by @AdamWill and not makign it crash is a matter of never releasing the buffer when the ret is not 0:

--- a/lmdb/cpython.c
+++ b/lmdb/cpython.c
@@ -583,9 +583,15 @@ val_from_buffer(MDB_val *val, PyObject *buf)
         type_error("Won't implicitly convert Unicode to bytes; use .encode()");
         return -1;
     }
-    return PyObject_AsReadBuffer(buf,
-        (const void **) &val->mv_data,
-        (Py_ssize_t *) &val->mv_size);
+    Py_buffer view;
+    int ret;
+    ret = PyObject_GetBuffer(buf, &view, PyBUF_SIMPLE);
+    if(ret == 0) {
+        val->mv_data = view.buf;
+        val->mv_size = view.len;
+        PyBuffer_Release (&view);
+    }
+    return ret;
 }

It does exactly the same thing as Python, so calling PyBuffer_Release before reading the buf and len is as safe as it ever was (i.e. it is not safe, which is why Python removed the function in the first place, but it does not make things worse).

Sep 04 '24 17:09 hroncok

Fix in https://github.com/jnwatson/py-lmdb/pull/368

Sep 04 '24 17:09 hroncok

For me the current pypi version 1.5.1 builds on Github actions with

ubuntu-20.04, 3.13
ubuntu-22.04, 3.13
macos-12, 3.13

but not with

macos-13, 3.13
macos-14, 3.13

I can see the API was removed. So I don't really understand why it still builds on Linux.

Nov 13 '24 13:11 Dobatymo

Fails to build with Python 3.13 due to removal of PyObject_AsReadBuffer

Affected Operating Systems

Affected py-lmdb Version

py-lmdb Installation Method

Using bundled or distribution-provided LMDB library?

Distribution name and LMDB library version

Machine "free -m" output

Other important machine info

Describe Your Problem

Errors/exceptions Encountered

Describe What You Expected To Happen

Describe What Happened Instead