Fails to build with Python 3.13 due to removal of PyObject_AsReadBuffer
Affected Operating Systems
All
Affected py-lmdb Version
All
py-lmdb Installation Method
Irrelevant
Using bundled or distribution-provided LMDB library?
Irrelevant
Distribution name and LMDB library version
Irrelevant
Machine "free -m" output
Irrelevant
Other important machine info
Irrelevant
Describe Your Problem
py-lmdb does not build against Python 3.13 because it uses PyObject_AsReadBuffer, which was removed in Python 3.13. The recommended replacement is PyObject_GetBuffer and PyBuffer_Release.
Errors/exceptions Encountered
lmdb/cpython.c: In function ‘val_from_buffer’:
lmdb/cpython.c:586:12: error: implicit declaration of function ‘PyObject_AsReadBuffer’; did you mean ‘PyObject_GetBuffer’? [-Wimplicit-function-declaration]
586 | return PyObject_AsReadBuffer(buf,
| ^~~~~~~~~~~~~~~~~~~~~
| PyObject_GetBuffer
Describe What You Expected To Happen
Successful build.
Describe What Happened Instead
Failed build.
So I can nearly fix this, but I'm struggling with the lifecycle of the view buffer PyObject_GetBuffer uses. This patch works, but obviously leaks those views:
diff --git a/lmdb/cpython.c b/lmdb/cpython.c
index ef30447..78838b9 100644
--- a/lmdb/cpython.c
+++ b/lmdb/cpython.c
@@ -583,9 +583,15 @@ val_from_buffer(MDB_val *val, PyObject *buf)
type_error("Won't implicitly convert Unicode to bytes; use .encode()");
return -1;
}
- return PyObject_AsReadBuffer(buf,
- (const void **) &val->mv_data,
- (Py_ssize_t *) &val->mv_size);
+ Py_buffer view;
+ int ret;
+ ret = PyObject_GetBuffer(buf, &view, PyBUF_SIMPLE);
+ if(ret == 0) {
+ val->mv_data = view.buf;
+ val->mv_size = view.len;
+ }
+ //PyBuffer_Release (&view);
+ return ret;
}
/* ------------------- */
If you uncomment the PyBuffer_Release, it crashes, because we aren't actually reading the data here, just setting a pointer to it.
I can't figure out a good way to handle this, because there are a lot of things that call val_from_buffer and they tend not to read the result right away, there's a lot of nesting going on. e.g. parse_args calls parse_arg which calls val_from_buffer, and dozens of things call parse_args and use the data at varying points afterwards...so what? Do we have every single one of those pass in a view and free it when they're done, and pass that view all the way down to val_from_buffer? That feels very icky, but I don't know what would be better. Note I am not any kind of C expert at all, maybe this is obvious to someone else.
Thanks for the heads up about 3.13. I haven't looked at that yet.
I'm afraid val_from_buffer and its callers will have to be restructured. The first approach I'd try is for the new val_from_buffer to takes a Py_buffer ** parameter that the created buffer is passed back into. Then for every caller of val_from_buffer there'd need to be a val_from_buffer_free that calls PyBuffer_Release on it if it isn't null.
This is a hard API break, and it probably means that the version that implements this will be the first version that doesn't support Python 2.7.
You're welcome to submit a PR that implements this. If not, I'll work on it after I have my current 3.12 branch working.
Thanks!
That's more or less what I was talking about at the end of my message, I think. But - if I'm understanding things correctly - it's difficult because it doesn't seem like we're always done using the data even within the scope of the immediate caller of val_from_buffer. Like in the parse_arg case - parse_arg calls val_from_buffer, but it doesn't really use the result itself, the thing that uses the result is two or more links further up the chain. So you wind up having to pass the Py_buffer an awfully long way to get it from the place it will ultimately need to be freed all the way to val_from_buffer.
That seemed like a lot of work that would be very easy to get wrong for me, since I don't know the codebase and I'm not at all a C hacker, so I passed on trying to write a PR, sorry!
NP at all. I appreciate the report.
This makes the tests pass with 3.13:
--- a/lmdb/cpython.c
+++ b/lmdb/cpython.c
@@ -583,8 +583,8 @@ val_from_buffer(MDB_val *val, PyObject *buf)
type_error("Won't implicitly convert Unicode to bytes; use .encode()");
return -1;
}
- return PyObject_AsReadBuffer(buf,
- (const void **) &val->mv_data,
+ return PyBytes_AsStringAndSize(buf,
+ (char **) &val->mv_data,
(Py_ssize_t *) &val->mv_size);
}
Obviously, if you still support Python 2.7, it needs some PY_VERSION_HEX >= 0x03000000 (or similar) guards.
OTOH so does this:
--- a/lmdb/cpython.c
+++ b/lmdb/cpython.c
@@ -583,9 +583,8 @@ val_from_buffer(MDB_val *val, PyObject *buf)
type_error("Won't implicitly convert Unicode to bytes; use .encode()");
return -1;
}
- return PyObject_AsReadBuffer(buf,
- (const void **) &val->mv_data,
- (Py_ssize_t *) &val->mv_size);
+ PyErr_SetString(PyExc_RuntimeError, "Oh no!");
+ return -1;
}
So perhaps the scenario is not tested :/
Taking the patch by @AdamWill and not makign it crash is a matter of never releasing the buffer when the ret is not 0:
--- a/lmdb/cpython.c
+++ b/lmdb/cpython.c
@@ -583,9 +583,15 @@ val_from_buffer(MDB_val *val, PyObject *buf)
type_error("Won't implicitly convert Unicode to bytes; use .encode()");
return -1;
}
- return PyObject_AsReadBuffer(buf,
- (const void **) &val->mv_data,
- (Py_ssize_t *) &val->mv_size);
+ Py_buffer view;
+ int ret;
+ ret = PyObject_GetBuffer(buf, &view, PyBUF_SIMPLE);
+ if(ret == 0) {
+ val->mv_data = view.buf;
+ val->mv_size = view.len;
+ PyBuffer_Release (&view);
+ }
+ return ret;
}
It does exactly the same thing as Python, so calling PyBuffer_Release before reading the buf and len is as safe as it ever was (i.e. it is not safe, which is why Python removed the function in the first place, but it does not make things worse).
Fix in https://github.com/jnwatson/py-lmdb/pull/368
For me the current pypi version 1.5.1 builds on Github actions with
- ubuntu-20.04, 3.13
- ubuntu-22.04, 3.13
- macos-12, 3.13
but not with
- macos-13, 3.13
- macos-14, 3.13
I can see the API was removed. So I don't really understand why it still builds on Linux.