
SWIG support (for Python and other languages)

Open erthink opened this issue 8 years ago • 13 comments

SWIG support for Python

erthink avatar Apr 05 '17 09:04 erthink

@oddjobz, this issue may be of interest to you in the pymamba context:

  • libfpta is based on libmdbx, which is a fork of LMDB.
  • libfpta provides secondary indexes.
  • libfpta provides rows/tuples (via libfptu).

Moreover, we need graceful Python support for libfpta.

erthink avatar Apr 10 '17 10:04 erthink

Hi, looking at the limitations, not really my cup of tea. Limited record size and limited database size are not really attributes I'm attracted to .. :) .. however I hadn't realised there were serious LMDB forks, I probably need to have a browse.

oddjobz avatar Apr 10 '17 10:04 oddjobz

These restrictions come from the original DNA of LMDB. On the other hand, such restrictions follow from understanding the specifics of LMDB/mdbx, with a bit of common sense :)

erthink avatar Apr 10 '17 11:04 erthink

Mm, maybe it's Google translate playing up .. I thought records were limited to 4k and database size limited to memory size?

oddjobz avatar Apr 10 '17 11:04 oddjobz

Not 'limited', but libfpta (and LMDB too) may not be the best choice if …

erthink avatar Apr 10 '17 11:04 erthink

Ahhhh, Ok.

oddjobz avatar Apr 10 '17 11:04 oddjobz

The actual hard limits come from libfptu, specifically from here:

fptu_max_tuple_bytes = 262140 // maximum total size of the serialized representation of the tuple
fptu_max_cols = 1022 // maximum tag/id-number of the field/column
fptu_max_fields = 16383 // maximum total number of fields/columns in the same tuple
fptu_max_field_bytes = 65535 // max size of the field/column
fptu_max_opaque_bytes = 65531 // maximum size of an arbitrary sequence of bytes
fptu_max_array = 2047 // max number of elements in the array
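For illustration, the limits quoted above can be checked against a candidate record before writing. The following is a hypothetical sketch: the constants mirror the libfptu values, but the validation helper itself is not part of libfptu's API.

```python
# Constants mirror the libfptu limits quoted above; the helper is illustrative.
FPTU_MAX_TUPLE_BYTES = 262140   # max serialized size of a tuple
FPTU_MAX_FIELDS = 16383         # max fields/columns per tuple
FPTU_MAX_FIELD_BYTES = 65535    # max size of a single field/column

def fits_in_tuple(fields):
    """Rough check that a column-id -> bytes mapping stays within the limits."""
    if len(fields) > FPTU_MAX_FIELDS:
        return False
    if any(len(v) > FPTU_MAX_FIELD_BYTES for v in fields.values()):
        return False
    # The serialized size is at least the sum of payloads; the real
    # per-field overhead in libfptu differs.
    return sum(len(v) for v in fields.values()) <= FPTU_MAX_TUPLE_BYTES

print(fits_in_tuple({1: b"x" * 1000}))     # True: small record fits
print(fits_in_tuple({1: b"x" * 100000}))   # False: one field exceeds 64 KiB
```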

erthink avatar Apr 10 '17 11:04 erthink

Ok, I'm thinking my approach is a little different, rather than dealing in fields / columns, I'm writing JSON blobs as values, and converting back and forth between Python dicts on read/write .. seeing around 40,000 writes per second on a single core (in Python), or 30,000 writes per second on a table with a compound index. Reading is much faster, reading through a compound index with 5 keys yields around 200,000 records per second. (again, this is in Python)

oddjobz avatar Apr 10 '17 12:04 oddjobz

I think I need to explain my plans a little. So, in libfptu I will add:

  • lightweight but elegant schema support;
  • (de)serialization to/from JSON;
  • (de)serialization to/from msgpack, including nested fields/columns.

Therefore, the libfpta+libfptu couple, in comparison to "LMDB+json", will:

  • exceed your needs for JSON objects;
  • require a smaller storage footprint;
  • deliver better performance.
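One intuition behind the smaller footprint claim can be shown with a toy comparison (illustrative only, not libfpta's actual on-disk format): per-record JSON repeats every field name in every record, whereas a schema-aware format only needs to store the values.

```python
import json

# Illustrative sketch: with per-record JSON every record carries its field
# names; with a fixed schema, only the values need storing. The schema and
# record below are made-up examples.
schema = ["name", "age", "city"]
record = {"name": "alice", "age": 30, "city": "Lisbon"}

json_blob = json.dumps(record).encode()                          # keys + values
values_only = json.dumps([record[k] for k in schema]).encode()   # values only

print(len(json_blob), len(values_only))   # the values-only form is smaller
```

The saving compounds across millions of rows, and a binary encoding such as msgpack shrinks the values further still.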

erthink avatar Apr 10 '17 12:04 erthink

Ok, that sounds good, currently I'm relying on the Python-lmdb package .. do you have an equivalent, or is this the bit you're missing?

oddjobz avatar Apr 10 '17 13:04 oddjobz

Currently mdbx/fptu/fpta have no Python support at all.

I think Python support will only be useful once schema support and JSON (de)serialization are in place. So, I plan to do this as soon as I finish https://github.com/ReOpen/libmdbx/issues/7 and https://github.com/ReOpen/libmdbx/issues/8.

erthink avatar Apr 10 '17 14:04 erthink

Sure, at the end of the day, very few use-cases for databases involve low-level programming, so access from the likes of Python, Node, PHP etc, and the performance of those interfaces, are pretty key. The driving force for me was seeing my Python write speed down to ~ 2000/sec with MongoDB.

oddjobz avatar Apr 10 '17 14:04 oddjobz

Related to https://github.com/jnwatson/py-lmdb/issues/204

erthink avatar Jul 18 '19 18:07 erthink