
Idea: interface file for Python C libs

Open hashbackup opened this issue 5 years ago • 3 comments

I ran a quick 10M {int:int} dict benchmark with Python 2.7 and Seq and was quite impressed. It's posted on your Hacker News announcement. The Python version used 1.1 GB and 8 seconds; Seq used 395 MB and 5 seconds. I didn't add any type info, just changed the one line that created an empty dict. Congratulations!

I did some poking around and it looks like supporting the Python stdlib requires porting all the C code to Seq. I also saw that Seq can import regular Python code with pyimport. What I'm wondering is whether it would be possible, or make sense, to allow importing Cython C extensions by providing an interface spec file, like os.pyi for example. This spec file would document the types, classes, return values, etc., and then allow a Seq program to use the standard Python built-ins without a rewrite.

There would have to be a wrapper generated for each function called, or even multiple wrappers if different types are used. The wrapper would have to create CPython objects for each argument, take the GIL, call the C extension, and then convert any returned values from CPython objects to Seq objects. The interface spec file would have to document not only the types, but also which arguments might be modified. For example, passing a list to a C extension might cause the list to be modified (sort, for example), or the list might not be modified, e.g. len(list).
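A rough Python sketch of the wrapper idea, just to make the in/out distinction concrete. Everything here is hypothetical: make_wrapper and the "in"/"out" mode strings are made-up names, copying a list stands in for converting a Seq value to a CPython object, and the GIL step is left out because this runs inside CPython already.

```python
from typing import Any, Callable, Sequence


def make_wrapper(func: Callable[..., Any], modes: Sequence[str]) -> Callable[..., Any]:
    """Build a call wrapper from per-argument modes ("in" or "out")."""

    def wrapper(*args: Any) -> Any:
        if len(args) != len(modes):
            raise TypeError(f"expected {len(modes)} arguments, got {len(args)}")
        # Stand-in for converting native values to CPython objects:
        # lists are copied so the callee never sees the caller's storage.
        boxed = [list(a) if isinstance(a, list) else a for a in args]
        result = func(*boxed)
        # Only arguments declared "out" are copied back to the caller.
        for original, converted, mode in zip(args, boxed, modes):
            if mode == "out" and isinstance(original, list):
                original[:] = converted
        return result

    return wrapper


sort_in_place = make_wrapper(list.sort, ["out"])  # sort mutates its argument
length = make_wrapper(len, ["in"])                # len only reads it

data = [3, 1, 2]
sort_in_place(data)
print(data, length(data))
```

If sort were wrongly declared "in", the caller's list would stay unsorted, which is why the spec file needs to carry that information.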

hashbackup, Jan 22 '20 18:01

Great to hear that! That's an interesting idea, and I think we already have some of the machinery you mention in place (e.g. the wrapper already exists to support our current Python interop). I don't immediately see any reason why we couldn't do something like this; I think really the only somewhat tricky part may be parsing the interface files and mapping them to Seq's type system. Let me think a bit more about this!

arshajii, Jan 23 '20 00:01

In my original post, I said Cython extensions in one place and CPython extensions in another, but I meant CPython in both cases. Though I guess the same mechanism would work for both.

There is a project cppyy that does a similar thing for interfacing C++ to Python. I've never used it.

Another possibility is to take a look at pypy's standard library, where a lot of CPython C code has already been rewritten in Python(ish).

hashbackup, Jan 23 '20 01:01

Here's a sketch of what I had in mind for an interface file that would allow Seq to call C extensions already in the Python interpreter. For example, take the struct built-in, located in Modules/_struct.c.

Disclaimer: I'm not an expert on Python internals and I'm still using 2.7 without types, so this could all be complete rubbish!

Python 2.7.15 (default, Aug 17 2019, 19:47:17) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import struct
>>> type(struct)
<type 'module'>
>>> dir(struct)
['Struct', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_clearcache', 'calcsize', 'error', 'pack', 'pack_into', 'unpack', 'unpack_from']
>>> type(struct.Struct)
<type 'type'>
>>> type(struct.calcsize)
<type 'builtin_function_or_method'>
>>> s=struct.Struct()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Required argument 'format' (pos 1) not found
>>> s=struct.Struct(format='i')
>>> type(s)
<type 'Struct'>
>>> dir(s)
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'format', 'pack', 'pack_into', 'size', 'unpack', 'unpack_from']

An interface file struct.pyi might look something like:

exception error

def pack(in fmt: str, in ...) -> str

def pack_into(in fmt: str, out buf: str, in offset: int, in ...) -> None

def unpack(in fmt: str, in s: str) -> Tuple[...] or (...)

def unpack_from(in fmt: str, in s: str, in offset: int = 0) -> Tuple[...] or (...)

def calcsize(in fmt: str) -> int

class Struct:

  str format  (or maybe format: str)

  int size  (or maybe size: int)
  
  def __init__(in fmt: str) -> Struct
 
  def pack(in ...) -> str

  def pack_into(out buf: str, in offset: int, in ...) -> None

  def unpack(in s: str) -> Tuple[...] or (...)
  
  def unpack_from(in s: str, in offset: int = 0) -> Tuple[...] or (...)
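For comparison, ordinary .pyi stubs (without the in/out markers, which are my own invention here) are syntactically valid Python, so the stdlib ast module can already read them. A minimal sketch, assuming Python 3.9+ for ast.unparse; the STUB text and the signatures helper are made up for illustration and are not Seq's actual parser.

```python
import ast

# Hypothetical stub text in standard .pyi syntax.
STUB = '''
def calcsize(fmt: str) -> int: ...
def unpack_from(fmt: str, s: bytes, offset: int = 0) -> tuple: ...

class Struct:
    format: str
    size: int
    def pack(self, *values) -> bytes: ...
'''


def signatures(source):
    """Map qualified function names to (parameter annotations, return annotation)."""
    found = {}

    def visit(body, prefix):
        for node in body:
            if isinstance(node, ast.FunctionDef):
                # Positional parameters only; *args and **kwargs are skipped.
                params = {
                    a.arg: ast.unparse(a.annotation) if a.annotation else None
                    for a in node.args.args
                }
                returns = ast.unparse(node.returns) if node.returns else None
                found[prefix + node.name] = (params, returns)
            elif isinstance(node, ast.ClassDef):
                visit(node.body, prefix + node.name + ".")

    visit(ast.parse(source).body, "")
    return found


sigs = signatures(STUB)
for name, (params, returns) in sigs.items():
    print(name, params, "->", returns)
```

Mapping the annotation strings onto Seq types would be the next step, and supporting in/out markers would need a custom parser since they are not Python syntax.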

At compile time, it would probably be a good idea for the Seq compiler to verify that the module names and class members are correct, but I think that's about all it can do to check the .pyi interface file. If it makes sense, Seq could write out a compiled .pyj file to avoid having to parse the interface again, sort of like .pyc files.
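The name check could be as simple as the following sketch, which runs under CPython rather than in the Seq compiler. missing_members is a hypothetical helper, and "Struct.sizebytes" is a deliberately wrong name included to show what a failed check looks like.

```python
import importlib


def missing_members(module_name, expected):
    """Return the dotted names from `expected` that the module does not provide."""
    module = importlib.import_module(module_name)
    missing = []
    for dotted in expected:
        obj = module
        for part in dotted.split("."):
            if not hasattr(obj, part):
                missing.append(dotted)
                break
            obj = getattr(obj, part)
    return missing


# Names an interface file for struct might declare.
declared = [
    "pack", "pack_into", "unpack", "unpack_from", "calcsize", "error",
    "Struct.pack", "Struct.size", "Struct.sizebytes",
]
print(missing_members("struct", declared))
```

This only confirms that the names exist; it says nothing about whether the declared argument and return types are right.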

hashbackup, Jan 25 '20 03:01