
[DO NOT MERGE] Python binding of Triton server C API

Open GuanLuo opened this issue 1 year ago • 0 comments

This PR provides the low-level binding that lets Python users interact with the Triton library within the same process. However, this binding is not intended to be used directly by Python users; it exists to enable the development of the Python wrapper. Python users should use that wrapper, because it encapsulates many concepts that are not intuitive to Python users but inevitably leak into the binding.

Most of the binding is a straightforward mapping of the C API. However, PyResponseAllocator, PyTrace, PyInferenceRequest and PyInferenceResponse require extra attention because they are objects that involve callbacks and ownership transfer. Below is a snippet from the .cc file to help reviewers navigate the places where the binding differs from its C equivalent.

// This binding merely maps the Triton C API to its Python equivalent, and
// therefore the naming is the same as the one used in the corresponding
// sections. However, there are a few exceptions to ease the transition to
// Python:
// Structs:
//  * Triton structs are encapsulated in a thin wrapper to isolate raw pointer
//    operations, which are not supported in pure Python. A thin 'PyWrapper'
//    base class is defined with common utilities.
//  * Trivial getters and setters are grouped into a Python class property.
//    However, this creates an asymmetry where some APIs are called like
//    functions while others are accessed like member variables. So I am open
//    to exposing getters / setters if that is more intuitive.
//  * The wrapper only serves as a communication layer between Python and C:
//    it is unwrapped when control reaches the C API, and the C struct is
//    wrapped when control reaches the Python side. Python binding users should
//    respect the "ownership" and lifetime of the wrapper in the same way as
//    described in the C API. Python binding users must not assume that the
//    same C struct will always be referred to through the same wrapper object.
// Enums:
//  * In the C API, the enum values are prefixed by the enum name. The Python
//    equivalent is an enum class and thus the prefix is removed to avoid
//    duplication, e.g. a Python user may specify a value with
//    'TRITONSERVER_ResponseCompleteFlag.FINAL'.
// Functions / Callbacks:
//  * Output parameters are converted to return values. An error returned by
//    an API is raised as an exception. The same applies to callbacks.
//  ** Note that in the C API, the inference response may carry an error object
//     that represents an inference failure. The equivalent Python API will
//     raise the corresponding exception if the response contains an error
//     object.
//  * The function parameters and return values are exposed in Python style,
//    for example, an object pointer becomes py::object, and a C array and its
//    length condense into a Python array.
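
To make the conventions in the snippet concrete, here is a rough pure-Python sketch. It is not the actual binding (which is implemented in C++) and only mimics its shape. Apart from `TRITONSERVER_ResponseCompleteFlag.FINAL`, every name is hypothetical: `TritonErrorSketch`, `InferenceRequestSketch`, `_c_style_get_priority`, and the dict standing in for a C struct were made up for illustration. The numeric value of `FINAL` is likewise an assumption.

```python
import enum


class TritonErrorSketch(Exception):
    """Hypothetical exception standing in for an error returned by the C API."""


class TRITONSERVER_ResponseCompleteFlag(enum.IntEnum):
    # In C the value carries the enum name as a prefix; the Python enum
    # class drops it. The numeric value is an assumption of this sketch.
    FINAL = 1


def _c_style_get_priority(handle, out):
    """Mock of a C-style call: fills `out`, returns an error message or None."""
    if handle is None:
        return "invalid argument: null request"
    out.append(handle["priority"])
    return None


class InferenceRequestSketch:
    """Hypothetical thin wrapper that hides the raw 'pointer' (a dict here)."""

    def __init__(self, handle):
        self._handle = handle

    @property
    def priority(self):
        # The output parameter becomes the return value,
        # and a returned error becomes a raised exception.
        out = []
        err = _c_style_get_priority(self._handle, out)
        if err is not None:
            raise TritonErrorSketch(err)
        return out[0]

    @priority.setter
    def priority(self, value):
        # Trivial setter grouped with the getter into one property.
        self._handle["priority"] = value


request = InferenceRequestSketch({"priority": 0})
request.priority = 5
print(request.priority, TRITONSERVER_ResponseCompleteFlag.FINAL.name)
```

Note that `priority` is read and written like a member variable, whereas a non-trivial API would still be called like a function. This is the asymmetry the snippet mentions.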

GuanLuo, Aug 28 '23 21:08