tch-rs

tch-based Python module?

Status: Open. cdfox opened this issue 4 years ago • 23 comments

Do you know of any examples of Python modules written in Rust using tch? I'm interested in implementing a custom RNN cell in Rust using tch and exposing it to be used in a PyTorch program.

cdfox, Apr 18 '20

I am intrigued. Why use a Python wrapper for tch rather than PyTorch directly?

vegapit, Apr 18 '20

An RNN cell is going to involve a for loop over the sequence. When you implement it in Python, too much of the work ends up being done in Python, and inference on longer sequences on CPU can speed up considerably when the cell implementation moves to, say, C++. I'm thinking a similar speedup could be achieved by implementing the cell in Rust.

Edit: I think a good starting place might be the examples here: https://github.com/PyO3/pyo3#examples.
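To make the for-loop point concrete, here is a minimal sketch of an Elman-style RNN cell in plain Rust (std only, no tch), assuming row-major `Vec<f32>` weights and a single bias vector. `RnnCell`, `step` and `forward` are hypothetical names, not tch or PyTorch APIs. Each iteration of `forward` depends on the previous hidden state, so the steps run strictly one after another; in Python, every one of those iterations also pays interpreter overhead, which is what moving the cell to native code avoids.

```rust
/// Hypothetical Elman RNN cell: h_t = tanh(W_ih x_t + W_hh h_{t-1} + b).
/// Weights are stored row-major in plain Vecs; no external crates.
pub struct RnnCell {
    pub input_size: usize,
    pub hidden_size: usize,
    pub w_ih: Vec<f32>, // hidden_size x input_size
    pub w_hh: Vec<f32>, // hidden_size x hidden_size
    pub bias: Vec<f32>, // hidden_size
}

impl RnnCell {
    /// One time step: computes the next hidden state from input `x` and state `h`.
    pub fn step(&self, x: &[f32], h: &[f32]) -> Vec<f32> {
        (0..self.hidden_size)
            .map(|i| {
                let row_ih = &self.w_ih[i * self.input_size..(i + 1) * self.input_size];
                let row_hh = &self.w_hh[i * self.hidden_size..(i + 1) * self.hidden_size];
                let pre: f32 = row_ih.iter().zip(x).map(|(w, v)| w * v).sum::<f32>()
                    + row_hh.iter().zip(h).map(|(w, v)| w * v).sum::<f32>()
                    + self.bias[i];
                pre.tanh()
            })
            .collect()
    }

    /// The sequential loop: each step needs the previous hidden state,
    /// so the time steps cannot be computed in parallel.
    pub fn forward(&self, xs: &[Vec<f32>], h0: Vec<f32>) -> Vec<f32> {
        let mut h = h0;
        for x in xs {
            h = self.step(x, &h);
        }
        h
    }
}
```

PyTorch's `RNNCell` keeps two biases (`b_ih` and `b_hh`); since they are simply added, folding them into one vector as above computes the same thing.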

cdfox, Apr 18 '20

I understand.

The ideal setup would be to train the model in a high-performance language like Rust and export it for use from a scripting language like Python. Unfortunately, the tch library did not have model export functionality last time I checked.

A Python wrapper could be a good compromise if it works. I will give it a try and share if I get anywhere. Thanks

vegapit, Apr 18 '20

I don't know of any such example, and that indeed seems like a good use case. The C++ API has a tutorial on this, [Custom C++ and CUDA Extensions]; it would be very nice to write a Rust version of it. As you noted, PyO3 is likely to be a good starting point (tch-rs already uses the cpython crate to interface with a Python runtime in its reinforcement learning examples).


Funnily enough, I actually use the opposite setup: I experiment with models and train them in Python, as it's more flexible to play with and the heavy lifting takes place on the GPU anyway. When it comes to productionizing/deploying models, I use Rust, as I find it much better for building large, robust systems.

LaurentMazare, Apr 19 '20


I think we had this debate when I was asking how to export a JIT model with tch =;]

Using a tch model via cpython seems to work. Here is an example of an exported function:

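// Assumes the cpython crate and a 2020-era tch release
// (later tch versions renamed Tensor::of_slice to Tensor::from_slice).
use cpython::{PyResult, Python};
use tch::nn::{self, Module, OptimizerConfig};
use tch::{Device, Kind, Tensor};

// Trains a 5-input linear model with RMSprop until the squared error on the
// last sample of a pass drops below 1e-4, then returns that loss to Python.
// Module registration (cpython's py_module_initializer! / py_fn!) not shown.
fn tch_train(_py: Python, xs: Vec<Vec<f64>>, ys: Vec<f64>) -> PyResult<f64> {
    let mut loss = 1f64;
    let vs = nn::VarStore::new(Device::Cpu);
    let model = nn::seq().add(nn::linear(&vs.root(), 5, 1, Default::default()));
    let mut optim = nn::RmsProp::default().build(&vs, 0.001).unwrap();
    while loss > 1e-4 {
        for (x, y) in xs.iter().zip(ys.iter()) {
            // One sample per step: input of shape [1, 5], target of shape [1].
            let t_x = Tensor::of_slice(x.as_slice()).to_kind(Kind::Float).unsqueeze(0);
            let t_y = Tensor::of_slice(&[*y]).to_kind(Kind::Float).unsqueeze(0);
            let t_out = model.forward(&t_x).squeeze();
            let t_loss = (t_y - t_out).pow(2f64).sum(Kind::Float);
            optim.backward_step(&t_loss);
            loss = f64::from(t_loss);
        }
    }
    Ok(loss)
}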

and the Python code I ran for testing:

import mymodule
import numpy as np

def my_func(x):
    return np.sum( x * np.array([5.0,-4.0,3.0,-2.0,1.0]))

xs = np.random.random(100).reshape((20,5)).tolist()
ys = np.apply_along_axis(my_func,1,xs).tolist()
print( mymodule.tch_train(xs,ys) )

The easy way to proceed is to have one wrapped function that trains the tch model and saves it to disk, and another that loads the model from disk and runs the estimation. This is obviously not ideal if estimations are requested at high frequency, since every call reloads the model. Creating a PyClass that encapsulates the model could be the better solution, but I would not bet on it working.
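The trade-off between those two options can be sketched without tch or any Python bindings. `LinearModel` below is a hypothetical stand-in for a trained model, saved in a naive one-value-per-line text format (not tch's `VarStore` format). `predict_from_disk` is the "easy way", which re-reads the file on every call; keeping one loaded `LinearModel` alive and calling `predict` repeatedly is what a PyClass wrapper would buy you.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical trained model: y = w . x + b.
#[derive(Debug, Clone, PartialEq)]
pub struct LinearModel {
    pub weights: Vec<f64>,
    pub bias: f64,
}

impl LinearModel {
    pub fn predict(&self, x: &[f64]) -> f64 {
        self.weights.iter().zip(x).map(|(w, xi)| w * xi).sum::<f64>() + self.bias
    }

    /// Writes the bias, then each weight, one value per line.
    pub fn save(&self, path: &Path) -> io::Result<()> {
        let mut text = format!("{}\n", self.bias);
        for w in &self.weights {
            text.push_str(&format!("{}\n", w));
        }
        fs::write(path, text)
    }

    /// Reads back the format written by `save`.
    pub fn load(path: &Path) -> io::Result<Self> {
        let text = fs::read_to_string(path)?;
        let mut values = text.lines().map(|l| {
            l.parse::<f64>()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
        });
        let bias = values
            .next()
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "empty model file"))??;
        let weights = values.collect::<io::Result<Vec<f64>>>()?;
        Ok(LinearModel { weights, bias })
    }
}

/// The "easy way": every prediction pays for a file read and a parse.
pub fn predict_from_disk(path: &Path, x: &[f64]) -> io::Result<f64> {
    Ok(LinearModel::load(path)?.predict(x))
}
```

With PyO3, the loaded model would become a field of a `#[pyclass]` struct exposing a `predict` method, so Python holds a handle and never reloads it; I have not verified how tch types behave inside such a class.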

vegapit, Apr 19 '20

That's a pretty cool example. Compared to the tutorial on developing C++ extensions (which is very close to what @cdfox would like to achieve: writing a custom RNN cell in an optimized language and using it from Python), the main missing piece is probably proper integration with tensors rather than going through NumPy arrays. This is especially likely to be an issue on GPU, as the data would have to move back and forth between main memory and GPU memory. Ideally, the exposed Rust function would take PyTorch tensors as input and return PyTorch tensors.

LaurentMazare, Apr 19 '20

In this example, I only used NumPy arrays for brevity: in a couple of lines, I could generate random numbers, arrange them in the right shape and compute the output vector. Ultimately, what is passed to the wrapper function are Python lists, as they are seamlessly converted to Rust Vecs by cpython.

I do not quite understand the dynamics in the GPU setting as I have never really used CUDA.

vegapit, Apr 19 '20

Just to clarify my use case: my first goal would be a speedup via Rust for inference on CPU; a speedup in training would be a bonus. It sounds like there is a pathway for passing a NumPy array from Python into a function written in Rust, where it's available as a PyArrayDyn, for example:
Python: https://github.com/PyO3/rust-numpy/blob/master/examples/simple-extension/README.md
Rust: https://github.com/PyO3/rust-numpy/blob/master/examples/simple-extension/src/lib.rs
But maybe there's no way to pass a PyTorch tensor into a Rust function (even just in main memory). I believe converting NumPy arrays to PyTorch tensors has pretty low overhead, at least in Python (https://discuss.pytorch.org/t/what-is-the-overhead-of-transforming-numpy-to-torch-and-vice-versa/7395). I'm not sure about passing a NumPy array into a Rust function and then converting it to a Tensor in Rust.

cdfox, Apr 19 '20

Right, so in summary, the aim of this exercise is not only to access tch models from Python but also to limit type conversions during data transfer. In that case, PyTorch Tensor to/from tch Tensor conversions would indeed be the most appropriate. Intuitively, if the two share the same in-memory representation, there should be an "unsafe" way of getting it done.

vegapit, Apr 19 '20

I understand the motivation of speeding up the loop using Rust rather than a pure Python implementation, but I am unsure this is the most effective way to achieve greater speed. This discussion indicates that a straight translation from Python to a high-performance language would yield speed gains of about 10%. It points to a useful article describing approaches that are likely to offer more significant benefits, especially fusion.

The cost of a loop over a sequence of, say, 100 elements is relatively low compared to the RNN operations within each iteration (especially for complex LSTM- or GRU-like units).

guillaume-be, Apr 24 '20

I actually have another use case where I want to efficiently pass PyTorch tensors between Python and Rust.

With AllenNLP, one of our main performance drags is data loading, especially when the dataset is too big to fit in memory so that you have to lazily load it on the fly. If you can't load data fast enough, you can't keep the GPUs occupied.

So we've been playing around with the idea of writing data loaders in Rust that would pass off tensors to Python.

I would love to hear if anyone's gotten any further with this.
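The Rust half of such a data loader could look roughly like the sketch below: a producer thread builds batches ahead of time and hands them over a bounded channel (`std::sync::mpsc::sync_channel`), so memory use stays capped when the dataset does not fit in RAM, while the consumer rarely has to wait. `spawn_loader` and the `Batch` alias are hypothetical, the batch contents are dummy values standing in for lazily read data, and the actual handoff to PyTorch tensors, which is the open question in this thread, is not shown.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical batch type; a real loader would fill a tensor-ready buffer.
pub type Batch = Vec<f32>;

/// Spawns a producer thread that prepares batches ahead of the consumer.
/// `prefetch` bounds how many finished batches may wait in the queue.
pub fn spawn_loader(num_batches: usize, batch_size: usize, prefetch: usize) -> Receiver<Batch> {
    let (tx, rx) = sync_channel(prefetch);
    thread::spawn(move || {
        for b in 0..num_batches {
            // Stand-in for lazily reading and decoding one batch from disk.
            let batch: Batch = (0..batch_size).map(|i| (b * batch_size + i) as f32).collect();
            // Blocks while the queue is full; stops early if the consumer hung up.
            if tx.send(batch).is_err() {
                break;
            }
        }
        // `tx` is dropped here, closing the channel and ending the consumer's loop.
    });
    rx
}
```

On the consuming side, `for batch in spawn_loader(1000, 64, 8) { ... }` iterates until the producer finishes, since `Receiver` implements `IntoIterator`.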

epwalsh, Jun 12 '20

Support for writing PyTorch extensions in Rust would be extremely useful, especially for more complicated operations (just like the motivation of the official PyTorch C++/CUDA extension tutorial). However, PyO3 currently does not recognize the tch tensor type; maybe we could start by adding tensor support to PyO3?

awaited-hare, Jun 13 '20

@LaurentMazare Any thoughts on this? I'd like to help if needed.

awaited-hare, Jun 24 '20

Sorry for being slow to come back. A decent first step would indeed be to get the underlying pointer from the Python tensor object and pass it through pyo3, so that we could build the same tensor on the Rust side. I guess most of the work would be in understanding how the Python API wraps things. I may look at it when I find some time, but it's unlikely to happen in the next couple of weeks or so.

LaurentMazare, Jun 26 '20

I've started trying to do this for my own project and ran into an exit code 139. I used this code to create a function in PyO3:

use pyo3::prelude::*;
use pyo3::{wrap_pyfunction, AsPyPointer, PyNativeType};
extern crate tch;
use tch::Tensor;
use torch_sys::*;

#[pyfunction]
fn loss_for_neighbors(x: &PyAny) -> PyResult<()> {
    // return Ok((2 * x) as PyAny);
    let ct: *mut C_tensor = x.as_ptr() as *mut C_tensor;
    println!("got past ct");
    // let t = Tensor::from_py(x.as_ptr().into_py());
    let t = Tensor::from_ptr(ct);
    println!("got t");
    println!("{}", t.dim());
    Ok(())
}

#[pymodule]
fn geodb(py: Python, m: &PyModule) -> PyResult<()> {
    m.add_wrapped(wrap_pyfunction!(loss_for_neighbors))?;
    Ok(())
}

I added a from_ptr function to the Tensor type in tch-rs's tensor.rs file:

impl Tensor {
    /// Creates a new tensor.
    pub fn new() -> Tensor {
        let c_tensor = unsafe_torch!(at_new_tensor());
        Tensor { c_tensor }
    }

    pub fn from_ptr(c_tensor: *mut C_tensor) -> Tensor {
        Tensor { c_tensor }
    }
...

But running it from Python:

import geodb
import torch

t = torch.zeros((2,))
geodb.loss_for_neighbors(t)

I got this error:

got past ct
got t
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

jogardi, Jun 29 '20

I haven't investigated this very deeply, so I may be way off, but I think you're passing a pointer to the Python tensor object and trying to interpret it as a C tensor pointer, which results in a segfault. Instead, you should probably pass a pointer to the tensor data, e.g. on the Python side pass t.data_ptr() instead of t. This int should be interpreted as a pointer, and you would want to create the tensor from this storage. You could then try using Tensor.of_data_size on it. Passing the tensor dimensions and element type would help, although when prototyping, hardcoding these would be fine.
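The pointer-reinterpretation step can be sketched in plain Rust. The sketch assumes the Python side passes the integer returned by `t.data_ptr()` together with the element count, and that the tensor is a contiguous float32 CPU tensor; `f32_view` and `byte_view` are hypothetical helpers. Viewing the memory this way copies nothing, but it is only raw storage: no shape, dtype or autograd information travels with it. As far as I know, tch's `Tensor::of_data_size` then copies the bytes it is given into a new tensor.

```rust
/// Views `numel` f32 values starting at a raw address, e.g. the integer
/// that `t.data_ptr()` returns on the Python side. No data is copied.
///
/// # Safety
/// `addr` must point to at least `numel` initialized, properly aligned f32
/// values that stay alive and unmodified for the whole lifetime 'a.
pub unsafe fn f32_view<'a>(addr: usize, numel: usize) -> &'a [f32] {
    unsafe { std::slice::from_raw_parts(addr as *const f32, numel) }
}

/// The same memory reinterpreted as bytes, the form expected by
/// byte-oriented constructors such as tch's `Tensor::of_data_size`.
///
/// # Safety
/// Same requirements as `f32_view`.
pub unsafe fn byte_view<'a>(addr: usize, numel: usize) -> &'a [u8] {
    let len = numel * std::mem::size_of::<f32>();
    unsafe { std::slice::from_raw_parts(addr as *const u8, len) }
}
```

The caller is responsible for keeping the Python tensor alive while the view is in use; if Python frees it first, the view dangles and reading it is undefined behavior.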

LaurentMazare, Jun 30 '20


Wouldn't that copy the array, drop the gradient information and lose track of the computation graph? I would like this to work with autograd, but that might be difficult if we can't directly wrap the same C++ object Python is using.

jogardi, Jun 30 '20

I'm not sure about the gradient, as I haven't thought about it deeply, but I feel it would make a good first proof of concept. If this ends up not segfaulting, it will be possible to build up from there.

LaurentMazare, Jun 30 '20

Hi @jogardi @L1AN0. I made a proof-of-concept project that transfers a tch::Tensor to Python via DLPack using pyo3, and vice versa. I hope it can be helpful.

https://github.com/SunDoge/tch-to-pytorch-poc/blob/master/src/lib.rs
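For context, DLPack is a small C header that describes a tensor as a raw data pointer plus device, dtype, shape and strides; on the Python side, `torch.utils.dlpack.to_dlpack` and `from_dlpack` produce and consume it. The sketch below is not the PoC's code: it mirrors the structs from `dlpack.h` as I understand recent versions (older headers from around the time of this thread name the device field `ctx`, of type `DLContext`) and describes a Rust `f32` buffer without copying it. `describe_f32` and `numel` are hypothetical helpers; check field order and types against the header you actually build with.

```rust
use std::ffi::c_void;

// Mirrors of the DLPack C structs. Field order and types must match the
// header exactly, since these cross an FFI boundary.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct DLDevice {
    pub device_type: i32, // kDLCPU = 1, kDLCUDA = 2
    pub device_id: i32,
}

#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DLDataType {
    pub code: u8, // kDLInt = 0, kDLUInt = 1, kDLFloat = 2
    pub bits: u8,
    pub lanes: u16,
}

#[repr(C)]
#[derive(Debug)]
pub struct DLTensor {
    pub data: *mut c_void,
    pub device: DLDevice,
    pub ndim: i32,
    pub dtype: DLDataType,
    pub shape: *mut i64,
    pub strides: *mut i64, // null means compact row-major
    pub byte_offset: u64,
}

#[repr(C)]
pub struct DLManagedTensor {
    pub dl_tensor: DLTensor,
    pub manager_ctx: *mut c_void,
    // Called by the consumer once it no longer needs the tensor.
    pub deleter: Option<unsafe extern "C" fn(*mut DLManagedTensor)>,
}

pub const KDL_CPU: i32 = 1;
pub const KDL_FLOAT: u8 = 2;

/// Describes a contiguous CPU f32 buffer without copying it. The caller must
/// keep both `data` and `shape` alive for as long as the DLTensor is used.
pub fn describe_f32(data: &mut [f32], shape: &mut [i64]) -> DLTensor {
    DLTensor {
        data: data.as_mut_ptr() as *mut c_void,
        device: DLDevice { device_type: KDL_CPU, device_id: 0 },
        ndim: shape.len() as i32,
        dtype: DLDataType { code: KDL_FLOAT, bits: 32, lanes: 1 },
        shape: shape.as_mut_ptr(),
        strides: std::ptr::null_mut(),
        byte_offset: 0,
    }
}

/// Element count, read back through the raw `shape` pointer.
///
/// # Safety
/// `t.shape` must point to `t.ndim` valid i64 values.
pub unsafe fn numel(t: &DLTensor) -> i64 {
    let shape = unsafe { std::slice::from_raw_parts(t.shape, t.ndim as usize) };
    shape.iter().product()
}
```

Ownership is the part this sketch leaves out: when handing a tensor to Python, the producer wraps the `DLTensor` in a `DLManagedTensor` whose `deleter` frees the backing storage once the consumer is done.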

SunDoge, Aug 08 '20

@LaurentMazare Maybe we can create a user guide on how to write a Python module with tch, based on what @SunDoge has done?

awaited-hare, Sep 29 '20

I'd also be very excited to see this happen. @SunDoge do gradients associated with the tensors pass between Python and Rust in your prototype?

Ejhfast, Jul 19 '21

@Ejhfast nope, but it's possible. Tensor.grad is also a tensor and can be passed in the same way.

SunDoge, Jul 20 '21

Hi, I had the same need for a Python/tch interface. For that, I set up a small proof of concept that uses the same functions that torch's own operators use.

It requires linking the pytorch_python library, as the functions are declared in python_variable.h.

As I understand it, THPVariable_Wrap wraps a tensor and transfers ownership to Python, while THPVariable_Unpack gives you a pointer to the tensor object itself.

The proof of concept is a bit of a hack right now:

Library changes: https://github.com/EgorDm/tch-rs/commit/5e5fd752804e0e83f57ef30b656063c66d92a950
Demo: https://github.com/EgorDm/tch-rs/commit/9c3b32ec8c2986c6012eb7c4693397a0e579f834

I would like to build a more polished version and open a PR if there is interest in such a thing. Thanks

egordm, Feb 19 '22

We've added a new pyo3-tch crate to make it easier to write such tch based Python modules. You can find an example using this in the tch-ext repo (this is a work in progress so the api is likely to change in the future).

LaurentMazare, Jun 17 '23