Add support for Arrow tables
This is more of a feature request than a bug report. Sorry if this is not the right channel.
Describe the bug
I would like to convert fit objects directly to PyArrow tables without constructing an intermediate pandas DataFrame. I have a working implementation.
Describe your system
Linux archlinux 6.16.7-arch1-1
x86-64
g++ (GCC) 15.2.1 20250813
mamba 1.5.8
conda 24.3.0
Steps/Code to Reproduce
Code Sample, a copy-pastable example
fit.to_arrow()
It's not currently implemented; it should return a PyArrow Table. I have an implementation with working tests.
Hi @mvlvrd, and thanks so much for joining the Stan developer discussion. This is the right channel, but I'm not entirely sure how actively PyStan is being maintained or whether anyone will be able to handle PRs. Our current reference Python interface to Stan is CmdStanPy. So I'm curious how much of what you did could be ported to that system (pinging @WardBrian and @mitzimorris, who support CmdStanPy).
If you could write out what the function and its basic documentation would look like, that would be a huge help. Also, is pyarrow the only new dependency? If so, I assume this only brings in an Apache v2 license dependency. As a project we decided we were OK with Apache v2, so that shouldn't be a problem.
Yes, as @bob-carpenter says, pystan (+ httpstan) are currently not under development. I recommend trying CmdStanPy.
There are multiple reasons for this, but mostly it is that the Stan + C++ compilation process is quite complex, and Python infrastructure is not really designed to handle this kind of dynamically compiled C++ code.
Thanks, @bob-carpenter and @ahartikainen. I should have checked the previous issues before opening this one, sorry. The code is very simple and only needs pyarrow:
+    def to_arrow(self):
+        """Return view of draws as an Arrow Table.
+
+        If pyarrow is not installed, a `RuntimeError` will be raised.
+
+        Returns:
+            pyarrow.Table: Table with `num_draws` rows and
+                `num_flat_params` columns.
+        """
+        try:
+            import pyarrow as pa
+        except ImportError as exc:
+            raise RuntimeError("The `to_arrow` method requires the Python package `pyarrow`.") from exc
+        columns = self.sample_and_sampler_param_names + self.constrained_param_names
+        assert len(self._draws) == len(columns)
+        # Collapse the draw and chain axes: shape (num_flat_params, num_draws * num_chains),
+        # so each row holds all draws of one parameter and becomes one Arrow column.
+        flat_draws = self._draws.reshape(len(columns), -1)
+        table = pa.Table.from_arrays([pa.array(row) for row in flat_draws], names=list(columns))
+        return table
In any case, I will try CmdStanPy and see how this fits there. Thank you so much.