
DesignMatrix should have to_dataframe() method

Open shoyer opened this issue 12 years ago • 4 comments

This would be useful, for example, when I really want to be able to use a design matrix as both a raw numpy array and a pandas dataframe.

I suppose I could specify return_type="dataframe" and then get the numpy array from df.values, and it's also not hard to build the dataframe from scratch, but this would be particularly handy for interactive use, where it would provide a useful shortcut (e.g., X.to_dataframe().plot() or X.to_dataframe().head()).
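For reference, the two workarounds mentioned above might look roughly like the sketch below. The toy data, the formula "x", and the helper name manual_to_dataframe are made up for illustration and are not part of patsy; the sketch also assumes no rows are dropped for missing values, so the original index still lines up with the matrix rows.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Toy input with a non-default index (illustrative only).
data = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])

# Workaround 1: request a DataFrame, then take the raw array from it.
X_df = dmatrix("x", data, return_type="dataframe")
X_arr = X_df.values

# Workaround 2: request the default DesignMatrix and build the
# DataFrame by hand. The index has to be passed in separately because
# the DesignMatrix does not remember it.
def manual_to_dataframe(matrix, index):
    # Hypothetical helper, not a patsy function.
    return pd.DataFrame(np.asarray(matrix),
                        columns=matrix.design_info.column_names,
                        index=index)

X = dmatrix("x", data)
X_manual = manual_to_dataframe(X, data.index)
```

Both routes end up with the same values and labels; the difference is that the second one needs the caller to keep the original index around, which is the bookkeeping a to_dataframe() method would take over.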

To do this right, the new method would be factored out of build_design_matrices. Roughly speaking, it would look like this:

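def to_dataframe(self):
    # have_pandas and PatsyError are patsy internals already used by
    # build_design_matrices
    if not have_pandas:
        raise PatsyError("pandas.DataFrame was requested, but "
                         "pandas is not installed")
    di = self.design_info
    # pandas_index is the proposed new attribute; it does not exist yet
    df = pandas.DataFrame(self, columns=di.column_names,
                          index=di.pandas_index)
    # attach the metadata, as return_type="dataframe" does today
    df.design_info = di
    return df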

The main design change would be that DesignInfo (or DesignMatrix) would need to gain a pandas_index attribute, which would keep track of any index from the original data.

If this seems reasonable, I could put together a pull request.

shoyer avatar Oct 30 '13 06:10 shoyer

In principle, I agree with the sentiment. I'm not sure I agree with the design you've proposed, but if you hand off a pandas object to patsy, I think it should be trivial to get one back at some point even if you don't specify return_type="dataframe". AFAICT, this isn't possible right now.

jseabold avatar May 06 '14 14:05 jseabold

I also think something like this might be useful for keeping track of pandas metadata for future use.

kyleabeauchamp avatar Jan 11 '15 20:01 kyleabeauchamp

Sorry for missing this. Seems reasonable to me.

njsmith avatar Apr 14 '15 22:04 njsmith