Coercion methods

Open wbuchanan opened this issue 9 years ago • 2 comments

More of a question/potential enhancement request than anything. I was wondering what it would take to create methods that coerce existing objects into a DataFrame object. I would imagine 2D arrays would be fairly easy to handle (although I could be completely wrong). My hope is that, as I wrap up some other work on readers/parsers for Stata-formatted files (and other formats in the future), it would be possible to build the classes/methods around the idea of coercing the data into a DataFrame; that would bring the advantage of joins/unions of files from different statistical software platforms. Also, I haven't looked too much into the documentation yet, but a way to retain metadata with the file would be helpful as well, e.g., variable labels (distinct from column names) and value labels (analogous to descriptions in a lookup table in a SQL database).

wbuchanan avatar Jan 23 '16 14:01 wbuchanan

There are currently methods to read and write CSV and Excel files, and these generally provide the interoperability I need. That said, I realize they are rather low fidelity (i.e., they preserve column names but not much else). There are also methods to convert to 2D arrays, but not from them. I think that would be a useful addition, and reading and writing other formats would be useful as well. I can take a look at adding these features, or I will gladly merge a pull request.
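As a rough illustration of the "from a 2D array" direction, here is a minimal sketch in plain Java. It only pivots a row-major array into named, column-major lists, which is one shape a data frame could be built from. The names `ArrayCoercion`, `Columns`, and `fromRows` are hypothetical and not part of Joinery, and the step of handing the result to a DataFrame is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Joinery.
public class ArrayCoercion {

    /** Column names plus column-major data. */
    public static final class Columns {
        public final List<String> names;
        public final List<List<Object>> data;

        Columns(List<String> names, List<List<Object>> data) {
            this.names = names;
            this.data = data;
        }
    }

    /** Pivots row-major values into one list per column, checking row widths. */
    public static Columns fromRows(String[] names, Object[][] rows) {
        List<List<Object>> data = new ArrayList<>();
        for (int c = 0; c < names.length; c++) {
            data.add(new ArrayList<>());
        }
        for (Object[] row : rows) {
            if (row.length != names.length) {
                throw new IllegalArgumentException(
                    "expected " + names.length + " values per row but found " + row.length);
            }
            for (int c = 0; c < row.length; c++) {
                data.get(c).add(row[c]);
            }
        }
        return new Columns(new ArrayList<>(Arrays.asList(names)), data);
    }

    public static void main(String[] args) {
        Object[][] rows = { {"a", 1}, {"b", 2} };
        Columns cols = fromRows(new String[] {"name", "value"}, rows);
        System.out.println(cols.names + " " + cols.data); // prints [name, value] [[a, b], [1, 2]]
    }
}
```

Rejecting ragged rows up front is a design choice for the sketch; padding short rows with nulls would be an equally reasonable alternative.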

Variable labels might be a little more difficult, since Joinery doesn't currently store any additional information about the individual data points. While this certainly could be added, it isn't as high a priority for me personally. But again, pull requests are welcome.
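Until something like that exists, one possible workaround is to keep the labels in a separate object next to the frame, keyed by column name. The sketch below is plain Java under that assumption. `ColumnMetadata` and all of its methods are hypothetical names, not Joinery API, and nothing here is tied to a DataFrame instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical side-car metadata holder, not part of Joinery.
public class ColumnMetadata {
    // Column name -> human-readable variable label.
    private final Map<String, String> variableLabels = new LinkedHashMap<>();
    // Column name -> (stored value -> value label), like a small lookup table.
    private final Map<String, Map<Object, String>> valueLabels = new LinkedHashMap<>();

    public void setVariableLabel(String column, String label) {
        variableLabels.put(column, label);
    }

    /** Returns the variable label, falling back to the column name. */
    public String variableLabel(String column) {
        return variableLabels.getOrDefault(column, column);
    }

    public void setValueLabel(String column, Object value, String label) {
        valueLabels.computeIfAbsent(column, k -> new LinkedHashMap<>()).put(value, label);
    }

    /** Returns the value label, falling back to the value's string form. */
    public String describe(String column, Object value) {
        Map<Object, String> labels = valueLabels.get(column);
        if (labels == null || !labels.containsKey(value)) {
            return String.valueOf(value);
        }
        return labels.get(value);
    }

    public static void main(String[] args) {
        ColumnMetadata meta = new ColumnMetadata();
        meta.setVariableLabel("sex", "Sex of respondent");
        meta.setValueLabel("sex", 1, "male");
        meta.setValueLabel("sex", 2, "female");
        System.out.println(meta.variableLabel("sex") + ": " + meta.describe("sex", 2));
        // prints Sex of respondent: female
    }
}
```

The obvious drawback is that the metadata can drift out of sync with the frame when columns are renamed or dropped, which is why storing it inside the frame would be the more robust long-term option.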

cardillo avatar Jan 23 '16 17:01 cardillo

The only working example I have at the moment is some work I did on serializing in-memory data to a JSON object using Stata's Java API: https://github.com/wbuchanan/StataJSON. I've broken some of the work there into more generic classes here, and I'm also trying to test coercing some of the data to a DataFrame. There is a C library that could be helpful for parsing files from statistical packages (https://github.com/WizardMac/ReadStat), but I'm not terribly familiar with JNI or with how the C library works. I think once I can figure out how to get the data into a DataFrame object, I could probably figure out how to get it into an object suitable for Stata.

wbuchanan avatar Jan 23 '16 17:01 wbuchanan