vscode-data-preview
Support `.feather` file extension for arrow format
Some tools, notably R, use .feather as the file extension for arrow data, as that's the actual name of the serialization format of arrow data.
E.g. https://blog.rstudio.com/2016/03/29/feather/
Data Preview works fine if you rename the file extension from .feather to .arrow, but it would be good to support it OOTB.
There is more you'd have to change for that. See some of my closed enhancement tickets and the commit history from when I was adding support for other data file types.
Thanks for the suggestion and first attempt tho :slightly_smiling_face:
I had a look at some of those issues, particularly #2 and #12, and noticed there were a few additional regexes that needed updating, so I've done that, as well as updating the README.md docs.
My intent is not to create a new file type (i.e. arrow === feather), but to have the menu options/keyboard shortcuts work for a .feather file exactly like they do for a .arrow file.
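A minimal sketch of the kind of extension regex change described here, assuming a TypeScript helper. `ARROW_FILE_EXT_REGEX` and `isArrowDataFile` are hypothetical names, not the extension's actual code. The idea is that `.feather` is matched as an alias of `.arrow` rather than registered as a new file type:

```typescript
// Hypothetical sketch: the extension's real regexes and names may differ.
// Treats .feather as an alias of .arrow rather than as a separate data format.
const ARROW_FILE_EXT_REGEX = /\.(arrow|feather)$/i;

/** Returns true if the file path ends in an Arrow data file extension. */
function isArrowDataFile(filePath: string): boolean {
  return ARROW_FILE_EXT_REGEX.test(filePath);
}

console.log(isArrowDataFile('data/flights.feather')); // true
console.log(isArrowDataFile('data/flights.arrow')); // true
console.log(isArrowDataFile('data/flights.csv')); // false
```

Anchoring on `$` keeps names like `flights.feather.bak` from matching, and the `i` flag also accepts upper-case extensions.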
I couldn't figure out how to run the tests locally. Let me know if there's anything more that's needed.
You also need to update the data file extension switches in data.view.ts, data.view.html and data.view.js. See the other commits in #12 for an example of what's involved. The other package.json and docs update changes look good so far. Thanks!
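A sketch of what such an extension switch could look like, assuming a TypeScript helper. `getDataFileType` and the `DataFileType` union are hypothetical and only cover a few formats. Letting `.feather` fall through to the `.arrow` case routes both extensions to the same Arrow loader:

```typescript
// Hypothetical sketch: names are illustrative, not the extension's actual code.
type DataFileType = 'arrow' | 'csv' | 'json' | 'unknown';

/** Maps a file name's extension to the data loader type it should use. */
function getDataFileType(fileName: string): DataFileType {
  const dotIndex = fileName.lastIndexOf('.');
  const ext = dotIndex >= 0 ? fileName.slice(dotIndex).toLowerCase() : '';
  switch (ext) {
    case '.arrow':
    case '.feather': // same Arrow data, different naming convention
      return 'arrow';
    case '.csv':
      return 'csv';
    case '.json':
      return 'json';
    default:
      return 'unknown';
  }
}
```

Keeping one shared case means any later change to the Arrow loader applies to both extensions automatically.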
Also, I've seen other dev groups use the .arrow file name extension. We can add feather. I believe that was the old name for that framework and prototype.
I would rather avoid removing .arrow data files support at this stage, considering there are tens of thousands of devs using this data preview extension now, and removing that would break their test arrow data files, etc.
Thanks for the pointers, will take a look.
I had no intention of removing .arrow file extension support at all, just adding .feather. I didn't think I'd done that?
I've seen people use .arrow and .feather interchangeably, and agree that .arrow should still be supported.
Arrow is the in-memory layout - feather is the default serialisation of that to disk. The author of feather talks a bit about it here: https://wesmckinney.com/blog/feather-and-apache-arrow/. Feather V2 came out in 2020: https://ursalabs.org/blog/2020-feather-v2/
AIUI, .arrow files are technically feather files; some groups just name them .arrow and some .feather. My goal is to support both in Data Preview as the same thing.
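Since the contents are the same whatever the extension, a viewer could also sniff the format from the file's leading bytes. A minimal sketch, assuming the file (or at least its first 8 bytes) is already loaded as a `Uint8Array`; all function names are hypothetical. It relies on the Arrow IPC file format (which Feather V2 uses on disk) starting with the ASCII magic `ARROW1`, and on legacy Feather V1 files starting with `FEA1`:

```typescript
// Hypothetical sketch: detect the on-disk format from magic bytes, not the extension.
type ArrowFileFormat = 'arrow-ipc-file' | 'feather-v1' | 'unknown';

const ARROW_MAGIC = 'ARROW1'; // Arrow IPC file format, which Feather V2 uses
const FEATHER_V1_MAGIC = 'FEA1'; // legacy Feather V1 format

/** Encodes an ASCII string as bytes (one byte per character). */
function asciiBytes(text: string): Uint8Array {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    bytes[i] = text.charCodeAt(i);
  }
  return bytes;
}

/** Returns true if `bytes` begins with the given ASCII magic string. */
function startsWithMagic(bytes: Uint8Array, magic: string): boolean {
  if (bytes.length < magic.length) {
    return false;
  }
  const expected = asciiBytes(magic);
  return expected.every((byte, i) => bytes[i] === byte);
}

/** Guesses the Arrow/Feather flavor of a file from its leading bytes. */
function detectArrowFileFormat(bytes: Uint8Array): ArrowFileFormat {
  if (startsWithMagic(bytes, ARROW_MAGIC)) {
    return 'arrow-ipc-file';
  }
  if (startsWithMagic(bytes, FEATHER_V1_MAGIC)) {
    return 'feather-v1';
  }
  return 'unknown';
}

// Example: the first 8 bytes of an Arrow IPC file ("ARROW1" plus 2 padding bytes).
console.log(detectArrowFileFormat(asciiBytes('ARROW1\0\0'))); // prints "arrow-ipc-file"
```

Files written in the Arrow IPC streaming format carry no magic string, so this check would report them as `unknown`; a real loader would need a fallback for that case.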
yeah, I stopped reading his posts at some point. I still like that format a lot.
Appreciate you scrubbing in for this. If you don't mind I'll let you poke around more and see how far you can get to add .feather files support.
For building and testing it, you just need to install code, run npm install, build, and press F5 in VS Code to debug the updated extension. See the last section in my README.md.
We should probably update arrow js after you are done with those changes. I know the version I last integrated is about a year old.
Ping me here or DM on twitter if you have some questions how to wire the rest: https://twitter.com/TarasNovak
And many thanks for checking it out and going for a file type update.
Would love this feature, what's left to complete this @RandomFractals @bloodearnest? Would also be great to support '.ftr'
@abekfenn I'll check it out next weekend. so far those changes looked close enough.
btw, this is on hold as I started working on new tabular data viewer extension that will provide better support for large arrow data files soon: https://github.com/RandomFractals/tabular-data-viewer
Sounds great, would love to know when that's ready. I'll keep an eye out
@abekfenn I am hoping to wrap up that extension MVP with latest arrow data bindings support this month.
Hi @RandomFractals how's implementation of arrow data bindings coming along? Would love to check it out if it's ready.
@abekfenn arrow data has been supported by this extension before anyone even knew what that was.
There are no new updates on that front here. Thanks for your interest!
@bloodearnest All the online data examples I've seen recently, including arrow data support in the new duckdb, use the .arrow file name extension.
I'd rather not complicate things and stick to one file extension name per data format.
So, I am going to close this request, as I believe .feather is a dated file naming convention for arrow data files.