Support `.feather` file extension for arrow format

Open bloodearnest opened this issue 3 years ago • 12 comments

Some tools, notably R, use .feather as the file extension for arrow data, as that's the actual name of the serialization format of arrow data.

E.g. https://blog.rstudio.com/2016/03/29/feather/

Data Preview works fine if you rename the file extension from .feather to .arrow, but it would be good to support it out of the box.

bloodearnest • Jun 03 '21 14:06

There is more you'd have to change for that. See some of my closed enhancement tickets and the commit history from when I was adding support for other data file types.

Thanks for the suggestion and first attempt, though :slightly_smiling_face:

RandomFractals • Jun 03 '21 14:06

I had a look at some of those issues, particularly #2 and #12, and noticed there were a few additional regexes that also needed updating, so I have done that, as well as updating the README.md docs.

My intent is not to create a new file type (i.e. arrow === feather), but to have the menu options/keyboard shortcuts work for a .feather file exactly as they do for a .arrow file.

I couldn't figure out how to run the tests locally. Let me know if there's anything more that's needed.

bloodearnest • Jun 03 '21 19:06

You also need to update the data file extension switches in data.view.ts, data.view.html and data.view.js. See the other commits in #12 for an example of what's involved. The other package.json and docs changes look good so far. Thanks!
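A minimal sketch of the kind of change being discussed, assuming the view code picks a loader with an extension switch: `.feather` falls through to the same case as `.arrow`, and the file-matching regex lists both extensions. The names `DataFileType`, `getDataFileType`, `dataFilePattern` and `isDataFile` are hypothetical and do not reflect the actual code in data.view.ts.

```typescript
// Hypothetical sketch: these names are illustrative, not Data Preview's real code.
type DataFileType = 'arrow' | 'csv' | 'json';

// Regex deciding whether a file name is a supported data file.
// No `g` flag, so `test()` keeps no state between calls.
const dataFilePattern = /\.(arrow|feather|csv|json)$/i;

function isDataFile(fileName: string): boolean {
  return dataFilePattern.test(fileName);
}

// Map a file name to the data type whose loader should handle it.
function getDataFileType(fileName: string): DataFileType | undefined {
  // Text from the last '.' onward, lower-cased (e.g. '.feather').
  const ext = fileName.slice(fileName.lastIndexOf('.')).toLowerCase();
  switch (ext) {
    case '.arrow':
    case '.feather': // alias: same Arrow loader, not a new file type
      return 'arrow';
    case '.csv':
      return 'csv';
    case '.json':
      return 'json';
    default:
      return undefined;
  }
}
```

Making `.feather` a fall-through case keeps "arrow === feather" in one place, so menus and shortcuts keyed on the `'arrow'` type work for both extensions.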

Also, I've seen other dev groups use the .arrow file name extension. We can add .feather. I believe that was the old name for that framework and prototype.

I would rather avoid removing .arrow data file support at this stage, considering there are tens of thousands of devs using this Data Preview extension now, and removing it would break their test arrow data files, etc.

RandomFractals • Jun 03 '21 19:06

Thanks for the pointers, will take a look.

I had no intention of removing .arrow file extension support at all, just adding .feather. I didn't think I'd done that?

I've seen people use .arrow and .feather interchangeably, and agree that .arrow should still be supported.

Arrow is the in-memory layout; feather is the default serialisation of that layout to disk. The author of feather talks a bit about it here: https://wesmckinney.com/blog/feather-and-apache-arrow/. Feather V2 came out in 2020: https://ursalabs.org/blog/2020-feather-v2/

As I understand it, .arrow files are technically feather files; some groups just name them .arrow and some .feather. My goal is to support both in Data Preview as the same thing.
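To illustrate the point above: according to the Apache Arrow documentation, Feather V2 is the Arrow IPC file format on disk, which begins with the magic bytes `ARROW1`, while legacy Feather V1 files begin with `FEA1`. The hypothetical `detectFormat` helper below (not part of Data Preview) sniffs those leading bytes instead of trusting the file extension. It checks only the prefix, so treat it as a heuristic rather than a full validator.

```typescript
// Hypothetical sketch: classify a file by its leading magic bytes.
type OnDiskFormat = 'arrow-ipc-file' | 'feather-v1' | 'unknown';

const ARROW_MAGIC = 'ARROW1'; // Arrow IPC file format (also Feather V2)
const FEATHER_V1_MAGIC = 'FEA1'; // legacy Feather V1

// True if `bytes` starts with the ASCII characters of `magic`.
function startsWithAscii(bytes: Uint8Array, magic: string): boolean {
  if (bytes.length < magic.length) {
    return false;
  }
  for (let i = 0; i < magic.length; i++) {
    if (bytes[i] !== magic.charCodeAt(i)) {
      return false;
    }
  }
  return true;
}

function detectFormat(bytes: Uint8Array): OnDiskFormat {
  if (startsWithAscii(bytes, ARROW_MAGIC)) {
    // .arrow and .feather (V2) files both land here: same bytes, different names.
    return 'arrow-ipc-file';
  }
  if (startsWithAscii(bytes, FEATHER_V1_MAGIC)) {
    return 'feather-v1';
  }
  return 'unknown';
}
```

In Node, a `Buffer` returned by `fs.readFileSync` is a `Uint8Array`, so it can be passed to `detectFormat` directly.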

bloodearnest • Jun 03 '21 19:06

Yeah, I stopped reading his posts at some point. I still like that format a lot.

Appreciate you scrubbing in for this. If you don't mind, I'll let you poke around more and see how far you can get with adding .feather file support.

For building and testing it, you just need to install VS Code, run npm install, build, and press F5 in VS Code to debug the updated extension. See the last section in my README.md.

We should probably update arrow js after you are done with those changes. I know the version I last integrated is about a year old.

Ping me here or DM me on Twitter if you have questions about how to wire up the rest: https://twitter.com/TarasNovak

And many thanks for checking it out and going for a file type update.

RandomFractals • Jun 03 '21 20:06

Would love this feature; what's left to complete this, @RandomFractals @bloodearnest? It would also be great to support .ftr.

abekfenn • Nov 18 '21 19:11

@abekfenn I'll check it out next weekend. So far those changes look close enough.

RandomFractals • Nov 29 '21 22:11

Btw, this is on hold, as I started working on a new tabular data viewer extension that will soon provide better support for large arrow data files: https://github.com/RandomFractals/tabular-data-viewer

RandomFractals • Dec 13 '21 16:12

Sounds great, would love to know when that's ready. I'll keep an eye out.

abekfenn • Dec 13 '21 19:12

@abekfenn I am hoping to wrap up that extension's MVP with support for the latest arrow data bindings this month.

RandomFractals • Dec 13 '21 20:12

Hi @RandomFractals, how's the implementation of arrow data bindings coming along? Would love to check it out if it's ready.

abekfenn • Mar 08 '22 16:03

@abekfenn arrow data has been supported by this extension before anyone even knew what that was.

There are no new updates on that front here. Thanks for your interest!

RandomFractals • Mar 09 '22 17:03

@bloodearnest All the online data examples I've seen recently, including the arrow data support in the new DuckDB, use the .arrow file name extension.

I'd rather not complicate things and stick to one file extension name per data format.

So, I am going to close this request, as I believe .feather is a dated file name extension for arrow data files.

RandomFractals • Aug 25 '22 12:08