Proposal to use python libs instead of parquet tools

Open pokidovea opened this issue 5 years ago • 5 comments

It is not convenient to install the Java-based parquet-tools. There is at least one Python library for working with parquet files: pyarrow. Using it would have some advantages:

  • no need to install Java and parquet-tools
  • possibility of editing parquet files
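
To make the idea concrete, here is a minimal sketch of what reading a file through pyarrow could look like. The function names `read_parquet_rows` and `rows_to_json_lines` are hypothetical and not part of the plugin; the sketch assumes a pyarrow version recent enough to provide `Table.to_pylist()`, and rendering rows as one JSON object per line is only one possible output format.

```python
import json


def read_parquet_rows(path, limit=None):
    """Read a parquet file into a list of plain dicts, one per row."""
    # Imported lazily so this module still loads where pyarrow is missing.
    import pyarrow.parquet as pq

    table = pq.read_table(path)
    if limit is not None:
        # Keep only the first `limit` rows to avoid rendering huge files.
        table = table.slice(0, limit)
    return table.to_pylist()


def rows_to_json_lines(rows):
    """Render rows as text, one JSON object per line."""
    # default=str covers values json cannot encode natively (dates, decimals).
    return "\n".join(
        json.dumps(row, default=str, sort_keys=True) for row in rows
    )
```

Something like `rows_to_json_lines(read_parquet_rows("data.parquet", limit=100))` would then produce the text to show in a view, with no Java or parquet-tools involved.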

What do you think?

pokidovea avatar Apr 25 '19 06:04 pokidovea

I agree that it's a good idea to use a python lib to read in parquet files. Editing parquet files might be a bit inefficient using any text editor.

yuj avatar Jun 14 '22 02:06 yuj

@yuj would you be interested in taking a PR to accomplish this? I tried to do this separately as a fork (https://github.com/dogversioning/sublime-parquet-python), which changes the rendering options (the python tools I used as a first pass don't support JSON output), but the sublime text folks have a light preference to consolidate these approaches if possible.

dogversioning avatar Dec 03 '22 13:12 dogversioning

@dogversioning PRs are always welcome! Please send it over.

In the end, I guess we all still prefer @pokidovea's suggestion of using pyarrow to read parquet files instead of parquet-tools. Is anyone interested in tackling that too? :)

yuj avatar Dec 03 '22 18:12 yuj

@yuj yeah, i think it makes sense - this was more of an incremental approach to solve an acute issue, but something like that was next on my list of things to potentially tackle.

Anyway, give me a bit to reconcile the fork approach with an in-place one and i'll open a PR.

dogversioning avatar Dec 03 '22 18:12 dogversioning

@yuj So I spent a little time this morning looking into this - there are some tradeoffs:

  • The two big parquet libs (fastparquet & pyarrow) require numpy, which is not available in python 3.3, so you'd have to run in python 3.8, which is only available in ST4 and later.
  • Dependency management is going to be an issue. Since neither of these is bundled for Sublime Text, there are two possible pathways [1]:
    • Distributing pre-built dependencies per platform. Both of these have complex build chains touching libs requiring C++/Cython access, which would be a large engineering effort.
    • Asking a user to download python 3.8 and install a dependency, and then move/link it directly into Sublime Text's Lib folder.

If the first one doesn't bother you and you're ok with the hoops on the latter (I think for something of this scope the pre-built route isn't worth the effort), then it could be done. But it's an open question whether this makes the barrier to entry too high.
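
As a rough illustration of those two constraints, a plugin could check its environment before trying to use a Python backend. This is only a sketch: `parquet_backend_status` and its status strings are hypothetical names, and the 3.8 minimum is taken from the numpy constraint described above.

```python
import importlib.util
import sys

# Minimum interpreter version, per the numpy constraint described above.
MIN_PYTHON = (3, 8)


def parquet_backend_status(module_name="pyarrow", version=None):
    """Report whether a Python parquet backend looks usable here.

    `version` defaults to the running interpreter and can be passed
    explicitly as a (major, minor) tuple to simulate another host.
    """
    version = sys.version_info[:2] if version is None else tuple(version)
    if version < MIN_PYTHON:
        # e.g. the Python 3.3 plugin host
        return "python-too-old"
    if importlib.util.find_spec(module_name) is None:
        # The user has not installed or linked the dependency.
        return "missing-dependency"
    return "ok"
```

A plugin could use a status other than `"ok"` to show setup instructions rather than failing on import, which might soften the barrier to entry somewhat.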

[1] https://stackoverflow.com/questions/61196270/how-to-properly-use-3rd-party-dependencies-with-sublime-text-plugins

dogversioning avatar Jan 01 '23 17:01 dogversioning