sublime-parquet
Proposal to use python libs instead of parquet tools
Installing the Java-based parquet-tools is inconvenient. There is at least one Python lib for working with parquet files: pyarrow. Using it has some advantages:
- no need to install Java and parquet-tools
- possibility of editing parquet files
What do you think?
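For illustration, reading a file with pyarrow could look roughly like the sketch below. The helper names `read_parquet_rows` and `rows_to_json_lines` and the `limit` parameter are hypothetical and not part of the plugin; the one-JSON-object-per-line rendering is an assumption about how the rows might be shown in a buffer. `pyarrow.parquet.read_table`, `Table.slice` and `Table.to_pylist` are existing pyarrow calls (`to_pylist` needs a reasonably recent pyarrow release).

```python
import json


def read_parquet_rows(path, limit=100):
    """Return up to `limit` rows of a parquet file as a list of dicts.

    Hypothetical helper: pyarrow is imported lazily so that the module
    still loads on hosts where pyarrow is not installed.
    """
    import pyarrow.parquet as pq

    table = pq.read_table(path)
    # slice() avoids converting the whole table to Python objects.
    return table.slice(0, limit).to_pylist()


def rows_to_json_lines(rows):
    """Render rows as one JSON object per line (assumed display format)."""
    # default=str handles values json cannot encode natively,
    # such as dates, decimals or bytes.
    return "\n".join(json.dumps(row, default=str) for row in rows)
```

Something like `rows_to_json_lines(read_parquet_rows("data.parquet"))` would then produce the text to display, with no Java or parquet-tools involved.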
I agree that it's a good idea to use a python lib to read in parquet files. Editing parquet files might be a bit inefficient using any text editor.
@yuj would you be interested in taking a PR to accomplish this? I tried to do this separately as a fork (https://github.com/dogversioning/sublime-parquet-python), which changes the rendering options (the python tools I used as a first pass don't support JSON output), but the sublime text folks have a light preference to consolidate these approaches if possible.
@dogversioning PRs are always welcome! Please send it over.
Eventually I guess we all still prefer @pokidovea's suggestion of using pyarrow to read parquet files instead of parquet-tools. Anyone interested in accomplishing that too? :)
@yuj yeah, i think it makes sense - this was more of an incremental approach to solve an acute issue, but something like that was next on my list of things to potentially tackle.
Anyway, give me a bit to reconcile the fork approach with an in-place one and I'll open a PR.
@yuj So I spent a little time this morning looking into this - there's some tradeoffs:
- The two big parquet libs (fastparquet & pyarrow) require numpy, which is not available in python 3.3, so you'd have to run in python 3.8, which is only available in ST4 and later.
- Dependency management is going to be an issue. Since neither of these is bundled with Sublime Text, there are two possible pathways [1]:
  - Distributing pre-built dependencies per platform. Both of these have complex build chains touching libs requiring C++/Cython access, which would be a large engineering effort.
  - Asking a user to download python 3.8 and install a dependency, and then move/link it directly into Sublime Text's Lib folder.
If the first one doesn't bother you and you're ok with the hoops on the latter (I think for something of this scope the pre-built route isn't worth the effort), then it :could: be done. But it's an open question whether this makes the barrier to entry too high.
[1] https://stackoverflow.com/questions/61196270/how-to-properly-use-3rd-party-dependencies-with-sublime-text-plugins
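
As a rough sketch of how a plugin might cope with the two constraints above, it could check at load time whether the host Python is new enough and whether one of the parquet libs is importable, and otherwise fall back to parquet-tools. The function name `find_parquet_backend` and the constants are hypothetical; only the standard library is used, and the 3.8 minimum is taken from the discussion above.

```python
import importlib.util
import sys

# Assumptions taken from the thread: the libs need python 3.8,
# and pyarrow is preferred over fastparquet.
MIN_PYTHON = (3, 8)
CANDIDATE_LIBS = ("pyarrow", "fastparquet")


def find_parquet_backend(candidates=CANDIDATE_LIBS, version=None):
    """Return the name of the first importable lib, or None.

    Hypothetical helper: `version` defaults to the running interpreter
    and is a parameter only so the check can be exercised in isolation.
    """
    version = sys.version_info[:2] if version is None else tuple(version)
    if version < MIN_PYTHON:
        return None
    for name in candidates:
        try:
            # find_spec locates the module without importing it.
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            continue
    return None
```

A `None` result would mean the user has not linked a lib into the Lib folder (or is on the older plugin host), so the plugin could keep using parquet-tools in that case.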