
Merge prql-tool

aljazerzen opened this issue

@snth has created a CLI tool that utilizes PRQL queries to read data from various sources and write them as CSV or parquet files.

Compared to the prql-compiler CLI, it is actually useful, which is why I think it should be the binary we promote to users to try out.

But I would suggest a name change: PRQL is the language, so invoking the tool as prql my_query.prql may be confusing. I suggest the name pipe.

TODO:

  • [ ] merge snth's fork (or split it into a prql/pipe repo to avoid long compilation times)
  • [ ] set up CI/CD
  • [ ] change the Homebrew recipe to install pipe instead of prql-compiler
  • [ ] change the website to promote pipe

aljazerzen, Sep 19 '22

Thanks @aljazerzen. I was wondering whether you might want to split it into a separate repo, also to avoid mixing issues relating to the tool with issues relating to the language.

Regarding the name of the binary, I know it was a bit cheeky to go for prql, but I thought it would be good marketing for PRQL the language. I think pipe is too generic, and I would prefer to keep prql in the name somehow, like prqls, prqlt or prqly (the last one just because it sounds like "prickly"; we'd have to backport a meaning for it). My first choice would be to just stick with prql, though.

snth, Sep 19 '22

I definitely think it's worth getting prql-tool out there — it is already very cool!

I also agree that the current prql binary could be named prqlc — maybe @aljazerzen had mentioned it in conversation since I can't find an issue. I can open one.

I think we can see how prql-tool matures, and if it matures successfully then we could anoint it as prql. I am definitely fine having it in the repo now, but up to you @snth.

Even if someone only cared about the language being adopted by other tools, having something like prql-tool lets people see what's possible on their own data immediately (my perspective has probably matured on this over the past six months...)

max-sixty, Sep 20 '22

I will submit a PR as soon as I've completed the current batch of work, which is to unify the output formats across the different backends. These used to be backend-specific, so the tables would look different on stdout, and the Parquet files would be named differently or not be available at all on some backends. I'm just prototyping on the DataFusion backend for now, but hopefully I can port that code to the other backends as well. Unfortunately, that's where the Arrow version incompatibilities come in. I'm hoping I can port the source code even if the materialised binary code ends up specific to each backend's Arrow version.

At that point I think it will be worth getting it into the hands of you folks for some testing, as I realise that at the moment I have no CI integration or testing set up. In parallel, I'll continue working on the functionality I still have planned. I just don't want to release it yet, since it currently no longer does what the README says: I've ripped out the output formatting code in order to unify it.
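The output unification described above (stdout tables and output files that look the same whichever backend ran the query) can be pictured as a single rendering step that every backend feeds into. The following is a minimal Rust sketch under stated assumptions, not pq's actual code: `ResultSet`, `OutputFormat`, `Backend`, `render`, `csv_field` and `FakeBackend` are hypothetical names, cells are plain strings rather than Arrow arrays, and Parquet output is left out because writing it needs external crates.

```rust
use std::iter;

/// Hypothetical backend-neutral result set: column names plus rows of text cells.
/// (A real tool would more likely pass Arrow record batches around.)
struct ResultSet {
    columns: Vec<String>,
    rows: Vec<Vec<String>>,
}

/// Hypothetical output formats. Parquet is omitted: it needs external crates.
enum OutputFormat {
    Csv,
    Table,
}

/// Each backend only converts its native results into a ResultSet;
/// all formatting lives in `render`, so output is identical across backends.
trait Backend {
    fn run(&self, sql: &str) -> ResultSet;
}

/// Quote a CSV field if it contains a comma, quote or line break (RFC 4180 style),
/// doubling any embedded quotes.
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// Render a result set in the requested format, header line first.
fn render(rs: &ResultSet, format: &OutputFormat) -> String {
    let mut out = String::new();
    match format {
        OutputFormat::Csv => {
            for line in iter::once(&rs.columns).chain(rs.rows.iter()) {
                let fields: Vec<String> = line.iter().map(|f| csv_field(f)).collect();
                out.push_str(&fields.join(","));
                out.push('\n');
            }
        }
        OutputFormat::Table => {
            // Column width = longest header or cell (byte length, so this assumes ASCII).
            let widths: Vec<usize> = (0..rs.columns.len())
                .map(|i| {
                    iter::once(&rs.columns)
                        .chain(rs.rows.iter())
                        .map(|r| r[i].len())
                        .max()
                        .unwrap_or(0)
                })
                .collect();
            for line in iter::once(&rs.columns).chain(rs.rows.iter()) {
                let cells: Vec<String> = line
                    .iter()
                    .zip(&widths)
                    .map(|(c, w)| format!("{:<width$}", c, width = *w))
                    .collect();
                // Trim padding after the last column.
                out.push_str(cells.join(" | ").trim_end());
                out.push('\n');
            }
        }
    }
    out
}

/// Hypothetical stand-in for a real backend (DuckDB, DataFusion, ...): returns fixed rows.
struct FakeBackend;

impl Backend for FakeBackend {
    fn run(&self, _sql: &str) -> ResultSet {
        ResultSet {
            columns: vec!["name".into(), "city".into()],
            rows: vec![
                vec!["Ada".into(), "London".into()],
                vec!["Grace".into(), "New York, NY".into()],
            ],
        }
    }
}

fn main() {
    let rs = FakeBackend.run("SELECT name, city FROM people");
    print!("{}", render(&rs, &OutputFormat::Csv));
    print!("{}", render(&rs, &OutputFormat::Table));
}
```

In this design, adding another backend only means implementing `run`; the formatting code stays untouched. The trade-off is an extra conversion from each backend's native types into the shared representation.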

Regarding the name, I think prql-tool is too long for a CLI binary. How about prqlt, analogous to prqlc? You can also "verb" that, as in "I just prqlt some of our PROD data to a CSV file for you to do some testing on." 😉

snth, Sep 22 '22

Hi, I've been thinking about the name more this weekend and I want to rename this to prql-query and the binary to pq. I didn't see any obvious clashes with another pq tool.

snth, Sep 26 '22

Great!

Re pq — any concerns about a parquet implication?

max-sixty, Sep 26 '22

Nah, pq is great! Concise and similar to the well known jq.

aljazerzen, Sep 26 '22

Cool. The mention of jq in that recent HN columnq-cli thread is what might have actually given me the idea.


@max-sixty Do you have a particular concern around parquet? I wasn't aware of anything but after some more googling I did find:

import pyarrow.parquet as pq

as a fairly common idiom, but that is in Python, so it's not a direct clash. I see Parquet as a major use case for this, so I don't see some allusion to it as a bad thing, as long as there's no direct clash.


In Ubuntu there's already a package called pq, but that seems to be a game called Progress Quest, so presumably it's not hugely popular and won't be a major clash at the shell for our target market.


The only other usage of PQ I'm aware of is for Power Query in the Excel and Power BI world. I think that's also distinct enough that we should be OK.

snth, Sep 26 '22

Great — it reminded me of parquet, but not in some "don't call it py" obvious way — I would go ahead!

max-sixty, Sep 26 '22

Do you want this to be in a prql-query subdirectory / workspace or should we maybe put this in a separate repo under the prql organisation, e.g. github.com/prql/pq?

I started looking at GitHub Actions for building release artifacts, and it just seems to me that we would probably want a different release schedule for pq than for prql. pq only uses prql-compiler in one place and could easily import the prql crate. While we would want to bump the pq version for each prql version bump, I doubt that we would want to do the converse.

snth, Oct 05 '22

That's totally fine — whatever you wish! I think fair point on the release cadence.

I think the only thing I would vote for is that we label it as something like "an experimental way to use PRQL to query data", and make it clear in the Readme that it's just one tool, and that PRQL is fundamentally a tool-agnostic language. I've been to org pages before and been confused on whether something is a tool / language / app / SaaS startup / way of life. (Similar to your comments on Discord today!). Hope that's reasonable.

max-sixty, Oct 06 '22

Yes, exactly. Having it in a separate repo would also make that clearer.


Apart from the release cadence, another argument for separating the repos is to keep pq issues from clogging the PRQL issue backlog. I anticipate that most of the pq issues would be about problems with the integrations with backends like DuckDB, DataFusion, etc. Issues about PRQL language features can be redirected to the PRQL repo, but you don't want your guys' time taken up with integration issues.

snth, Oct 06 '22

Super @snth, very much agree

max-sixty, Oct 06 '22

@max-sixty Are you happy for me to transfer ownership of the pq repo over to the prql org now and carry on development there?

Just thinking that since it's on the chat now, if anyone files an issue, it should probably go there.

snth, Oct 12 '22

For sure!

max-sixty, Oct 12 '22

Any concerns about the licensing before I transfer ownership?

Currently I've got Apache, MIT and UNLICENSE.

Not sure if there are any incompatibilities. I saw a lot of people dual-licensing with Apache and MIT and then I saw that ripgrep dual-licensed under MIT and UNLICENSE so I added that as well. 🤷‍♂️

snth, Oct 13 '22

I would suggest keeping licensing to a minimum: a single permissive license, just like the main prql repo.

As far as I understand, having multiple licenses is the same as having any of them, which is the same as having only the most permissive one.

aljazerzen, Oct 13 '22

Ok, I think the 0.0.5 version now has enough capabilities to make it worth publishing.

So far it supports:

  • CSV
  • Parquet
  • JSON
  • DuckDB
  • PostgreSQL
  • SQLite (in principle it should work, but I find it doesn't work for anything returning numeric fields)

I'll transfer ownership tomorrow evening and will probably bump the version to 0.1.0.

snth, Oct 13 '22

I just tried to do the transfer but I got the following error:

You don’t have the permission to create public repositories on prql

@max-sixty I thought it would send you a request to approve, but from my side it appears to have done nothing. Not sure what the next step is here.


Btw, I also started looking into the crates.io publishing process, and it turns out there is already a pq crate there (pq: "jq for protobuf"). I've therefore renamed the repo to prql-query and I'll use that name for the crate. I've kept pq as the binary name for now, but we could rename that to prqlq (or prql) if need be.

snth, Oct 15 '22

I think you should now have permissions. Thanks for asking — as ever this is an oversight not an intentional restriction!

max-sixty, Oct 15 '22

Thank you that worked. prql-query is now owned by prql org! 🎉

Unfortunately now I'm no longer an admin on that repo so can't set up the branch protection and Discord webhook. @max-sixty can you please make me an admin on the prql-query repo? 🙈

snth, Oct 16 '22

Great!!

Done!

max-sixty, Oct 16 '22

This is now live (https://crates.io/crates/prql-query) and was published by a GitHub Action in the repo, so I'm closing this issue as completed.

snth, Oct 17 '22