Implementing versioning
Following on from https://github.com/prql/prql/issues/367, we'd like to add versioning to PRQL, so that we can make changes to the PRQL language without breaking lots of existing code.
I'm not sure of the best way of doing this. Some initial thoughts:
- Should we write a new impl for each version of the language? This could get heavy.
- Should we write a transformer for each version, so that we have an impl for (say) the current version of 4, and then a converter from 1->2, 2->3, and 3->4? That would scale well in code complexity as we get more versions.
- If we go with the transformer, do we do this at the text level? Parse tree? AST? The more abstract the representation we transform, the easier the transformation, but the higher the risk that we need to re-implement logic at the early stages; e.g. if we transform the AST, we'd need the same parser to work for both 1 and 5?
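A minimal Rust sketch of the converter-chain option, purely illustrative: the `Ast` type, `CURRENT_VERSION`, `upgrade_step`, and the `switch` to `case` rename are all hypothetical and not part of the PRQL compiler. Only the current version needs a real compiler impl; each older version needs just one small step to the next.

```rust
/// Hypothetical AST, reduced to a version tag and the query body for brevity.
#[derive(Debug, Clone, PartialEq)]
pub struct Ast {
    pub version: u32,
    pub body: String,
}

/// The only version the compiler itself implements (hypothetical).
pub const CURRENT_VERSION: u32 = 4;

/// Convert an AST from version N to version N + 1.
fn upgrade_step(ast: Ast) -> Result<Ast, String> {
    let body = match ast.version {
        // Hypothetical change between 1 and 2: `switch` renamed to `case`.
        1 => ast.body.replace("switch", "case"),
        // No changes needed for 2->3 and 3->4 in this example.
        2 | 3 => ast.body,
        v => return Err(format!("no converter from version {}", v)),
    };
    Ok(Ast { version: ast.version + 1, body })
}

/// Apply the converters in sequence (1->2, 2->3, 3->4) until the AST
/// reaches CURRENT_VERSION; versions newer than the compiler are rejected.
pub fn upgrade(mut ast: Ast) -> Result<Ast, String> {
    if ast.version > CURRENT_VERSION {
        return Err(format!(
            "version {} is newer than this compiler ({})",
            ast.version, CURRENT_VERSION
        ));
    }
    while ast.version < CURRENT_VERSION {
        ast = upgrade_step(ast)?;
    }
    Ok(ast)
}
```

Adding a version 5 would mean one new arm in `upgrade_step`, which is the linear scaling described in the second bullet; the catch, as the third bullet notes, is that every step must accept whatever representation the earlier versions produce.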
This can be delayed until 1.0.
Before that, we can afford to just raise an error when passing source that is not of the same version as the compiler.
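A sketch of that pre-1.0 rule, assuming a header of the form `prql version:...` on the first line (the thread later mentions `prql version:1`). `header_version`, `check_pre_1_0`, and the `0.3` value of `COMPILER_VERSION` are hypothetical. A query with no header is assumed to target the compiler's own version.

```rust
/// This compiler's language version (hypothetical pre-1.0 value).
pub const COMPILER_VERSION: &str = "0.3";

/// Return the `version:` argument of a leading `prql` header line, if any.
/// Surrounding double quotes are stripped, so `"0.3"` and `0.3` are equivalent.
pub fn header_version(source: &str) -> Option<&str> {
    let first = source.lines().next()?.trim();
    let args = first.strip_prefix("prql ")?;
    args.split_whitespace()
        .find_map(|arg| arg.strip_prefix("version:"))
        .map(|v| v.trim_matches('"'))
}

/// Pre-1.0 policy: an explicit version must match the compiler exactly.
/// A query without a header is assumed to target the compiler's version.
pub fn check_pre_1_0(source: &str) -> Result<(), String> {
    match header_version(source) {
        Some(v) if v != COMPILER_VERSION => Err(format!(
            "query targets PRQL {}, but this compiler implements {}",
            v, COMPILER_VERSION
        )),
        _ => Ok(()),
    }
}
```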
I started writing this a couple days ago. https://github.com/PRQL/prql/issues/1123#issuecomment-1363917024 made me complete it.
And now for a dissenting viewpoint. :-) I would be reluctant to add versioning to the PRQL language. My mind reels as I read about transforms from v1 to v2. That's a lot of work, both to create a new language/transformer, then to test it, then to get the world to understand the change...
Historical precedent: The Python transition from 2.x to 3.x was a disaster. It generated a huge amount of fear, uncertainty, and doubt. I still don't know which features go where. I don't write fussy code: I just type until the IDE doesn't give errors. Python survived despite the debacle because it had tens of millions of people who loved the language and had their programs woven into business-critical processes.
PRQL won't have that luxury: I suspect it will always be a tool that delights a small but very devoted group of users. If it gets a reputation that "the new version broke everything" it will fade.
I'm a fan of the "No Breakage Club" that Dave Winer has advocated for years. Any new features must be completely backward compatible with old code. Here are a couple motivational articles:
- A podcast about "breakage"
- Earliest "No Breakage" I could find
BUT... I'm not opposed to wild experimentation in these early days. Any language that says "for the intrepid" means what it says. That said, once we hit 1.0, that's it. Any changes need to be completely backward compatible.
New thought: Will PRQL ever be "done"? Maybe... There are two ways to continue its evolution:
- Language changes. Reasons for further change to the language:
  - Ways to express something useful that's not possible
  - (maybe) A simpler way to express something that's very complicated
  - But here's where I say, "No Breakage"
- Increasing integration with other tools:
  - Other languages
  - IDE
  - Peripheral stuff that doesn't affect the language itself
  - Adapting to new OS/Rust/etc. But the underlying CI machinery already protects us against most of the hassle here.
Thanks for listening.
</rant>
This is no rant, but a strong stance against breaking software, which I also hold dearly.
When we release 1.0, all valid PRQL source code at that point should remain valid in all future versions of PRQL.
We can, however, add breaking changes iff they are behind an opt-in: version: in the query definition. This is why the "old" compiler must not accept source with versions higher than its own version, so people cannot just stick version:10000 in their query and then complain when a future release breaks their queries.
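A sketch of that acceptance rule, assuming versions are written as `major` or `major.minor`; `LangVersion`, `accept`, and the `1.2` compiler version are hypothetical. Older versions are accepted, newer ones are rejected, which is what stops `version:10000` from working as a wildcard.

```rust
/// Hypothetical language version; derived ordering compares major, then minor.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct LangVersion {
    pub major: u32,
    pub minor: u32,
}

impl LangVersion {
    /// Parse `1` or `1.2`; anything else is rejected.
    pub fn parse(s: &str) -> Option<LangVersion> {
        let mut parts = s.split('.');
        let major: u32 = parts.next()?.parse().ok()?;
        let minor: u32 = match parts.next() {
            Some(m) => m.parse().ok()?,
            None => 0,
        };
        if parts.next().is_some() {
            return None; // e.g. `1.2.3`
        }
        Some(LangVersion { major, minor })
    }
}

/// The version this hypothetical compiler implements.
pub const COMPILER: LangVersion = LangVersion { major: 1, minor: 2 };

/// Accept any version up to the compiler's own; reject newer ones.
pub fn accept(requested: &str) -> Result<LangVersion, String> {
    let v = LangVersion::parse(requested)
        .ok_or_else(|| format!("invalid version: {}", requested))?;
    if v > COMPILER {
        Err(format!(
            "query requires PRQL {}.{}, but this compiler implements {}.{}",
            v.major, v.minor, COMPILER.major, COMPILER.minor
        ))
    } else {
        Ok(v)
    }
}
```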
Of course conditional language features are hard to implement and maintain, so I'm quite happy we live in a pre-1.0 world.
I'm not sure if we're agreeing or not... When I say, "no breakage", I mean that I want to come back to a project a month or even five years later, and paste a block of PRQL statements into the current compiler and have it produce SQL that implements what I mean.
It's perfectly fine if my crufty old code misses out on zoomy language features that have been recently implemented. I just don't want to "have to learn anything new" about the language to pick up with PRQL after a long time away.
This argues that no versioning is necessary. Old PRQL is always valid PRQL.
And let me reiterate: I am enthusiastically supportive of experimentation before we declare a 1.0 version. It's OK to break people's queries for 0.3 or 0.4 versions as we evolve our understanding of the best way to design this new language. Thanks
I think we are saying the same thing. The language we release as 1.0 must remain valid in the future, so your old queries will work with any new version of the compiler.
But we can add new features to the language like Rust does: have an explicit opt-in saying "using edition 2024" that has the new features that are incompatible with default 1.0 edition.
> I think we are saying the same thing. The language we release as 1.0 must remain valid in the future, so your old queries will work with any new version of the compiler.
YES! Exactly my desire.
> But we can add new features to the language like Rust does: have an explicit opt-in saying "using edition 2024" that has the new features that are incompatible with default 1.0 edition.
I'd have no qualms about encouraging people to include a comment like, # Requires PRQL 1.2 in their query.
But I don't see the need for automatic "versioning machinery" for PRQL. That is, I don't know how the PRQL compiler's behavior could/should change based on that information. Either:
- it's compatible, and the query will "just work"
- it's an older compiler, and the compiler won't understand, and the user will see error messages.
Are there other possibilities? Thanks.
This is what rustc does (as far as I understand it):
- Each crate has an associated edition of Rust, which can be 2015, 2018 or 2021.
- The default is 2015, but it can optionally be set in Cargo.toml.
- When creating new crates, cargo sets edition to the latest one.
- rustc then enables/disables certain features depending on the edition.
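Translated to PRQL, edition gating might look roughly like this. `Edition`, `is_keyword`, and the choice of `match` as a keyword introduced by a later edition are hypothetical, and the always-reserved words are only illustrative. A word that becomes reserved in a newer edition stays an ordinary identifier for queries on the older one, much as Rust 2018 reserved `async` and `await` only for crates that opted into that edition.

```rust
/// Hypothetical PRQL editions, ordered oldest to newest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Edition {
    V1,
    V2024,
}

/// Whether `word` is reserved in the given edition (illustrative list).
pub fn is_keyword(word: &str, edition: Edition) -> bool {
    match word {
        // Reserved in every edition.
        "let" | "prql" => true,
        // Hypothetical: `match` is only reserved from the newer edition on,
        // so a 1.0 query may keep using `match` as a column name.
        "match" => edition >= Edition::V2024,
        _ => false,
    }
}
```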
This allows your crate with a set edition to be able to compile with any future version of the compiler.
It also allows creating a new edition with language features that would break existing crates. But because existing crates don't need to opt in to the newer editions, they continue to compile.
But without this edition/language versioning system, we could never add anything that would break existing queries. With our current name resolution semantics, this would include even just adding a function to the std lib.
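To make the std-lib point concrete, here is a sketch that assumes, for illustration only, that std-lib names take precedence over column names during resolution; the compiler's exact rules may differ. `std_lib`, `resolve`, and the `median` function are hypothetical. If `median` were simply added to the std lib, a query using a column called `median` would silently change meaning; tying the addition to an edition leaves old queries as they were.

```rust
/// Hypothetical editions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Edition {
    V1,
    V2,
}

/// What a bare name in a query resolves to.
#[derive(Debug, PartialEq)]
pub enum Resolved<'a> {
    StdFunction(&'a str),
    Column(&'a str),
}

/// Std-lib functions visible in each edition (names are illustrative).
fn std_lib(edition: Edition) -> &'static [&'static str] {
    match edition {
        Edition::V1 => &["sum", "count", "average"],
        // The newer edition adds a hypothetical `median` function.
        Edition::V2 => &["sum", "count", "average", "median"],
    }
}

/// Assumed rule for this sketch: std-lib names shadow column names.
pub fn resolve(name: &str, edition: Edition) -> Resolved<'_> {
    if std_lib(edition).iter().any(|f| *f == name) {
        Resolved::StdFunction(name)
    } else {
        Resolved::Column(name)
    }
}
```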
> But without this edition/language versioning system, we could never add anything that would break existing queries.
This exactly expresses the concern that I was voicing, and (what I view as) a bad outcome for the project.
It is my hope that existing PRQL queries will work forever - unchanged (of course, after we stabilize on Version 1.0) for these reasons:
- It's a lot of work to design the machinery to handle and test those various versions
- More importantly, it's a lot of cognitive load for our users to make a switch to some new facility (the "Python problem")
I'm not saying the language can never be added-to, just that existing features not change their syntax or semantics (the "no breakage" part). As we evolve toward a 1.0 language, we need to work hard to find all the peculiar use-cases to see if we have a reasonable way to handle them.
And finally, I need to state that this isn't my project. So I can only throw down this challenge to the team and see if it makes sense to the people doing all this good work. Thanks again for listening.
> It is my hope that existing PRQL queries will work forever
This is the goal of the versioning system! From the book: https://prql-lang.org/book/queries/dialect_and_version.html
> The compiler will compile for the major version of the query. This allows the language to evolve without breaking existing queries, or forcing multiple installations of the compiler. This isn’t yet implemented, but is a gating feature for PRQL 1.0.
It's similar to rust editions. So if we give a version in the query (prql version:1), then future compilers can compile based on that version. And we can still make breaking changes to the language.
Clearly it would be better to have no breaking changes ever, and to not need a way of specifying versions. But in lieu of that, this offers backward compatibility. @richb-hanover I think you gave an example in a previous comment of a project that had over-emphasized backward-compat.
Another advantage of implementing this — we could release experimental features without worrying about backward compatibility or doc warnings.
For example, we're still figuring out exactly how switch / match should work in #1286. It would be cool if we could release that but require version:nightly or similar. This is similar to Rust's approach.
(Though at the moment, given the number of users, this is "cool" rather than "essential", hopefully that will change!)
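A sketch of gating an experimental construct behind an opt-in, assuming the hypothetical syntax `version:nightly`. `Channel`, `channel`, `check_feature`, and the `EXPERIMENTAL` list are invented for illustration and are not the compiler's API.

```rust
/// Release channel selected by the query (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Channel {
    Stable,
    Nightly,
}

/// Map a `version:` argument to a channel; only `nightly` opts in.
pub fn channel(version_arg: &str) -> Channel {
    if version_arg == "nightly" {
        Channel::Nightly
    } else {
        Channel::Stable
    }
}

/// Constructs whose design is still in flux (illustrative list).
const EXPERIMENTAL: &[&str] = &["switch"];

/// Reject an experimental construct unless the query opted in to nightly.
pub fn check_feature(feature: &str, ch: Channel) -> Result<(), String> {
    let experimental = EXPERIMENTAL.iter().any(|f| *f == feature);
    if experimental && ch != Channel::Nightly {
        Err(format!("`{}` is experimental; it requires version:nightly", feature))
    } else {
        Ok(())
    }
}
```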
> It is my hope that existing PRQL queries will work forever
>
> This is the goal of the versioning system! ...
I mean this with the deepest respect, but I still think we're talking past each other. Here's what I'm thinking:
Until we declare a 1.0 version, I don't care about changes to the language. In fact, we should experiment, and horse around with the various keywords/syntax: maybe try out switch in 0.4.0, match in 0.5.0, case in 0.6.0... But once we hit the 1.0 version, all keywords are frozen (and we're stuck with them.) PRQL will always create something sensible if presented with those keywords/syntax/statements. If we subsequently have a blinding realization and come up with the perfect syntax, we must find a different keyword ("choisissez"? :-) for the new construct.
At no time does this require someone to know a "version". They just type (post-1.0) PRQL statements and everything's fine. They might miss out on the new choisissez feature, but they'll still get functioning SQL.
As I read the statement "... The compiler will compile for the major version of the query..." my fear is that this rule would permit a 2.0 version to display an error (or produce the incorrect SQL) if presented with an original PRQL 1.0 query.
Can you allay my fears? (Maybe this is a good item for the Dev call.) Thanks
Follow-on thoughts:
- For the same reasons, I would counsel against a version:nightly feature. During the pre-1.0 time period, we could experiment with case/switch/match by implementing all, but with a firm commitment to choose only one, and add deprecation warnings for the others as we draw closer to 1.0. (This substitutes human judgement during development for a bunch of code that lives forever and hardly ever gets exercised.)
- Displaying the compiler version in a comment (as we currently do) is entirely sensible. It helps with debugging, but it's only an artifact of the compiler, not a "language feature".
> As I read the statement "... The compiler will compile for the major version of the query..." my fear is that this rule would permit a 2.0 version to display an error (or produce the incorrect SQL) if presented with an original PRQL 1.0 query.
>
> Can you allay my fears?
A query written for a 1.0 compiler with a version:1.0 would still compile with a 2.0 compiler. If it didn't specify version:1.0, then it would break, since the default would be version:2.0 at that point.
Check out Rust Editions — it's largely the same approach.
Does the proposal make sense (even if it doesn't allay your fears :) )?
Or even default to version 1.0 if the query does not specify it. This way no queries will break when switching to newer versions of the compiler.
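The two defaults discussed here differ only in what an unversioned query means. A sketch with hypothetical names (`Major`, `DefaultPolicy`, `effective_version`), assuming 2 is the latest major version: `Latest` is the behavior described in the previous comment, where an unversioned 1.0 query is compiled as 2.0, while `Oldest` matches the suggestion to default to 1.0. Cargo behaves like the latter, treating a crate with no `edition` key as 2015.

```rust
/// Hypothetical major language version.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Major(pub u32);

/// Assumed range of major versions this compiler supports.
pub const OLDEST: Major = Major(1);
pub const LATEST: Major = Major(2);

/// What an unversioned query should mean.
#[derive(Debug, Clone, Copy)]
pub enum DefaultPolicy {
    /// Unversioned queries track the newest language (old queries may break).
    Latest,
    /// Unversioned queries are treated as 1.0 (old queries keep compiling).
    Oldest,
}

/// An explicit version always wins; otherwise apply the policy.
pub fn effective_version(declared: Option<Major>, policy: DefaultPolicy) -> Major {
    declared.unwrap_or(match policy {
        DefaultPolicy::Latest => LATEST,
        DefaultPolicy::Oldest => OLDEST,
    })
}
```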
I'm going to come to the dev call this weekend, but I want to offer this thought:
You have sort of allayed my fears. I am comforted that you would ensure any future version of the compiler works with PRQL 1.0 queries. I looked at the Rust Editions link: I understand how a well-established language needed an escape hatch, for example, to treat "async" and "await" as keywords, not variables...
But before you make the final decision for PRQL, I am going to challenge you to envision two or three potential enhancements to the language that would force a non-backward-compatible syntax. What would cause "version 1.0" code not to work unchanged? (Maybe multi-dimensional queries or grouping sets or "mutation queries" or any of the other "language" feature requests...)
And is there no way to fit them into the (pre-1.0) current language? See you on Sunday. Thanks.
Great, agree with your points re attempting to anticipate future breaking changes.
Look forward to you joining.
(to set expectations — we're probably not going to discuss this specific issue on the call given it's not in critical path and it'll be a busy one, but happy to do what the consensus wants...)
(to set expectations — we're probably not going to discuss this specific issue on the call given it's not in critical path and it'll be a busy one, but happy to do what the consensus wants...)
I'd like to reserve 60 or 90 seconds on the dev call to explain why I care so much about this. Then I would be content to push off further discussion until we get closer to needing a final decision. Thanks.