sqlx
sqlx copied to clipboard
WIP: [offline] implement one-file-per-query
continuation of/supercedes #1183 closes #570 closes #1005
cc @jplatte
TODO:
- [x] choose a binary format
- We want the most efficient deserialization since it'll be compiled in debug mode.
- Less churn in Git diffs if files are not textual (maybe cover this with
.gitattributes
?)
- [ ] update documentation
- [x] tests?
Title is not accurate (and wasn't on my PR for a while 😅), since the prepare command is still there (and I think that is the right thing).
Also, since this touches the whole cargo-metadata thingy, it may be a good idea to fix #1706 along the way (see latest comment for how it can likely be solved).
The title was automatically generated from your commit message, which I cherry-picked to keep correct attribution on the parts of your code that were useful.
Re: the performance of the query macros, I've been testing this branch along with a couple that switch the serialization from serde_json
to bincode
and ciborium
(CBOR) and the performance is near identical on a project with ~100 query macro invocations.
Adding the following to the Cargo.toml
at the workspace root improves performance by roughly 10-20%:
[profile.dev.build-override]
opt-level = 3
debug = false
debug-assertions = false
incremental = false
overflow-checks = false
The caching implemented by @LovecraftianHorror in #1684 was actually the biggest performance win for offline mode as it cut the runtime of the macros by roughly a factor of 2-4 over v0.5.10, and is comparable in performance to this branch and its siblings.
SQLx 0.5.11 with the caching of sqlx-data.json
and optimizations enabled as above is actually faster than any of my branches by about 20%.
However, the total runtime of the macro expansion pass as given by -Z time-passes
is still dwarfed by the time spent in codegen and LLVM, which doesn't change. The project takes between 11-13 seconds to compile in debug mode across all configurations.
While we probably still want this change for its ergonomic benefits, efforts in optimizing the compile time of SQLx are probably better spent in optimizing the code we generate and not necessarily how we generate it.
Here's a flamegraph of the compilation with v0.5.11 without optimizations, code in libsqlx_macros
counts for only 1.62% of the total time:
:wave: @abonander Do you still intend to revisit this PR when you have time or would you be open to someone taking over? I see this is quite a long-standing issue.
There seem to be only a few tasks outstanding:
- Address TODO in prepare.rs to implement
--check
. - Improve support for workspace cases (currently --workspace does not generate queries for sub-crates when the workspace root is also a crate, and queries are not generated for some sub-crates when in a virtual workspace, not sure why yet).
- Fix an empty .sqlx folder being generated when run in a sub-crate in a workspace, causing an error that the root .sqlx folder is missing (if it doesn't happen to exist already). Is it intended that all queries in a workspace should be stored in the root .sqlx?
- Fix merge conflicts with
main
(replacecargo.rs
withmetadata.rs
etc.). - Update documentation:
- https://github.com/launchbadge/sqlx/blob/main/sqlx-cli/README.md#enable-building-in-offline-mode-with-query
- https://docs.rs/sqlx/latest/sqlx/macro.query.html#offline-mode-requires-the-offline-feature
- Update comments, doc-comments and error messages mentioning offline mode/
sqlx-data.json
.
I expect merging main
will probably fix one or two issues regarding workspaces.
Or am I underestimating the work left, is there something I missed?
Regarding .sqlx
at the workspace level vs .sqlx
at the package level, I think the latter makes more sense as that makes it possible to publish crates containing sqlx queries with the metadata cache included.
Well, it took longer than expected but I continued the branch in my fork. It's cleaned up and pretty much finished apart from documentation. If there's any interest, you can view the changes here (didn't want to open a PR without permission): https://github.com/launchbadge/sqlx/compare/main...cycraig:sqlx:feat/one-file-per-query
I kept pretty close to the current PR (e.g. kept the offline in-memory caching implementation rather than adapting the version of it from main
, both would work fine but not sure if there's a performance difference yet).
What did I change then? Pretty much what was listed above and some more:
- Fixed merge conflicts with
main
. - Implemented
sqlx prepare --check
which was a TODO. - Replaced
CargoMetadata
with theMetadata
frommain
. - Fixed several bugs:
-
prepare
in a sub-package trying to use the workspace.sqlx
folder and failing, fixed with the newSQLX_OFFLINE_DIR
variable (which was the intention I believe). It now outputs.sqlx
at the package level unless--workspace
is used. - IO/OS race conditions with the new file renaming/moving strategy when saving offline query data to disk.
-
CARGO_TARGET_DIR
problems when compiling a sub-package of a workspace with multiple targets (weird).
-
I manually tested cargo sqlx prepare
and cargo sqlx prepare --check
in a project with a:
- Single crate.
- Virtual workspace with three crates.
- Workspace with three crates and a top-level crate.
Which confirmed the behaviour should be the same as before in terms of which queries are included or not compared to sqlx-data.json
.
Take a look if you have the chance and let me know if it's worth finishing this off (just documentation and testing/benchmarking, unless you see anything that needs changing, like the in-memory caching) or pursuing a different direction, such as starting from scratch?
I'm working on a large refactor for 0.7.0 and we want to land this as part of that release. @cycraig if you'd open a new PR based against 0.7-dev
I'll be glad to look at it.
Completed in #2363