serde icon indicating copy to clipboard operation
serde copied to clipboard

Revert the suggestion to use the `derive` feature instead of `serde_derive` directly

Open CryZe opened this issue 2 years ago • 21 comments

Here it was decided that serde_derive should not be imported directly and instead the derive feature should be used. This however is directly responsible to the incredibly bad compile times that the new prebuilt binary is meant to address. Here's why:

Cargo itself is able to heavily parallelize compilation, however serde with the derive feature active, causes a huge non-parallelizable dependency chain:

proc-macro2 compile build script > proc-macro2 run build script > proc-macro2 > quote > syn > serde_derive > serde > serde_json (or any crate that depends on serde)

This means that while cargo can compile lots of crates in parallel in the beginning, all the crates depending on serde can only start compiling after serde is done compiling, which is very late. So there might be a reasonably long time where cargo isn't able to compile much in parallel at all, resulting in few cores being utilized.

This can be seen here:

https://i.imgur.com/Gs8gKDV.png

However, you can easily break this dependency chain, by not activating the derive and instead depending on serde_derive yourself. This allows serde and its dependents like serde_json and a made up needs-serde in my example to compile from the beginning of the compilation, completely in parallel to all the derive related crates.

https://i.imgur.com/z3ZG7gZ.png

This cuts compile times from 4.7s to 2.6s (and honestly I don't even know where the dubious claim of 9s even came from, the serde with derive feature release build takes 3.8s in this scenario (excluding everything that comes after)).

So this basically gives you the same, if not better compile time performance improvements than the prebuilt binary in almost every case:

  1. It works on every host.
  2. serde has no dependency on any derive crate at all, allowing it to start compiling even earlier than with a prebuilt binary.
  3. syn is still a part of the dependency chain, which realistically is going to be the case for most real world projects anyway, as other derive crates will need it to be built anyway, so the prebuilt binary would only help in mostly toy projects anyway.
  4. The derive dependency chain is now a lot shorter, meaning the bottleneck caused by it is likely to be only perceived on CPUs with over 40 cores or so now anyway.

CryZe avatar Aug 20 '23 10:08 CryZe

I've planned this for a long time but it's stuck on https://github.com/rust-lang/rust/issues/54363#issuecomment-1237544924. By depending separately on serde and serde_derive you no longer get this version equality constraint, which causes mayhem: for example Dependabot will try to upgrade them independently. One version of serde_derive is specific to exactly one version of serde, because it references private APIs that serde exposes specifically for that version of the derive.

With first-class macro dependencies, we'd want to add:

[package]
name = "serde_derive"
version = "1.0.Z"

[lib]
proc-macro = true

[dependencies]
# ...

[macro-dependencies]
serde = "=1.0.Z"

which correctly encodes the version equality constraint.

dtolnay avatar Aug 20 '23 10:08 dtolnay

Dependabot is actually able to correctly upgrade crates in pairs (or more) for other languages, such as JavaScript. Not sure if they require some sort of special support from the package manager like you suggested or they can detect it automatically. It might be a reasonably easily PR to Dependabot to support that. I'll try to look further into it.

But yeah having a way to specify crates that need to be in lock step would be really good too for other tooling. wasmtime for example has various supporting crates (for WASI and co.) that also always get bumped in lock step, so having a way to specify that would be great too.

CryZe avatar Aug 20 '23 10:08 CryZe

why not introduce a new crate serde_lockstep that manages the same version for both serde and serde_derive and encourage user's to use the following:

[dependencies]
serde_lockstep = "1"
serde = "1"
serde_derive = "1"

soqb avatar Aug 20 '23 18:08 soqb

One version of serde_derive is specific to exactly one version of serde, because it references private APIs that serde exposes specifically for that version of the derive.

This should be mentioned on the serde_derive docs then. I know at least one crate that is using it directly, and looking at the docs there is no mention that this is an unsupported setup.

Nemo157 avatar Aug 20 '23 20:08 Nemo157

why not introduce a new crate serde_lockstep that manages the same version for both serde and serde_derive and encourage user's to use the following:

[dependencies]
serde_lockstep = "1"
serde = "1"
serde_derive = "1"

i suppose this could be further improved by migrating the meat and potatoes of serde into a common serde_inner (bikeshed) crate and having the serde crate wrap both.

soqb avatar Aug 20 '23 21:08 soqb

I think I have a horrible idea here! In serde's Cargo.toml we could add

[package]
name = "serde"
version = "1.0.101"

[target.'cfg(never)'.dependencies]
serde-derive = { version = "=1.0.101", path = "./serde-derive" }

If I understand Cargo's innner workings correctly, this "associated macro" pattern would constrain Cargo to always pick serde and serde derive in lockstep, without putting serde-derive in front of serde in compilation graph. That is, serde derive would still be present in lockfile, with a correct version, but it won't be otherwise downloaded or built.

Light testing with https://github.com/matklad/macro-dep-test seems to work!

matklad avatar Aug 20 '23 21:08 matklad

Light testing with https://github.com/matklad/macro-dep-test seems to work!

Did you test from outside the workspace? I've found that deactivated dependencies version requirements (using optional, I haven't tested cfg) influence version selection inside the workspace, but not when the crate is used as a dependency or via cargo install.

Nemo157 avatar Aug 20 '23 21:08 Nemo157

I am quite interested in the macro-dependencies idea. Is https://github.com/rust-lang/rust/issues/54363#issuecomment-1237544924 the full story in your mind, or could you share a generalized semantic meaning of macro-dependencies?

(sorry if that is off-topic 😆)

weihanglo avatar Aug 20 '23 21:08 weihanglo

@Nemo157 seems to work as (perhaps not) intended?

λ bat Cargo.toml 
[package]
name = "foo"
version = "0.1.0"
edition = "2021"

[dependencies]
macro-dep-test = { version = "=0.1.2" }
macro-dep-test-macros = { version = "=0.1.1"  }

22:39:19|~/tmp/foo|master⚡?
λ cargo build
    Updating crates.io index
error: failed to select a version for `macro-dep-test-macros`.
    ... required by package `macro-dep-test v0.1.2`
    ... which satisfies dependency `macro-dep-test = "=0.1.2"` of package `foo v0.1.0 (/home/matklad/tmp/foo)`
versions that meet the requirements `=0.1.2` are: 0.1.2

all possible versions conflict with previously selected packages.

I've found that deactivated dependencies version requirements (using optional, I haven't tested cfg)

That's the thing! optional dependencies are not present in the lockfile --- Cargo knows what features exist in the root package being built, so it can correct infer that some transitive feature in a dependency can not be activated, and so this dep can be pruned from Cargo.lock.

With cfg-guarded deps though, Cargo has to conservatively assume that they could be present. So they are always present in lockfile, and consequently constraint resolver. (that's why Linux-only projects always include like half-a-dozen of windows-specific crates in their lockfile)

matklad avatar Aug 20 '23 21:08 matklad

perhaps we could change it to cfg(all(not(x), x)) which (i think) makes it impossible to accidentally enable.

soqb avatar Aug 20 '23 21:08 soqb

Pushed an explanation to https://github.com/matklad/macro-dep-test/blob/master/README.md

matklad avatar Aug 20 '23 22:08 matklad

Thanks @matkad (https://github.com/serde-rs/serde/issues/2584#issuecomment-1685400047) and @soqb (https://github.com/serde-rs/serde/issues/2584#issuecomment-1685405115), that looks great. I didn't think of that. I would be open to shipping that solution in serde.

dtolnay avatar Aug 20 '23 22:08 dtolnay

im working on a PR now, trying to get something for CI to guarantee it!

soqb avatar Aug 20 '23 22:08 soqb

Alternatively (or in addition), pull out a serde_core for serde_json and others to depend on directly and have serde re-export it and serde_derive using = operators.

  • Parallel builds for serde and the derive machinery
  • No accidental enabling of derive in a way that slows down everything
  • serde_derive and serde (but not serde_core) can break compatibility
  • Minimal retrofit costs to the ecosystem (only packages that don't use derives)

I've done something similar in clap and its worked out well.

epage avatar Aug 21 '23 00:08 epage

^^ that is what time will be doing once there's an improvement to rustdoc

jhpratt avatar Aug 21 '23 00:08 jhpratt

One possibly big problem with suggesting users to go back to depending on serde and serde_derive separately is that it could re-introduce a problem I saw when serde's derive feature was still pretty new and not everybody was using it: libraries that depended separately on them and didn't activate the derive feature on serde would sometimes import both serde::Serialize and serde_derive::Serialize (or Deserialize from both) in one module, which worked fine at first because importing a macro and type/trait of the same name is not a conflict; however once that library was compiled as part of a larger project where some other thing enabled serde's derive feature, rustc would complain that the two imports conflict as both would now refer to a macro (and one also to the trait).

Maybe this is solved nowadays, I guess rustc should be able to tell that the macro-namespace serde::Serialize is exactly the same as serde_derive::Serialize, so it could allow importing from both locations redundantly. I should be able to test that later, if nobody beats me to it.

If this problem still persists, I think that's a good reason to favor the solution suggested by @epage (I personally prefer if anyways because of convenience and users not having to do anything for the compile-time gains).

jplatte avatar Aug 21 '23 04:08 jplatte

To @jplatte’s point here is the original issue that mentiones that problem: #1441

mitsuhiko avatar Aug 21 '23 05:08 mitsuhiko

I wonder if it would be possible to at least detect faulty crate dependency resolutions by having serde export a public constant like const pub SERDE_VERSION: (i32, i32, i32) = (1, 0, 175); and having the derive macro emit code like

const pub SERDE_VERSION_CHECK: () = {
  if SERDE_VERSION.0 != 1 || SERDE_VERSION.1 != 0 || SERDE_VERSION.2 != 175 {
    panic!("serde_derive and serde version mismatch detected");
  }
};

Then at least users would be left less confused about what to do when the versions get out of sync, and it would be guaranteed to fail immediately rather than only if there happened to be an actual conflict.

RalfJung avatar Aug 21 '23 06:08 RalfJung

Maybe this is solved nowadays, I guess rustc should be able to tell that the macro-namespace serde::Serialize is exactly the same as serde_derive::Serialize, so it could allow importing from both locations redundantly. I should be able to rest that later, if nobody beats me to it.

No, this is still not solved. A workaround is to use #[derive(serde_derive::Serialize, serde_derive::Deserialize)] rather than importing them.

Nemo157 avatar Aug 21 '23 06:08 Nemo157

Maybe this is solved nowadays, I guess rustc should be able to tell that the macro-namespace serde::Serialize is exactly the same as serde_derive::Serialize, so it could allow importing from both locations redundantly. I should be able to rest that later, if nobody beats me to it.

No, this is still not solved. A workaround is to use #[derive(serde_derive::Serialize, serde_derive::Deserialize)] rather than importing them.

It's also possible to always import the traits from serde::de::Deserialize and serde::ser::Serialize in modules where the derive traits are also refered to. The problem I see with both is that they're annoying to enforce. I guess you could have serde with the derive feature as a dev-dependency so cargo check ensures your stuff compiles w/o the derive feature, and cargo test / cargo check --tests ensures that it compiles with the feature. I think that would also mean you'll constantly be recompiling serde when switching between the two build configurations though.

jplatte avatar Sep 05 '23 13:09 jplatte