Pkg.jl Proposal for first class support of conditional dependencies in Pkg


Open KristofferC opened this issue 4 years ago • 47 comments

This is a proposal for adding first-class support in Pkg and the code loading system for conditional dependencies.

What is a conditional dependency

Describing a conditional dependency is easiest with an example. A typical concrete example is a plotting package that adds support for plotting, e.g., DataFrames (by adding a method plot(::DataFrame)) without requiring users to install DataFrames to use the plotting package. The plotting package wants to run a bit of extra code (the part that defines the method) when the conditional dependency DataFrames is somehow "available" to the user. The extra code that the package executes when the conditional dependency is active is called "glue code".

Current way of doing conditional dependencies

The way people implement conditional dependencies right now is with Requires.jl. It works by registering, with the package loading code in Base, a callback that evaluates some code. The callback is executed when the conditional dependency is loaded (identified, e.g., by comparing UUIDs); the code from the callback is then evaluated into the module, providing the functionality for the conditional dependency.

As an example usage:

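using Requires

function __init__()
    # Evaluated only once DataFrames (identified by its UUID) has been loaded
    @require DataFrames="a93c6f00-e57d-5684-b7b6-d8193f3e46c0" plot(df::DataFrames.DataFrame) = plot(Matrix(df))  # illustrative glue method
end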
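The callback mechanism described above can be modelled with a toy registry. This is a minimal sketch, not Requires.jl's or Base's actual implementation: on_load, notify_loaded and pending_callbacks are hypothetical names, and "loading" is simulated by calling notify_loaded directly. It shows why the glue only ever runs at runtime: the callback fires after the dependency appears, so nothing it defines is captured when the parent package is precompiled.

```julia
# Toy model of the Requires.jl pattern (hypothetical names, not a real API):
# callbacks are keyed by the UUID of the package they wait for.
const DATAFRAMES_UUID = Base.UUID("a93c6f00-e57d-5684-b7b6-d8193f3e46c0")

const pending_callbacks = Dict{Base.UUID, Vector{Function}}()

# Register `f` to run once the package identified by `uuid` is loaded.
function on_load(f::Function, uuid::Base.UUID)
    push!(get!(Vector{Function}, pending_callbacks, uuid), f)
    return nothing
end

# Simulate the loader announcing a newly loaded package: every callback
# waiting on that UUID runs now, at runtime, then is discarded.
function notify_loaded(uuid::Base.UUID)
    for f in pop!(pending_callbacks, uuid, Function[])
        f()
    end
    return nothing
end

glue_ran = Ref(false)
on_load(() -> (glue_ran[] = true), DATAFRAMES_UUID)
notify_loaded(DATAFRAMES_UUID)   # glue_ran[] is now true
```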

What is the problem with Requires.jl

There are a few reasons why the current strategy using Requires.jl to deal with this is unsatisfactory.

It doesn't work well with precompilation. The way people tend to use Requires is by include-ing some file when the conditional dependency is available. Because Requires.jl runs inside __init__, the code evaluated by the include command does not end up in the precompile file.
It is "implicit" in the sense that the conditional dependency is only defined in the Julia code. We typically want to put all dependency information inside the Project file.
There is currently no good way to set compat bounds on the conditional dependency.
It has performance problems (https://github.com/MikeInnes/Requires.jl/issues/39)
Basing the activation criterion of the conditional dependency on it simply being loaded means that packages loaded from outside the current project will affect whether the glue code is run or not. It would be better to base it only on the currently active project.
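The difference between the two activation criteria in the last point can be sketched with plain sets. This is a hypothetical illustration: package names stand in for UUIDs, and the active project's manifest is modelled as a set of names.

```julia
# Criterion used by Requires.jl today: glue runs as soon as `dep` is loaded,
# wherever it was loaded from.
activate_if_loaded(dep, loaded) = dep in loaded

# Proposed criterion: `dep` must also belong to the active project's manifest.
activate_if_in_project(dep, loaded, manifest) = dep in loaded && dep in manifest

loaded   = Set(["Plots", "DataFrames"])   # DataFrames came from another environment on the load path
manifest = Set(["Plots"])                 # the active project does not include DataFrames

activate_if_loaded("DataFrames", loaded)                # true: glue would run
activate_if_in_project("DataFrames", loaded, manifest)  # false: glue stays off
```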

The current proposal

How to declare a conditional dependency

One declares a conditional dependency by adding an entry to the Project.toml file as:

[conditional-deps]
DataFrames = "$UUID_DATAFRAMES"

[compat]
DataFrames = "..."

An alternative possibility is to just put DataFrames inside [deps] and then have a list of names that are conditional.

# a top-level key must come before any [table] header
conditional-deps = ["DataFrames"]

[deps]
SomeOtherDep = "..."
DataFrames = "$UUID_DATAFRAMES"

Where should the glue code be stored?

Precompilation works at module granularity, so we want a module containing the glue code for each conditional dependency. The glue code would be stored (based on a documented convention) in a file inside the package, e.g. src/DataFramesGlue.jl inside Plots, where the exact name of the file is yet to be decided.

An example of a glue file for Plots conditionally depending on DataFrames is:

module DataFramesGlue

using Plots, DataFrames
Plots.plot(df::DataFrame) = Plots.plot(Matrix(df))  # illustrative: plot the columns as series

end
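To see what such a glue module does without installing either package, the sketch below replaces Plots and DataFrames with tiny hypothetical stand-in modules (their contents are invented for illustration and do not reflect the real packages' APIs). The glue module depends on both and adds one method to Plots.plot; under the proposal, a module like this is the unit that would get its own precompile file.

```julia
# Hypothetical stand-ins for the real packages, reduced to the bare minimum.
module Plots
plot(x::AbstractVector) = "line plot of $(length(x)) points"
end

module DataFrames
export DataFrame
struct DataFrame
    columns::Dict{Symbol, Vector{Float64}}
end
end

# The glue module: it depends on both packages and adds a method to
# Plots.plot for DataFrames' type (one line plot per column).
module DataFramesGlue
import ..Plots
using ..DataFrames: DataFrame
Plots.plot(df::DataFrame) = [Plots.plot(col) for col in values(df.columns)]
end

df = DataFrames.DataFrame(Dict(:a => [1.0, 2.0], :b => [3.0, 4.0, 5.0]))
Plots.plot(df)   # one string per column; Dict iteration order is not guaranteed
```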

How is the glue code loaded?

When DataFrames gets loaded, we check all packages that declare a conditional dependency on it. If the loaded version of DataFrames is compatible with the compat entry of a package that has DataFrames as a conditional dependency, we load that package's glue code, which acts like a normal package and is precompiled. We need to teach code loading about glue packages so that it knows how to map the names inside the glue module to the UUIDs in the "main package".

Because we do not try to resolve a set of versions compatible with the conditional dependency, we avoid cases where we would, in general, need to re-resolve arbitrarily many times, with the potential for cycles.
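The load-time decision (check the loaded version against each dependent package's compat entry, then load the matching glue, with no re-resolve) can be sketched as follows. This is a simplified, hypothetical model: compat is reduced to a single caret-style lower bound written as a full version, far narrower than Pkg's real compat syntax, and glue modules are named <Dep>Glue after the DataFramesGlue example.

```julia
# One declared conditional dependency (hypothetical data model).
struct CondDep
    parent::String          # package declaring the conditional dependency
    dep::String             # name of the conditional dependency
    lower::VersionNumber    # caret-style compat lower bound, e.g. v"0.19.0"
end

# Simplified caret rule: at least `lo`, and the same leftmost nonzero component
# (major for >= 1.0, minor for 0.x, patch for 0.0.x). Not Pkg's full semantics.
function caret_compatible(v::VersionNumber, lo::VersionNumber)
    v < lo && return false
    lo.major > 0 && return v.major == lo.major
    lo.minor > 0 && return v.major == 0 && v.minor == lo.minor
    return v.major == 0 && v.minor == 0 && v.patch == lo.patch
end

# Called when `pkg` at version `v` finishes loading: return (parent, glue module)
# pairs to load. No resolver runs; an incompatible version simply skips the glue.
function glue_to_load(conds::Vector{CondDep}, pkg::String, v::VersionNumber)
    return [(c.parent, c.dep * "Glue") for c in conds
            if c.dep == pkg && caret_compatible(v, c.lower)]
end

conds = [CondDep("Plots", "DataFrames", v"0.19.0"),
         CondDep("Gadfly", "DataFrames", v"1.0.0")]
glue_to_load(conds, "DataFrames", v"0.19.4")   # only Plots' glue qualifies
```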

Aug 03 '19 22:08 KristofferC

Thank you very much for writing this up.

I have a use case in LogDensityProblems.jl that I am wondering about. Specifically, the glue code for both ForwardDiff and ReverseDiff relies on DiffResults to extract gradients.

Currently this is handled by code that looks like


function __init__()
    @require DiffResults="163ba53b-c6d8-5494-b064-1a9d43ac40c5" include("DiffResults_helpers.jl")
    @require ForwardDiff="f6369f11-7733-5829-9624-2563aa707210" include("AD_ForwardDiff.jl")
    @require ReverseDiff="37e2e3b7-166d-5795-8a7a-e32c996b4267" include("AD_ReverseDiff.jl")
end

So, for the purposes of Requires.@require, DiffResults is considered available: if the user is using ForwardDiff, then ForwardDiff has loaded DiffResults, which triggers the shared glue code.
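This transitive triggering can be modelled by a toy loader that loads a package's dependencies first and fires any glue registered for each package as it appears. The dependency graph, DEPS, GLUE and load! are hypothetical; only the package and file names come from the snippet above.

```julia
# Toy dependency graph (hypothetical): loading a package loads its deps first.
const DEPS = Dict(
    "ForwardDiff" => ["DiffResults"],
    "ReverseDiff" => ["DiffResults"],
    "DiffResults" => String[],
)

# Glue file fired by each trigger package (file names from the snippet above).
const GLUE = Dict(
    "DiffResults" => "DiffResults_helpers.jl",
    "ForwardDiff" => "AD_ForwardDiff.jl",
    "ReverseDiff" => "AD_ReverseDiff.jl",
)

# Load `pkg` (dependencies first), recording each glue file as it fires.
function load!(loaded::Set{String}, triggered::Vector{String}, pkg::String)
    pkg in loaded && return nothing
    for dep in get(DEPS, pkg, String[])
        load!(loaded, triggered, dep)
    end
    push!(loaded, pkg)
    haskey(GLUE, pkg) && push!(triggered, GLUE[pkg])
    return nothing
end

loaded, triggered = Set{String}(), String[]
load!(loaded, triggered, "ForwardDiff")
triggered   # ["DiffResults_helpers.jl", "AD_ForwardDiff.jl"]
```

The user never loads DiffResults explicitly; its glue fires because ForwardDiff pulled it in.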

Would this continue to work? For the mechanism you propose, I imagine I could just provide deps information for DiffResults.

Generally, how is it handled when glue code needs other modules that would not trigger glue code of their own? Can we still specify, e.g., compat bounds for them?

Aug 04 '19 05:08 tpapp

If I understand your example correctly, you would just have to declare a conditional dep on DiffResults (and ForwardDiff + ReverseDiff).

Aug 04 '19 13:08 KristofferC

Thanks. So, if I do that, then, e.g., it would be triggered by ForwardDiff loading DiffResults, and the user would not have to load DiffResults explicitly? That's the way it works now with Requires.

Aug 04 '19 13:08 tpapp

Yes.

Aug 04 '19 14:08 KristofferC

What if I want to define a glue module to be loaded when both CuArrays and OrdinaryDiffEq are imported? That is to say, can there be something equivalent to the following?

function __init__()
    @require CuArrays="..." begin
        @require OrdinaryDiffEq="..." include("glue.jl")
    end
end

I guess a possible API would be to include (say) an [on-import] section in Project.toml to bundle conditional-deps explicitly:

[conditional-deps]
CuArrays = "..."
OrdinaryDiffEq = "..."

[compat]
CuArrays = "..."
OrdinaryDiffEq = "..."

[on-import]
foo = ["CuArrays", "OrdinaryDiffEq"]

which tells the loader to include src/on-import/foo.jl when CuArrays and OrdinaryDiffEq are loaded.
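The "all of these must be loaded" rule for such a hook can be sketched as a subset check. ON_IMPORT, ready_hooks and hook_file are hypothetical names modelling the proposed [on-import] table; a real loader would also need to record which hooks have already run, modelled here by the `done` set.

```julia
# Hypothetical parsed [on-import] table: hook name => packages that must all be loaded.
const ON_IMPORT = Dict("foo" => Set(["CuArrays", "OrdinaryDiffEq"]))

# Hooks whose whole trigger set is loaded and which have not run yet.
function ready_hooks(loaded::Set{String}, done::Set{String})
    return sort([name for (name, needs) in ON_IMPORT
                 if !(name in done) && issubset(needs, loaded)])
end

# Source file for a hook, following the src/on-import/<name>.jl convention above.
hook_file(name::String) = joinpath("src", "on-import", name * ".jl")

ready_hooks(Set(["CuArrays"]), Set{String}())                    # none yet
ready_hooks(Set(["CuArrays", "OrdinaryDiffEq"]), Set{String}())  # ["foo"]
```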

Aug 08 '19 17:08 tkf

This sounds fantastic. Is this a feature that would be available in a Julia 1.x release, e.g. Julia 1.4 or Julia 1.5? Or would it have to wait until Julia 2.0?

Aug 08 '19 17:08 DilumAluthge

One declares a conditional dependency by adding an entry to the Project.toml file as:

[conditional-deps]
DataFrames = "$UUID_DATAFRAMES"

[compat]
DataFrames = "..."

An alternative possibility is to just put DataFrames inside [deps] and then have a list of names that are conditional.

conditional-deps = ["DataFrames"]

[deps]
SomeOtherDep = "..."
DataFrames = "$UUID_DATAFRAMES"

I like the first one more. Listing the conditional dependencies under [deps] might get a little confusing.

Aug 08 '19 18:08 DilumAluthge

This sounds fantastic. Is this a feature that would be available in a Julia 1.x release, e.g. Julia 1.4 or Julia 1.5? Or would it have to wait until Julia 2.0?

Some Julia 1.x release.

What if I want to define a glue module to be loaded when both CuArrays and OrdinaryDiffEq are imported?

Yeah, I thought about this a little bit too. A first implementation might not support it, but we should probably make sure that adding it later will not be awkward.

Aug 15 '19 14:08 KristofferC

Triaging to discuss what to do about multiple conditional dependencies (which kind of starts to sound like "features").

Aug 16 '19 14:08 KristofferC

Multiple conditional dependencies (2, 3, or even more than 3 packages required for the glue code to be loaded) is definitely a use case for me!

Looking forward to seeing what triage thinks!


Aug 16 '19 15:08 DilumAluthge

Regarding multiple deps, maybe we could borrow something similar from Rust's Cargo (features); that idea was included in this proposal: #977

Aug 22 '19 19:08 Roger-luo

AFAIU, the difference between features and conditional dependencies is that a feature is something that someone opts into from the currently active project, while a conditional dependency is automatically "activated" whenever its requirements are satisfied.

Aug 22 '19 19:08 KristofferC

I'm going to make an alternate proposal here. First: I think we should not call these conditional dependencies. They're NOT dependencies; they are packages that glue other packages together and are loaded automatically when the set of packages that they glue together is loaded. They depend on the packages that they glue together, not the other way around! This is crucial. So instead, I propose that we call them "glue packages". Here's how we could specify them in a package P's Project.toml file:

name = "P"
uuid = "<uuid>"

[deps]
# P's dependencies here

# glue with a single dependency
[glue]
A = "<uuid>" # source at `glue/A.jl`
B = "<uuid>" # source at `glue/B.jl`

# glue with multiple dependencies
[glue.CD] # source at `glue/CD.jl`
C = "<uuid>"
D = "<uuid>"
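One way to read this [glue] table mechanically is sketched below with the TOML standard library (available since Julia 1.6). The interpretation rule (a string value means single-package glue named after that package, a table value means multi-package glue named by its key) is an assumption drawn from the comments above, and glue_triggers and glue_source are hypothetical helpers.

```julia
using TOML  # standard library since Julia 1.6

# The glue part of P's Project.toml from the comment above (UUIDs elided).
const PROJECT = TOML.parse("""
name = "P"
uuid = "<uuid>"

[glue]
A = "<uuid>"
B = "<uuid>"

[glue.CD]
C = "<uuid>"
D = "<uuid>"
""")

# Hypothetical interpretation: a string entry is single-package glue named after
# that package; a table entry is multi-package glue named by its key.
function glue_triggers(project::AbstractDict)
    specs = Dict{String, Vector{String}}()   # glue name => trigger packages
    for (key, val) in get(project, "glue", Dict{String, Any}())
        specs[key] = val isa AbstractDict ? sort(collect(keys(val))) : [key]
    end
    return specs
end

glue_source(name::String) = joinpath("glue", name * ".jl")   # e.g. glue/CD.jl

glue_triggers(PROJECT)   # "A" => ["A"], "B" => ["B"], "CD" => ["C", "D"]
```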

There are a few ways that we can go with the glue/{A,B,CD}.jl files. One way is to treat them like normal packages that have to define a module of the appropriate name. This is a little weird, though, because the file A.jl glues together P and A, so it shouldn't define a module named A; it should define a module named A_P or something like that. Also, the module's name doesn't matter at all: no one ever loads it by name. The only reason it needs to exist is so that we can save it in .ji files. So maybe these should be more implicit, as if this is done for you:

module P_A
    import P, A
    include("glue/A.jl")
end

Then inside of the file glue/A.jl all you have to do is define all the functionality needed to glue P and A together. Similarly for a multi-dependency glue file like CD.jl it would be implicitly loaded like this:

module P_CD
    import P, C, D
    include("glue/CD.jl")
end

Now, the actual loading would work like this: when all of the packages P, C and D have been loaded, however that happens, the glue package P_CD is also loaded. As I mentioned before, since it is a package that depends on P, C and D, it gets its own .ji file, which can be reloaded whenever another Julia process using the same versions of these three modules runs.
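The implicit wrapping can be pictured as the loader building a module expression around the user's file. This sketch only constructs the expression and does not evaluate it (P, C and D do not exist here); glue_wrapper is a hypothetical helper, and the P_CD naming is the illustrative one from the comment, which the proposal deliberately leaves open.

```julia
# Build (but do not evaluate) the wrapper module a loader might generate:
#   module P_CD; import P, C, D; include("glue/CD.jl"); end
function glue_wrapper(parent::Symbol, glue::Symbol, triggers::Vector{Symbol}, file::String)
    modname = Symbol(parent, "_", glue)
    # `import P, C, D` is an :import Expr holding one `.`-path per module
    imports = Expr(:import, (Expr(Symbol("."), m) for m in [parent; triggers])...)
    body = Expr(:block, imports, :(include($file)))
    return Expr(:module, true, modname, body)   # `true` = regular (non-bare) module
end

ex = glue_wrapper(:P, :CD, [:C, :D], "glue/CD.jl")
```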

Aug 28 '19 19:08 StefanKarpinski

They're NOT dependencies—they are packages that glue other packages together and are loaded automatically when the set of packages that they glue together are loaded.

I agree. I used an [on-import] section in my earlier comment to emphasize that it is more like a "hook." So "import hook" may be an alternative term (not that "glue package" is bad).

I think one benefit of formalizing it as hooks is that code loading can respond to other "conditions", such as the feature flags @Roger-luo is suggesting. The Project.toml file could look something like:

[extras]
CuArrays = "..."
Zygote = "..."

[hook.GPUImpl]
import = ["CuArrays", "Zygote"]
feature = ["GPU"]

which indicates that hook/GPUImpl.jl will be (precompiled and) loaded when CuArrays and Zygote are imported and feature flag GPU is specified.

I'm not suggesting adding feature flag support right now. I just thought this format is more extensible.

Aug 29 '19 00:08 tkf

I very intentionally don't want this mechanism to be too flexible. I don't want anything besides loading a glue package to happen when a set of packages is loaded. Of course, that's not very restrictive, since loading a package can execute arbitrary code, but it does mean that there's a resulting module that can be precompiled and saved, and that not being the case is precisely what's so problematic with the current Requires system. Arbitrary import hooks are likely to have all the problems that Requires currently has.

I also very much do not want this to be a mechanism for changing the behavior of packages. The only liberty that a glue package should take is that it can define methods (and types, I guess) that depend on the types of the packages that it glues together. So, it would be considered type piracy for a normal package A that depends on B and C to define B.f(::C.T), but if it's a glue package gluing B and C together, then it's perfectly kosher to do that.

I don't know how we should handle package features like what Roger wants, but it must not be this, or we will completely screw up the ability of this feature to fix the current precompilation issues.

Aug 29 '19 17:08 StefanKarpinski

It was not my intention to suggest introducing any events for the hooks more dynamic than code loading events. If you think the term "hook" suggests features more dynamic than what is already possible with what you are suggesting, it probably is not the best term to use.

But (temporal) dynamism and flexibility are different, and feature flags can be implemented in a very static manner. For example, if MyPackage needs a set of feature flags, Pkg can create (say) ~/.julia/options/$manifest_slag/MyPackage.toml for each environment, where manifest_slag is the hash of the full path of Manifest.toml. This option file can then be tracked as a dependency of the .ji files (using include_dependency) of the glue modules.

Aug 31 '19 01:08 tkf

Actually, let me take back my earlier comment. Feature flags can be turned into consts of MyPackage and then checked inside the glue modules. This approach would waste precompile cache files (i.e., it creates no-op .ji files for glue that is not needed because certain flags are not set), but it's probably better to keep the glue module and feature flag concepts orthogonal.
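The const-based alternative can be sketched as follows. MyPackage, FEATURES, has_feature, gpu_kernel and MyPackage_GPUImpl are all hypothetical names; in a real setup the flag set would presumably be read from some per-environment configuration, whereas here it is hard-coded. When the flag is absent, the glue module still exists (and would still get a cache file) but defines nothing, which is the waste mentioned above.

```julia
# Hypothetical package that exposes its feature flags as a constant.
module MyPackage
const FEATURES = Set(["GPU"])             # hard-coded here for illustration
has_feature(name::String) = name in FEATURES
function gpu_kernel end                   # generic function, no methods yet
end

# Glue module (imagined to load when CuArrays and Zygote are imported):
# it checks the flag and defines methods only if the feature is enabled.
module MyPackage_GPUImpl
import ..MyPackage
if MyPackage.has_feature("GPU")
    MyPackage.gpu_kernel(x::AbstractArray) = "GPU path for $(length(x)) elements"
end
end

MyPackage.gpu_kernel([1, 2, 3])   # method exists because "GPU" is in FEATURES
```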

Aug 31 '19 01:08 tkf

We need to be able to give compat info. How about making the glue packages look a lot like a "mini package" but each glue package is under a glue header:

[[glue]]
[glue.deps]
A = "<uuid>" # source at `glue/A.jl`

# Adding compat and file
[[glue]]
file = "B_glue.jl" # source at `glue/B_glue.jl`
[glue.deps]
B = "<uuid>"
[glue.compat]
B = "0.4"

# Multiple
[[glue]]
file = "my_glue_C_D.jl" # source at `glue/my_glue_C_D.jl`
[glue.deps]
C = "<uuid>"
D = "<uuid>"
[glue.compat]
C = "0.2"
D = "0.1"

It's pretty ugly with all the [glue.] though.

Sep 08 '19 17:09 KristofferC

Just allow anything that appears in a glue stanza in the normal [compat] section?

Sep 10 '19 18:09 StefanKarpinski

To elaborate: I think it would be confusing to use clashing names across glue packages, so it's sane to require them to match, and it doesn't make sense for compat bounds to differ across glue packages. So we can just put glue bounds in [compat], using the names from the [glue] stanzas.

Sep 10 '19 18:09 StefanKarpinski

How would testing "glued packages" look? Could there be something similar for a test/Project.toml and files like test/glue/A.jl etc.?

Sep 27 '19 13:09 lkapelevich

How is this proposal different from https://github.com/JuliaLang/Pkg.jl/issues/1251? Might it be more scalable for each set of glue to be its own full-fledged package? I guess this is attempting to be a lighter-weight alternative? Should we have a naming convention at the package level? The main problem I'm trying to solve is that the main package plus its glue should be in a single repository. Those doing installations may also wish to have explicit control over which glue packages/modules get installed.

Oct 15 '19 13:10 clarkevans

#1251 is only about where to retrieve package sources, otherwise they're completely independent packages. It doesn't seem to make sense to resolve glue packages independently, so making them fully independent registered packages seems like overkill.

Oct 15 '19 15:10 StefanKarpinski

To summarize what I'm currently proposing:

name = "P"
uuid = "<uuid>"

[deps]
# P's dependencies here

# glue with a single dependency
[glue]
A = "<uuid>" # source at `glue/A.jl`
B = "<uuid>" # source at `glue/B.jl`

# glue with multiple dependencies
[glue.CD] # source at `glue/CD.jl`
C = "<uuid>"
D = "<uuid>"

[compat]
# deps compat here, but also glue compat:
A = "1.2"
B = "~2.3"
C = "0.5.3"
D = "2"

This means glue names are in the same project namespace as [deps] and [extras] and can therefore be referenced and constrained via [compat]. The glue code goes into files like this:

glue/A.jl
glue/B.jl
glue/CD.jl

Those files DO NOT have to declare modules or imports; these are provided automatically, so the contents of glue/CD.jl are just:

# module P_CD
# import P, C, D
P.f(c::C.type, d::D.type) = ...
# end

You do not write the module P_CD part or the import P, C, D; those are done implicitly. The actual name of the module may not be P_CD, since it should not be referred to or imported directly.
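Because glue names share the project namespace with [deps] and [extras], a tool can find each glue trigger's bound in the ordinary [compat] table. The helper below is a hypothetical sketch using the TOML standard library (Julia 1.6 and later) on the Project.toml summarized above; it does not interpret the bounds, it only looks them up, marking a missing one with nothing.

```julia
using TOML  # standard library since Julia 1.6

const SUMMARY_PROJECT = TOML.parse("""
[glue]
A = "<uuid>"
B = "<uuid>"

[glue.CD]
C = "<uuid>"
D = "<uuid>"

[compat]
A = "1.2"
B = "~2.3"
C = "0.5.3"
D = "2"
""")

# For every package that triggers some glue, look up its bound in [compat].
# `nothing` marks a trigger that has no compat entry.
function glue_compat(project::AbstractDict)
    compat = get(project, "compat", Dict{String, Any}())
    bounds = Dict{String, Union{String, Nothing}}()
    for (key, val) in get(project, "glue", Dict{String, Any}())
        triggers = val isa AbstractDict ? collect(keys(val)) : [key]
        for pkg in triggers
            bounds[pkg] = get(compat, pkg, nothing)
        end
    end
    return bounds
end

glue_compat(SUMMARY_PROJECT)   # A => "1.2", B => "~2.3", C => "0.5.3", D => "2"
```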

Oct 21 '19 19:10 StefanKarpinski

@StefanKarpinski: just a question about the syntax in your proposal: for multiple dependencies, should we understand

[glue.ABC...]
A = "<uuid>"
B = "<uuid>"
C = "<uuid>"
...

literally, i.e., are the 26 capital letters [A-Z] the valid placeholders? Wouldn't one run out of combinations quite rapidly?

Perhaps

[glue.arbitrary_key] # with source at glue/arbitrary_key.jl
SomePackage = "<uuid>"
SomeOtherPackage = "<uuid>"

for each glue bit would be a bit more general.

I agree that version bounds can go in [compat] together with the non-glue packages, as allowing for glue-specific combinations of bounds should not be necessary.

Nov 16 '19 06:11 tpapp

The letters are stand ins for actual package names. The name CD is an arbitrary key, but would presumably often be a concatenation of the names of the packages involved, though not necessarily.

Nov 27 '19 13:11 StefanKarpinski

@StefanKarpinski @KristofferC Thank you so much for working on conditional dependencies; for usability and for not polluting the namespace, I think it's a great idea.

How would implementation dependencies of glue be handled? Let's say a user would like to use DataKnots to connect to a PostgreSQL data source (our glue code is currently in its own unregistered package, DataKnots4Postgres). This glue code has an additional dependency on the PostgresCatalog package, which depends on LibPQ; however, it's not clear a user should need to know about PostgresCatalog, since it is an implementation detail. To map this onto the proposal, DataKnots fits P, and the glue code would be the equivalent of CD, where C is LibPQ and D is PostgresCatalog. Asking a user to know about LibPQ seems reasonable, given we want them to make a PostgreSQL connection first and then use it to build a DataKnot. However, PostgresCatalog is more of an implementation detail and doesn't currently show up in the documentation. Further, if we wish to change it or add more dependencies later, it would be nice not to have to break the package interface. I'd prefer that glue code could be loaded on a single trigger package, such as LibPQ, while specifying additional implementation-level dependencies, such as PostgresCatalog.

One last question -- would all the glue necessarily have to be in a directory called glue? Is it a subdirectory of src or parallel to src? Can this glue code module include other files in the glue directory? Could the referenced glue, e.g. CD, be a directory rather than a module? A bit more detail about this would be helpful, as some of our glue modules are quite involved.

Nov 29 '19 13:11 clarkevans

That scenario feels a lot heavier weight than what this is intended for, which is providing method definitions that only make sense in the presence of two or more interacting packages. As soon as you get into glue packages having their own extra dependencies, that's starting to feel a lot like a separate package entirely. In your scenario, what are the trigger packages for loading the glue? DataKnots and what else?

Nov 29 '19 13:11 StefanKarpinski

For this use case, besides DataKnots, the trigger package would just be LibPQ. In addition to PostgresCatalog, there is also an implementation-detail dependency on Tables; however, DataKnots also depends on Tables. We wouldn't want DataKnots to depend directly on PostgresCatalog, since that code depends on LibPQ. We pulled out PostgresCatalog since it is non-trivial and others may find it independently useful; otherwise it could have just been a few included files.

Regardless, from the user's perspective, it fits the use case: I want to use DataKnots on a PostgreSQL data source. Similar cases exist for MySQL and dozens of other data sources, such as XML. These could all be separate packages; if they are in a single repository to manage code synchronization, we'd have DataKnots4X, where X is Postgres, XML, and so on. Currently DataKnots4Postgres is only a single file, but it will likely get quite a bit larger as we implement SQL push-down. Even so, it seems this conditional dependency mechanism is relevant: the glue enables DataKnot(PgSQL.Connection("")) (see docs).

Nov 29 '19 14:11 clarkevans

Perhaps I'm being a bit dense today, but I'm still not getting the dependency graph.

Nov 29 '19 15:11 StefanKarpinski
