RFC: Add `RelocPath` type for userdefined relocatable paths
RelocPath(path) works by splitting path as depot/subpath based on DEPOT_PATH during construction.
When using RelocPath one explicitly opts into relocation, so it throws when path cannot be split.
A relocated path can be queried with String(r::RelocPath), which also throws on failure.
Fixes #54430 Alternative to #55146
What's the idiomatic usage pattern for this look like?
The example where @__RELOCDIR__ fails https://github.com/JuliaLang/julia/pull/55146#issuecomment-2302409741 would become now:
module MyPkg
const data = RelocPath(joinpath(@__DIR__, "myfile.data"))
function read_data()
path = String(data)
# load from path ...
end
end
Is this how we get a Path type finally. I'd love that
~~It would be nice to get this into Julia 1.12.~~
EDIT: Given the timeline, I don't think this can make it into 1.12. Maybe we can merge it into master early in 1.13, so that it gets some time to be tested out.
Should we have a way of constructing a RelocPath by just providing the subpath?
Is this how we get a Path type finally. I'd love that
FWIW there is already https://juliahub.com/ui/Packages/General/FilePathsBase and https://juliahub.com/ui/Packages/General/FilePaths which provide such types.
Should we have a way of constructing a RelocPath by just providing the subpath?
I am not sure.
@vchuravy @gbaraldi and @topolarity Would you be able to review this PR?
As far as I can tell, this PR is an improvement compared to #55146, but I'm not particularly familiar with the relevant part of the codebase.
@staticfloat IIRC, you also worked on the depot-relocatibility code; it would be great if you could review this PR.
Is this how we get a Path type finally. I'd love that
The RelocPath constructor will error if the path isn't inside a depot, right? So IIUC this doesn't really cover the general "path type" request.
Should we just call this DepotPath? I think that might be a clearer name for users
Should we just call this DepotPath? I think that might be a clearer name for users
No strong feelings from my side.
I also don't feel strongly between RelocPath and DepotPath.
We could also do RelocatablePath if we wanted something a little more unambiguous.
xref: https://discourse.julialang.org/t/designing-a-paths-julep/124335
Specifically the p"@/... syntax that is intended to create a relocatable project-relative path (currently only in the design doc, not yet implemented).
Specifically the
p"@/...syntax that is intended to create a relocatable project-relative path (currently only in the design doc, not yet implemented).
Hmmm, that sounds slightly different that the RelocPath in this PR, right? The RelocPath in this PR is depot-relative (not project-relative).
@fatteneder Can you rebase and resolve the merge conflicts?
Hmmm, that sounds slightly different that the RelocPath in this PR, right? The RelocPath in this PR is depot-relative (not project-relative).
Functionally, there is a difference in the approaches taken here/proposed, but as far as I can tell there's very much overlap in the intended usage. Both fix the linked issue, and in this respect are alternatives.
Hmmm, that sounds slightly different that the RelocPath in this PR, right? The RelocPath in this PR is depot-relative (not project-relative).
Functionally, there is a difference in the approaches taken here/proposed, but as far as I can tell there's very much overlap in the intended usage. Both fix the linked issue, and in this respect are alternatives.
I'm not sure I follow. I think I am probably misunderstanding what you mean by "project-relative path".
I'll try to highlight my confusion. Here's an example use case. Suppose you have two packages, PkgA and PkgB.
In the Git repo for PkgA, there is a file named PkgA/data/stuff.data. This file is some data that I ship in the PkgA repo.
In the Git repo for PkgA, this is PkgA/src/PkgA.jl:
module PkgA
const data = RelocPath(joinpath(dirname(@__DIR__), "data", "stuff.data")) # PkgA/data/myfile.data
function f()
path = String(data)
# load from path ...
end
end # module
This is based on the example posted in https://github.com/JuliaLang/julia/pull/55146#issuecomment-2302409741 and https://github.com/JuliaLang/julia/pull/56053#issuecomment-2402576269.
In the Git repo for PkgB, this is PkgB/Project.toml:
name = "PkgB"
[deps]
PkgA = "..."
In the Git repo for PkgB, this is PkgB/src/PkgB.jl:
module PkgB
import PkgA
function g()
# ...
x = PkgA.f()
* ...
end
end # module
Suppose that I am running on machine_1.
On machine_1, my Julia depot is /home1/user1/foo/bar/mydepot.
On machine_1, I create a new Julia project at /home1/user1/hello/world/myproj/Project.toml:
# myproj/Project.toml
[deps]
PkgB = "..."
Now I build myproj into a custom sysimage. I want to give this custom sysimage to a different user, who will run it on their machine_2. So I put the sysimage in a tarball; I also put my Julia depot into that same tarball, because the user will need it.
So the user extracts the tarball on their machine_2, which obviously won't have the same paths as machine_1. Suppose they end up with the depot extracted to /home2/user2/blah/blah/blah/mydepot, and the project extracted to /home2/user2/various/things/here/myproj/Project.toml.
The user makes sure that /home2/user2/blah/blah/blah/mydepot is in their JULIA_DEPOT_PATH, and then they start Julia with the custom sysimage. And now they want to run the PkgB.g() function, which means that PkgA.f() will end up getting called. So PkgA needs to be able to find the PkgA/data/stuff.data file.
If we hadn't used RelocPath(), the old absolute path of this file on machine_1 (/home1/user1/foo/bar/mydepot/packages/PkgA/slug/data/stuff.data) would have been baked into the custom sysimage.
But we used RelocPath(). So, when PkgA.f() is run, the path = String(data::RelocParth) line will look through the depot path, and it will find that the desired file exists on machine_2 at location /home2/user2/blah/blah/blah/mydepot/packages/PkgA/slug/data/stuff.data.
So, if we use RelocPath() as implemented in this PR (depot-relative), then we find our file, and everything works.
Would the proposed p"@/... syntax have the same behavior, i.e. would it also find the file at the new location (/home2/user2/blah/blah/blah/mydepot/packages/PkgA/slug/data/stuff.data)?
My interpretation of the phrase "project-relative path" was that p"@... would be looking relative to the active project (/home1/user1/hello/world/myproj/Project.toml on the old machine, /home2/user2/various/things/here/myproj/Project.toml on the new machine). In this example, the directory containing the active project has no relationship to the stuff.data file on either the old machine or the new machine, so looking relative to the active project wouldn't yield any fruit. So that's the source of my confusion.
@fatteneder In the current implementation in this PR, in the String(::RelocPath) method, as soon as we find a path where the subpath matches, we accept that path and return.
It occurs to me that, for correctness purposes, we should have more stringent requirements. Instead of only requiring that the subpath matches, I propose that we first check if the subpath matches, and then if the subpath matches, we also require that the checksum matches. If the checksum doesn't match, we reject the path.
This means that at RelocPath() construction time, we have to compute the checksum, and thus the RelocPath struct needs a field for storing the checksum:
struct RelocPath
subpath::String
checksum::Union{String, Nothing}
end
When constructing a RelocPath, if the input path is a file, then we just checksum the file. If the input path is a directory, we have to compute the hash of the directory - probably something like this:
artifact_hash = SHA1(GitTools.tree_hash(the_directory))
Which is taken from this section of Pkg.Artifacts.
If the input path is neither a file nor a directory (e.g. it's a symlink), then tbh idk - throw an error maybe? I don't know if that's a real use case that we need to support, at least not in the initial implementation.
I think we should allow opting out of this behavior at two different locations:
- At
RelocPathconstruction time, the user can opt-out by doingRelocPath("..."; record_checksum = false). In this case, the newly constructedRelocPathobject will have.checksum = nothingstored. For such aRelocPath, wheneverString(::RelocPath)is called, we don't compute or verify any checksums, we just accept (and return) the path as soon as we find a path where thesubpathmatches - in other words, the current behavior of this PR.- The default value of the kwarg would be
record_checksum = true.
- The default value of the kwarg would be
- At the
String(::RelocPath)callsite. In this case, the user can opt-out by doingString(reloc_path::RelocPath; verify_checksum = false). And ifverify_checksum = false, then we don't compute or verify any checksums.- The default value of the kwarg would be
verify_checksum = true.
- The default value of the kwarg would be
What do you think? I feel like this would increase our confidence in the correctness of the file or directory that we return.
I think this makes senses.
For include(), include_dependency() we already use the same logic for checksum computation, but with Base only functions: hash = isdir(path) ? _crc32c(join(readdir(path))) : open(_crc32c, path, "r"). I think this also works with symlinks, as they are first opened, although we do not have a test for that.
I would go for RelocPath(; track_content), like include_dependency, and then maybe String(; verify_content).
We might also want to promote the object that RelocPath(; track_content=true) is referring to to an include_dependency, so that when PkgA/data/stuff.data changes we recompile PkgA to update the cached hash.
Would the proposed p"@/... syntax have the same behavior, i.e. would it also find the file at the new location (/home2/user2/blah/blah/blah/mydepot/packages/PkgA/slug/data/stuff.data)?
The current idea is that it is relative to the parent project of the code file it occurs in. E.g. in PkgA/src/file.jl instead of doing
RelocPath(joinpath(dirname(@__DIR__), "data", "stuff.data"))
you do
p"@/data/stuf.data"
And the current intent (the Paths proposal is still being written), is that these would behave the same way.
We might also want to promote the object that
RelocPath(; track_content=true)is referring to to aninclude_dependency, so that whenPkgA/data/stuff.datachanges we recompilePkgAto update the cached hash.
I'm not sure that recompilation is what we want here, particularly in the sysimage case. I think we should just throw an error if there is a hash mismatch.
I now implemented a track_content::Bool keyword.
I agreed with Timothy that a future path type could easily wrap around this RelocPath by giving it a suitable supertype and hide it behind a string macro.
What remains to be done:
- Decide on a name, we these suggestions:
RelocPath, DepotPath, RelocatablePath - Jameson asked for using
OncePerProcessabove, can't say much about it, someone else needs to judge if its worth.