Tar.jl icon indicating copy to clipboard operation
Tar.jl copied to clipboard

Add copy_symlinks keyword to Tar.tree_hash

Open mkitti opened this issue 1 year ago • 9 comments

Implement copy_symlinks keyword for Tar.tree_hash

For https://github.com/JuliaLang/Pkg.jl/issues/3643 https://github.com/JuliaBinaryWrappers/P4est_jll.jl/releases/download/P4est-v2.8.1+2/P4est.v2.8.1.x86_64-w64-mingw32-mpi+microsoftmpi.tar.gz

julia> open(GzipDecompressorStream, "P4est.v2.8.1.x86_64-w64-mingw32-mpi+microsoftmpi.tar.gz") do io
           Tar.tree_hash(io, copy_symlinks=false)
       end
"89a337ea6f60a4fd58999ab73dea099e41032138"

julia> open(GzipDecompressorStream, "P4est.v2.8.1.x86_64-w64-mingw32-mpi+microsoftmpi.tar.gz") do io
           Tar.tree_hash(io, copy_symlinks=true)
       end
"ed75b82e0dd9b53c4ac4e70376f3e6f330c72767"

For https://github.com/JuliaPackaging/Yggdrasil/issues/7888 https://github.com/JuliaBinaryWrappers/iso_codes_jll.jl/releases/download/iso_codes-v4.11.0+0/iso_codes.v4.11.0.any.tar.gz

julia> open(GzipDecompressorStream, "iso_codes.v4.11.0.any.tar.gz") do io
           Tar.tree_hash(io, copy_symlinks=false)
       end
"71f68a3d55d73f2e15a3969c241fae2349b1feb5"

julia> open(GzipDecompressorStream, "iso_codes.v4.11.0.any.tar.gz") do io
           Tar.tree_hash(io, copy_symlinks=true)
       end
"409d6ac4c02dae43ff4fe576b5c5820d0386fb3f"

mkitti avatar Jan 08 '24 10:01 mkitti

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (6269b5b) 97.28% compared to head (0bcb514) 97.64%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #167      +/-   ##
==========================================
+ Coverage   97.28%   97.64%   +0.36%     
==========================================
  Files           4        4              
  Lines         810      851      +41     
==========================================
+ Hits          788      831      +43     
+ Misses         22       20       -2     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jan 08 '24 10:01 codecov[bot]

Can the julia compat and CI tests be bumped to 1.6?

nhz2 avatar Jan 13 '24 18:01 nhz2

Adding the following test results in a stack overflow.

    tarball, io = mktemp()
    tar_write_link(io, "a", "b/q")
    tar_write_link(io, "b", "a/q")
    close(io)
    Tar.tree_hash(tarball; copy_symlinks=true)
StackOverflowError:
  Stacktrace:
    [1] exec(re::Ptr{Nothing}, subject::String, offset::Int64, options::UInt32, match_data::Ptr{Nothing})
      @ Base.PCRE ./pcre.jl:203
    [2] exec_r_data
      @ Base.PCRE ./pcre.jl:220 [inlined]
    [3] _findnext_re(re::Regex, str::String, idx::Int64, match_data::Ptr{Nothing})
      @ Base ./regex.jl:442
    [4] findnext
      @ Base ./regex.jl:431 [inlined]
    [5] iterate
      @ Base ./strings/util.jl:556 [inlined]
    [6] iterate
      @ Base ./strings/util.jl:555 [inlined]
    [7] _collect(cont::UnitRange{Int64}, itr::Base.SplitIterator{String, Regex}, ::Base.HasEltype, isz::Base.SizeUnknown)
      @ Base ./array.jl:770
    [8] collect
      @ ./array.jl:759 [inlined]
    [9] #split#487
      @ ./strings/util.jl:628 [inlined]
   [10] split
      @ ./strings/util.jl:626 [inlined]
   [11] link_target(paths::Dict{String, Any}, path::String, link::String)
      @ Tar ~/github/Tar.jl/src/extract.jl:175
   [12] link_target(paths::Dict{String, Any}, path::String, link::String) (repeats 13767 times)
      @ Tar ~/github/Tar.jl/src/extract.jl:193
   [13] git_tree_hash(predicate::Function, tar::IOStream, ::Type{SHA.SHA1_CTX}, skip_empty::Bool, copy_symlinks::Bool; buf::Vector{UInt8})
      @ Tar ~/github/Tar.jl/src/extract.jl:264
   [14] git_tree_hash
      @ ~/github/Tar.jl/src/extract.jl:208 [inlined]
   [15] #98
      @ ~/github/Tar.jl/src/Tar.jl:408 [inlined]
   [16] open(f::Tar.var"#98#100"{Bool, Bool, Tar.var"#1#2"}, args::String; kwargs::@Kwargs{lock::Bool})
      @ Base ./io.jl:396
   [17] open_nolock
      @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/ArgTools/src/ArgTools.jl:35 [inlined]
   [18] arg_read
      @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/ArgTools/src/ArgTools.jl:74 [inlined]
   [19] tree_hash(predicate::Function, tarball::String; algorithm::String, skip_empty::Bool, copy_symlinks::Bool)
      @ Tar ~/github/Tar.jl/src/Tar.jl:407
   [20] tree_hash
      @ Tar ~/github/Tar.jl/src/Tar.jl:398 [inlined]
   [21] #tree_hash#102
      @ Tar ~/github/Tar.jl/src/Tar.jl:425 [inlined]
   [22] macro expansion
      @ ~/github/Tar.jl/test/runtests.jl:12 [inlined]
   [23] macro expansion
      @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
   [24] top-level scope
      @ ~/github/Tar.jl/test/runtests.jl:4
   [25] include(fname::String)
      @ Base.MainInclude ./client.jl:489
   [26] top-level scope
      @ none:6
   [27] eval
      @ Core ./boot.jl:385 [inlined]
   [28] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:291
   [29] _start()
      @ Base ./client.jl:552
Test Summary:   | Error  Total  Time
copy symlinks 2 |     1      1  1.6s
ERROR: LoadError: Some tests did not pass: 0 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /home/nathan/github/Tar.jl/test/runtests.jl:3
ERROR: Package Tar errored during testing

nhz2 avatar Jan 13 '24 19:01 nhz2

The following test results in a stack overflow from a different function:

    tarball, io = mktemp()
    tar_write_dir(io,  "foo/bar")
    tar_write_link(io, "foo/bar/b", "../a")
    tar_write_link(io, "foo/a", "bar/")
    close(io)
    Tar.tree_hash(tarball; copy_symlinks=true)
StackOverflowError:
  Stacktrace:
       [1] (::Tar.var"#34#44"{Dict{String, Any}, Tar.var"#by#43"})(io::IOBuffer)
         @ Tar ~/github/Tar.jl/src/extract.jl:329
       [2] sprint(::Function; context::Nothing, sizehint::Int64)
         @ Base ./strings/io.jl:114
       [3] sprint
         @ ./strings/io.jl:107 [inlined]
       [4] git_object_hash(emit::Function, kind::String, ::Type{SHA.SHA1_CTX})
         @ Tar ~/github/Tar.jl/src/extract.jl:347
       [5] (::Tar.var"#hash_tree#42"{SHA.SHA1_CTX})(node::Dict{String, Any})
         @ Tar ~/github/Tar.jl/src/extract.jl:328
       [6] (::Tar.var"#34#44"{Dict{String, Any}, Tar.var"#by#43"})(io::IOBuffer)
         @ Tar ~/github/Tar.jl/src/extract.jl:330
  --- the last 5 lines are repeated 6881 more times ---
   [34412] sprint(::Function; context::Nothing, sizehint::Int64)
         @ Base ./strings/io.jl:114
   [34413] sprint
         @ ./strings/io.jl:107 [inlined]
   [34414] git_object_hash(emit::Function, kind::String, ::Type{SHA.SHA1_CTX})
         @ Tar ~/github/Tar.jl/src/extract.jl:347
   [34415] (::Tar.var"#hash_tree#42"{SHA.SHA1_CTX})(node::Dict{String, Any})
         @ Tar ~/github/Tar.jl/src/extract.jl:328
   [34416] git_tree_hash(predicate::Function, tar::IOStream, ::Type{SHA.SHA1_CTX}, skip_empty::Bool, copy_symlinks::Bool; buf::Vector{UInt8})
         @ Tar ~/github/Tar.jl/src/extract.jl:338
   [34417] git_tree_hash
         @ ~/github/Tar.jl/src/extract.jl:209 [inlined]
   [34418] #98
         @ ~/github/Tar.jl/src/Tar.jl:408 [inlined]
   [34419] open(f::Tar.var"#98#100"{Bool, Bool, Tar.var"#1#2"}, args::String; kwargs::@Kwargs{lock::Bool})
         @ Base ./io.jl:396
   [34420] open_nolock
         @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/ArgTools/src/ArgTools.jl:35 [inlined]
   [34421] arg_read
         @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/ArgTools/src/ArgTools.jl:74 [inlined]
   [34422] tree_hash(predicate::Function, tarball::String; algorithm::String, skip_empty::Bool, copy_symlinks::Bool)
         @ Tar ~/github/Tar.jl/src/Tar.jl:407
   [34423] tree_hash
         @ Tar ~/github/Tar.jl/src/Tar.jl:398 [inlined]
   [34424] #tree_hash#102
         @ Tar ~/github/Tar.jl/src/Tar.jl:425 [inlined]
   [34425] macro expansion
         @ ~/github/Tar.jl/test/runtests.jl:13 [inlined]
   [34426] macro expansion
         @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
   [34427] top-level scope
         @ ~/github/Tar.jl/test/runtests.jl:6
   [34428] include(fname::String)
         @ Base.MainInclude ./client.jl:489
   [34429] top-level scope
         @ none:6
   [34430] eval
         @ Core ./boot.jl:385 [inlined]
   [34431] exec_options(opts::Base.JLOptions)
         @ Base ./client.jl:291
Test Summary:   | Error  Total   Time
copy symlinks 2 |     1      1  59.1s
ERROR: LoadError: Some tests did not pass: 0 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /home/nathan/github/Tar.jl/test/runtests.jl:3
ERROR: Package Tar errored during testing

<\details>

nhz2 avatar Jan 13 '24 20:01 nhz2

What does extract do in this circumstance?

mkitti avatar Jan 13 '24 23:01 mkitti

extract has a stack overflow in the first case and goes into an infinite loop in the second case.

nhz2 avatar Jan 14 '24 17:01 nhz2

We should determine the correct behavior for extract first. My sense is that it should produce a more intuitive and descriptive error.

mkitti avatar Jan 16 '24 15:01 mkitti

That makes sense, though sometimes extract will ignore symlinks that it cannot copy. This might be the same issue as https://github.com/JuliaIO/Tar.jl/issues/77 . Maybe these issues could be resolved while doing the refactoring described in https://github.com/JuliaIO/Tar.jl/issues/64

nhz2 avatar Jan 16 '24 16:01 nhz2

Yeah, this is a known issue for extract: https://github.com/JuliaIO/Tar.jl/issues/77. I just never got around to fixing it because it seems deeply unlikely in real-world tarballs, but of course, we should handle it correctly. But it's very annoying to fix.

StefanKarpinski avatar Jan 16 '24 20:01 StefanKarpinski