deno_std icon indicating copy to clipboard operation
deno_std copied to clipboard

feat(archive): An implementation of Tar as streams

Open BlackAsLight opened this issue 10 months ago • 18 comments

Following on from this pull request, https://github.com/denoland/deno_std/pull/4538, I didn't know how to revert the changes on that branch so just made a new one since the idea is that I'm creating a new API and not replacing one.

Usage Examples

import { TarStream } from '@std/archive/tar'

await ReadableStream.from([
  {
    pathname: 'potato/'
  },
  {
    pathname: 'deno.json',
    size: (await Deno.stat('deno.json')).size,
    iterable: (await Deno.open('deno.json')).readable
  },
  {
    pathname: 'deno.lock',
    size: (await Deno.stat('deno.lock')).size,
    iterable: (await Deno.open('deno.lock')).readable
  }
])
  .pipeThrough(new TarStream())
  .pipeThrough(new CompressionStream('gzip'))
  .pipeTo((await Deno.create('./out.tar.gz)).writable)
import { UnTarStream } from '@std/archive/tar'

for await (
  const entry of (await Deno.open('./out.tar.gz'))
    .readable
    .pipeThrough(new DecompressionStream('gzip'))
    .pipeThrough(new UnTarStream())
) {
  console.log(entry.pathname)
  entry
    .readable?
    .pipeTo((await Deno.create(entry.pathname)).writable)
}

BlackAsLight avatar Apr 04 '24 11:04 BlackAsLight

From a quick skim, this PR shows promise, but there are opportunities to simplify and make the implementation more straightforward to understand. @crowlKats, can you see this as a viable alternative to #1985? If so, I'll go ahead and help polish this up.

iuioiua avatar Apr 08 '24 09:04 iuioiua

... when comparing to the non-stream API and doesnt line up with other streams we do in CLI that do file or io handling.

With regards to this comment, I'd have assumed there was an intention to remove all Deno references, making it compatible with other runtimes and therefore not offer file or io handling.

Although that was 2y ago now so things may have changed.

BlackAsLight avatar Apr 09 '24 06:04 BlackAsLight

that has nothing to do with what I have said. I am talking about streams behaviour being consistent across similar purpouses, in this case being byte streams instead of normal streams

crowlKats avatar Apr 09 '24 06:04 crowlKats

@BlackAsLight, we're still determining whether to pursue this PR or #1985. We'll come back to you once a decision has been made.

iuioiua avatar Apr 10 '24 04:04 iuioiua

Codecov Report

Attention: Patch coverage is 63.54916% with 152 lines in your changes are missing coverage. Please review.

Project coverage is 91.13%. Comparing base (8c87b7f) to head (dacaea5).

Files Patch % Lines
archive/tar_stream.ts 54.59% 83 Missing and 1 partial :warning:
archive/untar_stream.ts 70.68% 68 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4548      +/-   ##
==========================================
- Coverage   91.44%   91.13%   -0.31%     
==========================================
  Files         480      482       +2     
  Lines       37324    37741     +417     
  Branches     5320     5391      +71     
==========================================
+ Hits        34132    34397     +265     
- Misses       3136     3287     +151     
- Partials       56       57       +1     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Apr 12 '24 08:04 codecov[bot]

Looking into what bytes streams are and how they work, I do think I could implement them into both streams. At least for when they're serving content out. I don't think I'd run into the same problem you're facing in implementing byte streams with it closing early or something. Although I am yet to actually test it.

On a side note the byte streams examples on MDN make use of both the byteRequest and enqueue for pushing, but my implementation only works on a pulling method so it would only ever be doing one or the other based off what the user provided in the options.

BlackAsLight avatar Apr 19 '24 07:04 BlackAsLight

I don't really understand why, but for some reason when I added byte support for TarStream it consumed the text variable from the tests instead of copying them. Even when { mode: 'byob' } wasn't provided.

Adding byte support for the UnTarStream does seem doable from my understanding of it, but will be more complex.

Even though you haven't decided if you want to pursue my implementation or not. I'm still going to try and make the improvements in the mean time as I have the time.

BlackAsLight avatar Apr 20 '24 13:04 BlackAsLight

After some playing around with this code, it seems to be occasionally and inconsistently creating bad tarballs with the same information provided. The UnTarStream class can correctly decode the bad tarball, but the tar implementation on my computer seems to sometimes fail to expand it or extract rubbish from it.

BlackAsLight avatar Apr 22 '24 08:04 BlackAsLight

~~Anyone know why the Ubuntu environment produces a different result? I suspect its from one of these positions when a byobRequest is made but there is no more information to send.~~

Seems it was a problem with Github Action's Ubuntu environment.

BlackAsLight avatar Apr 23 '24 06:04 BlackAsLight

Looking at the original issue, https://github.com/denoland/deno_std/issues/1658, they suggested the names TarEncoderStream and TarDecoderStream, would these be better names than TarStream and UnTarStream?

BlackAsLight avatar May 12 '24 02:05 BlackAsLight

I think TarStream and UntarStream (not UnTarStream) are best.

iuioiua avatar May 13 '24 06:05 iuioiua

Codecov Report

Attention: Patch coverage is 91.88482% with 31 lines in your changes missing coverage. Please review.

Project coverage is 96.30%. Comparing base (aa757d8) to head (456ee81). Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
archive/untar_stream.ts 86.84% 19 Missing and 1 partial :warning:
archive/tar_stream.ts 95.21% 11 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4548      +/-   ##
==========================================
- Coverage   96.34%   96.30%   -0.05%     
==========================================
  Files         479      483       +4     
  Lines       38669    39085     +416     
  Branches     5630     5720      +90     
==========================================
+ Hits        37255    37640     +385     
- Misses       1370     1400      +30     
- Partials       44       45       +1     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar May 24 '24 05:05 codecov-commenter

Since it has taken quite a while I did end up doing a rewrite and uploading it myself to JSR at @doctor/tar-stream, so if it's fine I'll copy those changes over here as I think that version is more readable and neater. That version also has a slight difference to this version by automatically enabling the size extension for file sizes 8 - 64 GiBs

BlackAsLight avatar Aug 05 '24 08:08 BlackAsLight

SGTM 👍🏾

iuioiua avatar Aug 05 '24 10:08 iuioiua

The ustar format is defined by POSIX here.

I've been using this spec for the above implementation. Idk how much, if at all, it varies from that one.

BlackAsLight avatar Aug 06 '24 23:08 BlackAsLight

I don't think there are any more tests I could make to cover additional lines of code

BlackAsLight avatar Aug 07 '24 06:08 BlackAsLight

  1. I haven't yet checked, but does this implementation copy tests from the existing implementation (as streams) and #1985.

I did have a look at #1985, but couldn't see any tests with logic that wasn't being tested via existing tests.

  1. If there's logic which can't be tested, it's usually an indication that said logic might not be necessary. Do we definitely need said logic?

The only lines which Sentry says isn't covered isn't accessible via outside the TarStream and UnTarStream. But that code, for example, knowing whether to use 11 or 12 bytes of space for the size, is necessary.

BlackAsLight avatar Aug 07 '24 10:08 BlackAsLight

Can you please rebase? The doc linter now checks @std/archive since #4462. So we'll have to fix a bunch of documentation holes. In other words, deno task lint:docs will have to pass. Either way, it seems we're getting close to landing this.

iuioiua avatar Aug 20 '24 23:08 iuioiua

@BlackAsLight By the way, thank you for your efforts in this PR: I've been watching the status of conversations around streaming Tar in std for a long time and have been looking forward to the addition of an implementation.

jsejcksn avatar Aug 25 '24 16:08 jsejcksn