deno_std
feat(archive): An implementation of Tar as streams
Following on from https://github.com/denoland/deno_std/pull/4538: I didn't know how to revert the changes on that branch, so I just made a new one. The idea is that I'm creating a new API, not replacing an existing one.
Usage Examples
```ts
import { TarStream } from '@std/archive/tar'

await ReadableStream.from([
  {
    pathname: 'potato/'
  },
  {
    pathname: 'deno.json',
    size: (await Deno.stat('deno.json')).size,
    iterable: (await Deno.open('deno.json')).readable
  },
  {
    pathname: 'deno.lock',
    size: (await Deno.stat('deno.lock')).size,
    iterable: (await Deno.open('deno.lock')).readable
  }
])
  .pipeThrough(new TarStream())
  .pipeThrough(new CompressionStream('gzip'))
  .pipeTo((await Deno.create('./out.tar.gz')).writable)
```
```ts
import { UnTarStream } from '@std/archive/tar'

for await (
  const entry of (await Deno.open('./out.tar.gz'))
    .readable
    .pipeThrough(new DecompressionStream('gzip'))
    .pipeThrough(new UnTarStream())
) {
  console.log(entry.pathname)
  entry.readable
    ?.pipeTo((await Deno.create(entry.pathname)).writable)
}
```
From a quick skim, this PR shows promise, but there are opportunities to simplify and make the implementation more straightforward to understand. @crowlKats, can you see this as a viable alternative to #1985? If so, I'll go ahead and help polish this up.
> ... when comparing to the non-stream API and doesn't line up with other streams we do in CLI that do file or io handling.
With regard to this comment, I'd have assumed there was an intention to remove all Deno references, making it compatible with other runtimes and therefore not offering file or I/O handling.
Although that was two years ago now, so things may have changed.
That has nothing to do with what I said. I am talking about stream behaviour being consistent across similar purposes, in this case using byte streams instead of normal streams.
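For context, the distinction being drawn: a "byte stream" is a `ReadableStream` constructed with `type: 'bytes'`, which supports BYOB ("bring your own buffer") readers. A minimal sketch, not taken from either PR:

```ts
// A byte stream (type: 'bytes') lets a consumer read directly into a
// buffer it supplies, via a BYOB reader.
const stream = new ReadableStream({
  type: 'bytes',
  start(controller) {
    controller.enqueue(new Uint8Array([1, 2, 3]))
    controller.close()
  },
})

const reader = stream.getReader({ mode: 'byob' })
const { value } = await reader.read(new Uint8Array(3))
console.log(value) // Uint8Array(3) [ 1, 2, 3 ]
```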
@BlackAsLight, we're still determining whether to pursue this PR or #1985. We'll come back to you once a decision has been made.
Codecov Report
Attention: Patch coverage is 63.54916% with 152 lines in your changes missing coverage. Please review.
Project coverage is 91.13%. Comparing base (8c87b7f) to head (dacaea5).
| Files | Patch % | Lines |
|---|---|---|
| archive/tar_stream.ts | 54.59% | 83 Missing and 1 partial :warning: |
| archive/untar_stream.ts | 70.68% | 68 Missing :warning: |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #4548      +/-   ##
==========================================
- Coverage   91.44%   91.13%   -0.31%
==========================================
  Files         480      482       +2
  Lines       37324    37741     +417
  Branches     5320     5391      +71
==========================================
+ Hits        34132    34397     +265
- Misses       3136     3287     +151
- Partials       56       57       +1
```
Looking into what byte streams are and how they work, I do think I could implement them in both streams, at least for when they're serving content out. I don't think I'd run into the same problem you're facing in implementing byte streams, with it closing early or something, although I have yet to actually test it.
On a side note, the byte stream examples on MDN make use of both the `byobRequest` and `enqueue` for pushing, but my implementation only works on a pulling method, so it would only ever be doing one or the other based on what the user provided in the options; a sketch of that pattern follows.
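A minimal sketch of that pull-only branching, with a hypothetical source that has its next chunk at hand (not this PR's actual code):

```ts
const source = new ReadableStream({
  type: 'bytes',
  pull(controller) {
    const chunk = new Uint8Array(1024) // next chunk from wherever
    const req = controller.byobRequest
    if (req?.view) {
      // A BYOB reader supplied a buffer: copy into it and respond.
      const n = Math.min(chunk.length, req.view.byteLength)
      new Uint8Array(req.view.buffer, req.view.byteOffset, n)
        .set(chunk.subarray(0, n))
      req.respond(n)
    } else {
      // Default reader: enqueue the chunk instead.
      controller.enqueue(chunk)
    }
  },
})
```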
I don't really understand why, but for some reason when I added byte support for TarStream, it consumed the `text` variable from the tests instead of copying it, even when `{ mode: 'byob' }` wasn't provided.
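A plausible explanation, going by the Streams spec rather than this PR's code: a byte stream controller transfers the underlying `ArrayBuffer` of any chunk passed to `enqueue()`, detaching it from the caller's view even when no BYOB reader is in play. A minimal sketch:

```ts
const data = new TextEncoder().encode('hello')

const stream = new ReadableStream({
  type: 'bytes',
  start(controller) {
    // On a byte stream, enqueue() transfers the chunk's underlying
    // ArrayBuffer, detaching it from `data` even without a BYOB reader.
    // Enqueuing `data.slice()` instead keeps the original intact.
    controller.enqueue(data)
    controller.close()
  },
})

await stream.pipeTo(new WritableStream())
console.log(data.byteLength) // 0: the buffer was detached
```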
Adding byte support for the UnTarStream does seem doable from my understanding of it, but will be more complex.
Even though you haven't decided whether you want to pursue my implementation or not, I'm still going to try to make the improvements in the meantime, as I have the time.
After some playing around with this code, it seems to occasionally and inconsistently create bad tarballs from the same input. The UnTarStream class can correctly decode the bad tarball, but the tar implementation on my computer sometimes fails to expand it or extracts rubbish from it.
~~Anyone know why the Ubuntu environment produces a different result? I suspect it's from one of these positions when a byobRequest is made but there is no more information to send.~~
Seems it was a problem with GitHub Actions' Ubuntu environment.
Looking at the original issue, https://github.com/denoland/deno_std/issues/1658, they suggested the names `TarEncoderStream` and `TarDecoderStream`. Would these be better names than `TarStream` and `UnTarStream`?
I think `TarStream` and `UntarStream` (not `UnTarStream`) are best.
Codecov Report
Attention: Patch coverage is 91.88482% with 31 lines in your changes missing coverage. Please review.
Project coverage is 96.30%. Comparing base (aa757d8) to head (456ee81). Report is 1 commit behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| archive/untar_stream.ts | 86.84% | 19 Missing and 1 partial :warning: |
| archive/tar_stream.ts | 95.21% | 11 Missing :warning: |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #4548      +/-   ##
==========================================
- Coverage   96.34%   96.30%   -0.05%
==========================================
  Files         479      483       +4
  Lines       38669    39085     +416
  Branches     5630     5720      +90
==========================================
+ Hits        37255    37640     +385
- Misses       1370     1400      +30
- Partials       44       45       +1
```
Since it has taken quite a while, I did end up doing a rewrite and uploading it myself to JSR as @doctor/tar-stream, so if that's fine, I'll copy those changes over here, as I think that version is more readable and neater. That version also differs slightly from this one by automatically enabling the size extension for file sizes of 8 to 64 GiB.
SGTM 👍🏾
The ustar format is defined by POSIX here.
I've been using this spec for the above implementation. I don't know how much, if at all, it varies from that one.
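For readers without the spec open, this is the ustar header layout POSIX defines, summarized as a constant; it is background from the spec, not code from this PR:

```ts
// ustar header layout per POSIX.1-1988 (offsets and sizes in bytes).
// The header occupies one 512-byte block; bytes 500-511 are padding.
const USTAR_FIELDS = [
  { field: 'name',     offset: 0,   size: 100 },
  { field: 'mode',     offset: 100, size: 8 },
  { field: 'uid',      offset: 108, size: 8 },
  { field: 'gid',      offset: 116, size: 8 },
  { field: 'size',     offset: 124, size: 12 },
  { field: 'mtime',    offset: 136, size: 12 },
  { field: 'checksum', offset: 148, size: 8 },
  { field: 'typeflag', offset: 156, size: 1 },
  { field: 'linkname', offset: 157, size: 100 },
  { field: 'magic',    offset: 257, size: 6 },  // 'ustar\0'
  { field: 'version',  offset: 263, size: 2 },  // '00'
  { field: 'uname',    offset: 265, size: 32 },
  { field: 'gname',    offset: 297, size: 32 },
  { field: 'devmajor', offset: 329, size: 8 },
  { field: 'devminor', offset: 337, size: 8 },
  { field: 'prefix',   offset: 345, size: 155 },
] as const
```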
I don't think there are any more tests I could make to cover additional lines of code.
- I haven't yet checked, but does this implementation copy tests from the existing implementation (as streams) and #1985?
I did have a look at #1985, but couldn't see any tests with logic that wasn't being tested via existing tests.
- If there's logic which can't be tested, it's usually an indication that said logic might not be necessary. Do we definitely need said logic?
The only lines which Sentry says aren't covered aren't accessible from outside TarStream and UnTarStream. But that code, for example, knowing whether to use 11 or 12 bytes of space for the size, is necessary; see the sketch below.
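To illustrate that 11-versus-12-byte decision, here is a sketch under the assumption that the extension in question is the common one of dropping the NUL terminator to gain a twelfth octal digit; `encodeSize` is a hypothetical helper, not the PR's actual code:

```ts
// Sketch: encoding the 12-byte ustar size field.
// 11 octal digits plus a NUL terminator covers sizes up to 8 GiB - 1;
// using all 12 bytes as octal digits extends the limit to 64 GiB - 1.
function encodeSize(size: number): Uint8Array {
  const octal = size.toString(8)
  if (octal.length <= 11) {
    // Standard form: zero-padded to 11 digits, followed by a NUL byte.
    return new TextEncoder().encode(octal.padStart(11, '0') + '\0')
  }
  if (octal.length <= 12) {
    // Extension: the terminator byte becomes a twelfth octal digit.
    return new TextEncoder().encode(octal.padStart(12, '0'))
  }
  throw new RangeError('size exceeds 64 GiB - 1')
}
```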
Can you please rebase? The doc linter now checks `@std/archive` since #4462, so we'll have to fix a bunch of documentation holes. In other words, `deno task lint:docs` will have to pass. Either way, it seems we're getting close to landing this.
@BlackAsLight By the way, thank you for your efforts in this PR: I've been watching the status of conversations around streaming Tar in std for a long time and have been looking forward to the addition of an implementation.