
[feature request] Streaming wrapper

Open shaicoleman opened this issue 2 years ago • 4 comments

I'm trying to nest IO streams so that I can upload and download large files to and from S3 with streaming compression.

With zlib, I could use Zlib::GzipWriter.wrap / Zlib::GzipReader.wrap, but there doesn't seem to be an equivalent for zstd-ruby.

Example zlib code, which I would like to migrate to zstd-ruby:

require 'aws-sdk-s3'
require 'zlib'

s3_bucket = 'my_s3_bucket'
s3_key = 'my_s3_key'
filename = '/my/path'

s3_object = Aws::S3::Object.new(s3_bucket, s3_key)
s3_object.upload_stream do |s3_stream|
  Zlib::GzipWriter.wrap(s3_stream, ::Zlib::BEST_COMPRESSION) do |gz|
    File.open(filename) do |f|
      IO.copy_stream(f, gz)
    end
  end
end

Zlib::GzipWriter.wrap / Zlib::GzipReader.wrap documentation: https://ruby-doc.org/3.2.2/exts/zlib/Zlib/GzipFile.html#method-c-wrap

To get the same result with zstd-ruby today, I have to chunk the file manually, e.g.:

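require 'aws-sdk-s3'
require 'zstd-ruby'

CHUNK_SIZE = 1024 * 1024 # read the source file 1 MiB at a time
zstd_stream = Zstd::StreamingCompress.new
s3_object = Aws::S3::Object.new(s3_bucket, s3_key)
s3_object.upload_stream do |s3_stream|
  File.open(filename, 'rb') do |file|
    while (chunk = file.read(CHUNK_SIZE))
      # compress may return an empty string while zstd buffers input
      s3_stream << zstd_stream.compress(chunk)
    end
  end
  # finish flushes the buffered data and ends the zstd frame
  s3_stream << zstd_stream.finish
end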

shaicoleman, May 17 '23
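The manual loop above can be folded into a small IO-like object so that IO.copy_stream does the chunking, which is roughly what Zlib::GzipWriter.wrap offers. A minimal sketch, assuming only the #compress(chunk) and #finish methods used in that loop; CompressingWriter and DeflateCodec are hypothetical names, not part of zstd-ruby, and Zlib::Deflate stands in for zstd so the example runs with the standard library alone:

```ruby
require 'zlib'
require 'stringio'

# Hypothetical block-form wrapper (not part of zstd-ruby). Any codec that
# responds to #compress(chunk) and #finish works, e.g. Zstd::StreamingCompress.
class CompressingWriter
  # Like Zlib::GzipWriter.wrap: yields the writer and closes it afterwards,
  # or returns the writer when called without a block.
  def self.wrap(io, codec)
    writer = new(io, codec)
    return writer unless block_given?

    begin
      yield writer
    ensure
      writer.close
    end
  end

  def initialize(io, codec)
    @io = io
    @codec = codec
    @closed = false
  end

  # IO.copy_stream calls #write on a non-IO destination, so the caller
  # decides the chunk size. Returns the number of uncompressed bytes taken.
  def write(*chunks)
    raise IOError, 'closed stream' if @closed

    chunks.sum do |chunk|
      data = chunk.to_s
      out = @codec.compress(data)
      @io.write(out) unless out.empty?
      data.bytesize
    end
  end

  def <<(chunk)
    write(chunk)
    self
  end

  # Writes whatever the codec still buffers (e.g. the end of the frame).
  # The wrapped IO itself is left open.
  def close
    return if @closed

    @io.write(@codec.finish)
    @closed = true
    nil
  end
end

# Stand-in codec on top of stdlib zlib, exposing the same two methods.
class DeflateCodec
  def initialize
    @deflate = Zlib::Deflate.new
  end

  def compress(data)
    @deflate.deflate(data)
  end

  def finish
    @deflate.finish
  end
end
```

With zstd-ruby installed, the S3 upload would presumably become `CompressingWriter.wrap(s3_stream, Zstd::StreamingCompress.new) { |zw| File.open(filename, 'rb') { |f| IO.copy_stream(f, zw) } }` (untested). Unlike Zlib::GzipWriter#close, this close leaves the wrapped IO open, so the caller (or upload_stream) stays in charge of it.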

Thanks for the request. I haven't had time to implement it yet, sorry about that. I plan to look at it around the end of the year.

SpringMT, Sep 21 '23

As an experimental feature, I've implemented a simple Zstd::StreamWriter and Zstd::StreamReader: https://github.com/SpringMT/zstd-ruby/pull/76

Write

irb(main):001> require 'zstd-ruby'
=> true
irb(main):002> require 'zstd-ruby/stream_writer'
=> true
irb(main):003> require 'stringio'
=> true
irb(main):004> io = StringIO.new
=> #<StringIO:0x0000000107a90d50>
irb(main):005> w = Zstd::StreamWriter.new(io)
=> #<Zstd::StreamWriter:0x0000000109831670 @io=#<StringIO:0x0000000107a90d50>, @stream=#<Zstd::StreamingCompress:0x00000001098315d0>>
irb(main):006> w.write("abc")
=> 12
irb(main):007> w.write("def")
=> 6
irb(main):008> w.finish
=> 3
irb(main):009> io.rewind
=> 0
irb(main):010> puts Zstd.decompress(io.read)
abcdef
=> nil

Read

irb(main):001> require 'zstd-ruby'
=> true
irb(main):002> require 'zstd-ruby/stream_writer'
=> true
irb(main):003> require 'zstd-ruby/stream_reader'
=> true
irb(main):004> require 'stringio'
=> true
irb(main):005> io = StringIO.new
=> #<StringIO:0x0000000108699ac0>
irb(main):006> writer = Zstd::StreamWriter.new(io)
=> #<Zstd::StreamWriter:0x000000010834b4b8 @io=#<StringIO:0x0000000108699ac0>, @stream=#<Zstd::StreamingCompress:0x000000010834b3c8>>
irb(main):007> writer.write("abc")
=> 12
irb(main):008> writer.finish
=> 3
irb(main):009> io.rewind
=> 0
irb(main):010> reader = Zstd::StreamReader.new(io)
=> #<Zstd::StreamReader:0x0000000109a30638 @io=#<StringIO:0x0000000108699ac0>, @stream=#<Zstd::StreamingDecompress:0x0000000109a30548>>
irb(main):011> reader.read(10)
=> "a"
irb(main):012> reader.read(10)
=> "bc"
irb(main):013> reader.read(10)
/Users/springmt/zstd-ruby/lib/zstd-ruby/stream_reader.rb:11:in `read': EOF (StandardError)

SpringMT, Apr 03 '24
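The experimental reader in the transcript above raises a StandardError at end of input. IO.copy_stream, by contrast, expects a non-IO source to either return nil from read(length, outbuf) at EOF or raise EOFError from readpartial. A minimal sketch of a reader that follows that contract, assuming only the #decompress(chunk) method of Zstd::StreamingDecompress; DecompressingReader and InflateCodec are hypothetical names, and Zlib::Inflate stands in for zstd so the example runs with the standard library alone:

```ruby
require 'zlib'
require 'stringio'

# Hypothetical reader wrapper (not part of zstd-ruby). Any codec that
# responds to #decompress(chunk) works, e.g. Zstd::StreamingDecompress.
class DecompressingReader
  READ_SIZE = 16 * 1024 # compressed bytes pulled from the source per refill

  def self.wrap(io, codec)
    reader = new(io, codec)
    block_given? ? yield(reader) : reader
  end

  def initialize(io, codec)
    @io = io
    @codec = codec
    @buffer = String.new # binary buffer of decompressed, not yet returned bytes
    @eof = false
  end

  # Mirrors IO#read(length, outbuf): with a positive length, returns nil at
  # EOF; without a length, returns the rest of the stream ("" at EOF).
  def read(length = nil, outbuf = nil)
    fill(length)
    if length && length.positive? && @buffer.empty?
      outbuf&.clear
      return nil
    end

    data = length ? @buffer.byteslice(0, length) : @buffer
    @buffer = (length && @buffer.byteslice(length..)) || String.new
    outbuf ? outbuf.replace(data) : data
  end

  private

  # Decompress more input until `length` bytes are buffered or input ends.
  def fill(length)
    until @eof || (length && @buffer.bytesize >= length)
      chunk = @io.read(READ_SIZE)
      if chunk.nil?
        @eof = true
      else
        @buffer << @codec.decompress(chunk)
      end
    end
  end
end

# Stand-in codec on top of stdlib zlib, exposing the same method.
class InflateCodec
  def initialize
    @inflate = Zlib::Inflate.new
  end

  def decompress(data)
    @inflate.inflate(data)
  end
end
```

Passing Zstd::StreamingDecompress.new instead of InflateCodec.new should give a zstd equivalent of the Zlib::GzipReader.wrap plus IO.copy_stream pattern, though that combination is untested here.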

I added documentation to the README: https://github.com/SpringMT/zstd-ruby/edit/main/README.md#stream-writer-and-reader-wrapper

SpringMT, Apr 26 '24

Thanks for implementing this! However, it doesn't seem to support the block form, and it lacks the automatic chunking and flushing that Zlib::GzipWriter.wrap and Zlib::GzipReader.wrap provide.

Here are simplified examples using zlib and StringIO. I was hoping the zstd wrappers would work the same way.

require 'zlib'
require 'stringio'

source_iostr = StringIO.new('This is a test')
compressed_iostr = StringIO.new
Zlib::GzipWriter.wrap(compressed_iostr) do |gz|
  IO.copy_stream(source_iostr, gz)
end

compressed_iostr = StringIO.new(compressed_iostr.string)
extracted_iostr = StringIO.new
Zlib::GzipReader.wrap(compressed_iostr) do |gz|
  IO.copy_stream(gz, extracted_iostr)
end

shaicoleman, Apr 30 '24