zstd-ruby
[feature request] Streaming wrapper
I'm trying to nest streams in order to have streaming compressed uploads/downloads of large files with S3.
With zlib, I could use Zlib::GzipWriter.wrap / Zlib::GzipReader.wrap, but there doesn't seem to be an equivalent for zstd-ruby.
Example zlib code, which I would like to migrate to zstd-ruby:
require 'aws-sdk-s3'
require 'zlib'
s3_bucket = 'my_s3_bucket'
s3_key = 'my_s3_key'
filename = '/my/path'
s3_object = Aws::S3::Object.new(s3_bucket, s3_key)
s3_object.upload_stream do |s3_stream|
  Zlib::GzipWriter.wrap(s3_stream, ::Zlib::BEST_COMPRESSION) do |gz|
    File.open(filename) do |f|
      IO.copy_stream(f, gz)
    end
  end
end
Zlib::GzipWriter.wrap / Zlib::GzipReader.wrap documentation:
https://ruby-doc.org/3.2.2/exts/zlib/Zlib/GzipFile.html#method-c-wrap
For similar functionality, I currently need to manually chunk the file, e.g.
CHUNK_SIZE = 1024 * 1024
zstd_stream = Zstd::StreamingCompress.new
s3_object = Aws::S3::Object.new(@bucket, key)
s3_object.upload_stream do |s3_stream|
  s3_stream << zstd_stream.compress('')
  File.open(filename) do |file|
    while (chunk = file.read(CHUNK_SIZE))
      s3_stream << zstd_stream.compress(chunk)
    end
  end
  s3_stream << zstd_stream.finish
end
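For illustration, the manual loop above could be hidden behind a small block-form adapter. This is only a sketch, not part of zstd-ruby: `CompressingWriter` and `PassthroughCompressor` are hypothetical names, and the adapter assumes nothing about the compressor beyond `compress(chunk)` and `finish`, the two calls used on `Zstd::StreamingCompress` above. Because the adapter responds to `write`, `IO.copy_stream` can drive it and takes care of the chunking.

```ruby
require 'stringio'

# Hypothetical adapter (not part of zstd-ruby). It responds to #write, so
# IO.copy_stream can use it as a destination, and it forwards the compressed
# bytes to the wrapped IO.
class CompressingWriter
  # Block form, loosely modelled on Zlib::GzipWriter.wrap: the final frame
  # is written when the block exits.
  def self.wrap(io, compressor)
    writer = new(io, compressor)
    return writer unless block_given?

    begin
      yield writer
    ensure
      writer.finish
    end
  end

  # `compressor` is any object responding to #compress(chunk) and #finish.
  def initialize(io, compressor)
    @io = io
    @compressor = compressor
    @finished = false
  end

  # Compresses one chunk and reports how many input bytes were consumed,
  # which is the return value IO.copy_stream expects from #write.
  def write(chunk)
    @io.write(@compressor.compress(chunk))
    chunk.bytesize
  end

  # Writes whatever the compressor still holds; safe to call more than once.
  def finish
    return if @finished

    @io.write(@compressor.finish)
    @finished = true
  end
end

# Stand-in for demonstration only: passes bytes through unchanged so the
# sketch runs without the zstd-ruby gem installed.
class PassthroughCompressor
  def compress(chunk)
    chunk
  end

  def finish
    ''
  end
end

source = StringIO.new('This is a test')
sink = StringIO.new
CompressingWriter.wrap(sink, PassthroughCompressor.new) do |writer|
  IO.copy_stream(source, writer)
end
```

With zstd-ruby loaded, the same adapter would presumably be used as `CompressingWriter.wrap(s3_stream, Zstd::StreamingCompress.new) { |w| IO.copy_stream(file, w) }`; that combination is untested here.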
Thanks for the request. I haven't had time to implement this yet, sorry about that. I plan to respond around the end of the year.
As an experimental feature, I implemented a simple StreamWriter and StreamReader. https://github.com/SpringMT/zstd-ruby/pull/76
Write
irb(main):001> require 'zstd-ruby'
=> true
irb(main):002> require 'zstd-ruby/stream_writer'
=> true
irb(main):003> require 'stringio'
=> true
irb(main):004> io = StringIO.new
=> #<StringIO:0x0000000107a90d50>
irb(main):005> w = Zstd::StreamWriter.new(io)
=> #<Zstd::StreamWriter:0x0000000109831670 @io=#<StringIO:0x0000000107a90d50>, @stream=#<Zstd::StreamingCompress:0x00000001098315d0>>
irb(main):006> w.write("abc")
=> 12
irb(main):007> w.write("def")
=> 6
irb(main):008> w.finish
=> 3
irb(main):009> io.rewind
=> 0
irb(main):010> puts Zstd.decompress(io.read)
abcdef
=> nil
Read
irb(main):001> require 'zstd-ruby'
=> true
irb(main):002> require 'zstd-ruby/stream_writer'
=> true
irb(main):003> require 'zstd-ruby/stream_reader'
=> true
irb(main):004> require 'stringio'
=> true
irb(main):005> io = StringIO.new
=> #<StringIO:0x0000000108699ac0>
irb(main):006> writer = Zstd::StreamWriter.new(io)
=> #<Zstd::StreamWriter:0x000000010834b4b8 @io=#<StringIO:0x0000000108699ac0>, @stream=#<Zstd::StreamingCompress:0x000000010834b3c8>>
irb(main):007> writer.write("abc")
=> 12
irb(main):008> writer.finish
=> 3
irb(main):009> io.rewind
=> 0
irb(main):010> reader = Zstd::StreamReader.new(io)
=> #<Zstd::StreamReader:0x0000000109a30638 @io=#<StringIO:0x0000000108699ac0>, @stream=#<Zstd::StreamingDecompress:0x0000000109a30548>>
irb(main):011> reader.read(10)
=> "a"
irb(main):012> reader.read(10)
=> "bc"
irb(main):013> reader.read(10)
/Users/springmt/zstd-ruby/lib/zstd-ruby/stream_reader.rb:11:in `read': EOF (StandardError)
I added documentation to the README: https://github.com/SpringMT/zstd-ruby/edit/main/README.md#stream-writer-and-reader-wrapper
Thanks for implementing this. However, it doesn't seem to support the block form, and it lacks the automatic chunking and flushing that Zlib::GzipWriter.wrap and Zlib::GzipReader.wrap provide.
Here are simplified examples using zlib and StringIO. I was hoping that it would work in the same way.
require 'zlib'
require 'stringio'
source_iostr = StringIO.new('This is a test')
compressed_iostr = StringIO.new
Zlib::GzipWriter.wrap(compressed_iostr) do |gz|
  IO.copy_stream(source_iostr, gz)
end
compressed_iostr = StringIO.new(compressed_iostr.string)
extracted_iostr = StringIO.new
Zlib::GzipReader.wrap(compressed_iostr) do |gz|
  IO.copy_stream(gz, extracted_iostr)
end
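To make the reader half of the request concrete, here is a rough sketch of what a source usable with `IO.copy_stream` needs to provide. It is not zstd-ruby code: `DecompressingReader` and `PassthroughDecompressor` are hypothetical names, and the only thing assumed of the decompressor is a `decompress(chunk)` method, as on `Zstd::StreamingDecompress`. The detail that is easy to miss is that, for a non-IO source, `IO.copy_stream` calls `read(length, outbuf)` and writes out the contents of `outbuf`, so the reader has to fill that buffer and return `nil` at end of input.

```ruby
require 'stringio'

# Hypothetical adapter (not part of zstd-ruby). It responds to
# #read(length, outbuf), so IO.copy_stream can use it as a source.
class DecompressingReader
  CHUNK_SIZE = 16 * 1024 # compressed bytes pulled from the wrapped IO per read

  # Block form, loosely modelled on Zlib::GzipReader.wrap.
  def self.wrap(io, decompressor)
    reader = new(io, decompressor)
    return reader unless block_given?

    yield reader
  end

  # `decompressor` is any object responding to #decompress(chunk).
  def initialize(io, decompressor)
    @io = io
    @decompressor = decompressor
    @buffer = String.new # decompressed bytes not yet handed out (binary)
  end

  # Returns up to `length` decompressed bytes in `outbuf`, or nil once the
  # wrapped IO is exhausted and the internal buffer is empty.
  def read(length, outbuf = String.new)
    while @buffer.bytesize < length
      chunk = @io.read(CHUNK_SIZE)
      break if chunk.nil?

      @buffer << @decompressor.decompress(chunk).b
    end

    if @buffer.empty?
      outbuf.clear
      return nil
    end

    data = @buffer.byteslice(0, length)
    @buffer = @buffer.byteslice(data.bytesize..-1)
    outbuf.replace(data)
  end
end

# Stand-in for demonstration only: passes bytes through unchanged so the
# sketch runs without the zstd-ruby gem installed.
class PassthroughDecompressor
  def decompress(chunk)
    chunk
  end
end

compressed = StringIO.new('This is a test')
extracted = StringIO.new
DecompressingReader.wrap(compressed, PassthroughDecompressor.new) do |reader|
  IO.copy_stream(reader, extracted)
end
```

Returning `nil` at end of input follows the `IO#read(length)` convention, which is what lets `IO.copy_stream` stop cleanly instead of hitting an exception as in the `reader.read(10)` session earlier in the thread.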