fluentd icon indicating copy to clipboard operation
fluentd copied to clipboard

compressable:Zstd comp support

Open Athishpranav2003 opened this issue 1 year ago • 5 comments
trafficstars

Which issue(s) this PR fixes: Fixes #4162

What this PR does / why we need it: Adds new compression method support to handle messages Docs Changes: TODO Release Note: N/A

Athishpranav2003 avatar Oct 03 '24 18:10 Athishpranav2003

@daipom @ashie need some comments on this since its a very new compression method. I tried to make sure the existing support is not broken and added this feature additionally. Need to perform some more work but wanted to get some comments from you guys

Athishpranav2003 avatar Oct 03 '24 18:10 Athishpranav2003

Thanks! I will see this this weekend!

daipom avatar Oct 04 '24 01:10 daipom

@daipom did u get chance to look at this?

Athishpranav2003 avatar Oct 16 '24 15:10 Athishpranav2003

Sorry, I haven't made time for this. :cry: I'll make time for it this week.

daipom avatar Oct 17 '24 07:10 daipom

Its fine i guess, Even i was very busy in last 2 weeks :+1:

Athishpranav2003 avatar Oct 17 '24 11:10 Athishpranav2003

@Athishpranav2003 I'm sorry for my late response. :cry:

I have confirmed the entire direction! It's great! Thanks for starting this enhancement!

Sorry I haven't made time to see the detailed implementation, such as Compressable module, but the overall design of Fluentd would be essential for us now. So, now, I comment on the overall direction of further modification.

It looks basically good as in_forward support! All that is left is to support Output/Buffer/Chunk logic for output plugins. (Currently, these logic assumes gzip only) To do so, it would be a good idea to support out_forward first.

The core logic would be the following.

  • EventStream#to_compressed_msgpack_stream
    • https://github.com/fluent/fluentd/blob/78a7972bfe6b421f08472701f04f00515ed24bee/lib/fluent/event.rb#L65-L68
  • Output#generate_format_proc
    • https://github.com/fluent/fluentd/blob/78a7972bfe6b421f08472701f04f00515ed24bee/lib/fluent/plugin/output.rb#L1016-L1027
  • Chunk::Decompressable
    • https://github.com/fluent/fluentd/blob/78a7972bfe6b421f08472701f04f00515ed24bee/lib/fluent/plugin/buffer/chunk.rb#L184-L235

It is complicated, but the compression of Buffer and Chunk and the flush process of output plugins are closely related. The following existing implementations may be helpful for how output plugins behaves:

  • https://github.com/fluent/fluentd/blob/78a7972bfe6b421f08472701f04f00515ed24bee/lib/fluent/plugin/out_forward.rb#L253-L259
  • https://github.com/fluent/fluentd/blob/78a7972bfe6b421f08472701f04f00515ed24bee/lib/fluent/plugin/out_forward.rb#L664-L668
  • https://github.com/fluent/fluentd/blob/78a7972bfe6b421f08472701f04f00515ed24bee/lib/fluent/plugin/out_file.rb#L214-L225
  • https://github.com/fluent/fluentd/blob/78a7972bfe6b421f08472701f04f00515ed24bee/lib/fluent/plugin/out_file.rb#L245-L268

daipom avatar Oct 21 '24 10:10 daipom

@daipom thanks for the review.

Yah, this will kick a lot more changes in the overall events pipeline. For now i have just added the compression support as a module in Compressible. And used this in in-forward. I will try to look at Buffer as well in meantime(might need more time since i got busy with a project). I guess maybe splitting it into multiple PRs will be easy since i can finish small chunks one by one(seperate functionality) and easy to review

Athishpranav2003 avatar Oct 21 '24 11:10 Athishpranav2003

@daipom can you please check the change in the Compression module as well. Once if we merge this we can proceed with the other development around this

Athishpranav2003 avatar Oct 23 '24 11:10 Athishpranav2003

@Athishpranav2003

can you please check the change in the Compression module as well

Sure! I will review the Compressable module!

Once if we merge this we can proceed with the other development around this

To judge if we can merge this only with the support of the module and in_forward, we need to reach an agreement on updating Forward Protocol Specification.

It would be necessary to update CompressedPackedForward Mode. By allowing zstd as the value for the key compressed, it would be possible to add Zstd support while keeping compatibility, but I'm not sure yet.

daipom avatar Oct 25 '24 07:10 daipom

Sure @daipom

If it's needed then I can split the PR into 2 where this will only focus on zstd and other one for in-forward

Athishpranav2003 avatar Oct 25 '24 08:10 Athishpranav2003

@daipom how to proceed in this PR?

Athishpranav2003 avatar Oct 29 '24 15:10 Athishpranav2003

Sorry, please give me a few more days 😢

daipom avatar Oct 30 '24 09:10 daipom

bundle exec rake test TEST=test/plugin/test_compressable.rb                      2856ms  Sun 03 Nov 2024 11:29:54 AM IST
/usr/bin/ruby -w -I"lib:test" -Eascii-8bit:ascii-8bit /usr/share/gems/gems/rake-13.2.1/lib/rake/rake_test_loader.rb "test/plugin/test_compressable.rb" 
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/rubygems_ext.rb:250: warning: method redefined; discarding old encode_with
/usr/local/share/ruby/site_ruby/rubygems/dependency.rb:341: warning: previous definition of encode_with was here
/usr/share/ruby/bundled_gems.rb:75:in `require': libruby.so.3.2: cannot open shared object file: No such file or directory - /usr/lib64/gems/ruby/yajl-ruby-1.4.3/yajl/yajl.so (LoadError)
	from /usr/share/ruby/bundled_gems.rb:75:in `block (2 levels) in replace_require'
	from /usr/share/gems/gems/yajl-ruby-1.4.3/lib/yajl.rb:1:in `<top (required)>'
	from /usr/share/ruby/bundled_gems.rb:75:in `require'
	from /usr/share/ruby/bundled_gems.rb:75:in `block (2 levels) in replace_require'
	from /home/aggressive_racer1/projects/fluentd/lib/fluent/config/literal_parser.rb:20:in `<top (required)>'
	from /usr/share/ruby/bundled_gems.rb:75:in `require'
	from /usr/share/ruby/bundled_gems.rb:75:in `block (2 levels) in replace_require'
	from /home/aggressive_racer1/projects/fluentd/lib/fluent/config/element.rb:18:in `<top (required)>'
	from /usr/share/ruby/bundled_gems.rb:75:in `require'
	from /usr/share/ruby/bundled_gems.rb:75:in `block (2 levels) in replace_require'
	from /home/aggressive_racer1/projects/fluentd/test/helper.rb:42:in `<top (required)>'
	from /home/aggressive_racer1/projects/fluentd/test/plugin/test_compressable.rb:1:in `require_relative'
	from /home/aggressive_racer1/projects/fluentd/test/plugin/test_compressable.rb:1:in `<top (required)>'
	from /usr/share/ruby/bundled_gems.rb:75:in `require'
	from /usr/share/ruby/bundled_gems.rb:75:in `block (2 levels) in replace_require'
	from /usr/share/gems/gems/rake-13.2.1/lib/rake/rake_test_loader.rb:21:in `block in <main>'
	from /usr/share/gems/gems/rake-13.2.1/lib/rake/rake_test_loader.rb:6:in `select'
	from /usr/share/gems/gems/rake-13.2.1/lib/rake/rake_test_loader.rb:6:in `<main>'
rake aborted!
Command failed with status (1): [ruby -w -I"lib:test" -Eascii-8bit:ascii-8bit /usr/share/gems/gems/rake-13.2.1/lib/rake/rake_test_loader.rb "test/plugin/test_compressable.rb" ]
/usr/share/gems/gems/rake-13.2.1/exe/rake:27:in `<top (required)>'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/cli/exec.rb:58:in `load'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/cli/exec.rb:58:in `kernel_load'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/cli/exec.rb:23:in `run'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/cli.rb:455:in `exec'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/vendor/thor/lib/thor/command.rb:28:in `run'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/vendor/thor/lib/thor.rb:527:in `dispatch'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/cli.rb:35:in `dispatch'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/vendor/thor/lib/thor/base.rb:584:in `start'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/cli.rb:29:in `start'
/usr/share/gems/gems/bundler-2.5.16/exe/bundle:28:in `block in <top (required)>'
/usr/share/gems/gems/bundler-2.5.16/lib/bundler/friendly_errors.rb:117:in `with_friendly_errors'
/usr/share/gems/gems/bundler-2.5.16/exe/bundle:20:in `<top (required)>'
/usr/bin/bundle:25:in `load'
/usr/bin/bundle:25:in `<main>'
Tasks: TOP => test => base_test
(See full trace by running task with --trace)

@daipom i addressed your comments but couldnt test it locally due to some issue. yajl-ruby is already installed in my system but some other issue

Athishpranav2003 avatar Nov 03 '24 05:11 Athishpranav2003

I guess there is a small careless Problem is local testing is not working at all. Could help me with that later

I will check after 18th Got packed with exams now :(

Athishpranav2003 avatar Nov 05 '24 02:11 Athishpranav2003

Hmm, these changes would be necessary, but looks like the error in your environment is caused by another reason.

https://github.com/fluent/fluentd/pull/4657#issuecomment-2453298305

/usr/share/ruby/bundled_gems.rb:75:in `require': libruby.so.3.2: cannot open shared object file: No such file or directory - /usr/lib64/gems/ruby/yajl-ruby-1.4.3/yajl/yajl.so (LoadError)

daipom avatar Nov 05 '24 02:11 daipom

I will check after 18th Got packed with exams now :(

OK! I'm sorry my response was so slow, and it took so long. Good luck with the exams!

daipom avatar Nov 05 '24 02:11 daipom

@daipom the gem issue got resolved after update,prolly some mismatched versions,

All tests are passing now

Athishpranav2003 avatar Nov 18 '24 08:11 Athishpranav2003

@daipom some doubts here

in chuck open method for decompress we dont have any type identification. how to identify the chunk type while open/read method is called https://github.com/fluent/fluentd/blob/78a7972bfe6b421f08472701f04f00515ed24bee/lib/fluent/plugin/buffer/chunk.rb#L201-L226

Athishpranav2003 avatar Nov 18 '24 12:11 Athishpranav2003

@daipom some doubts here

in chuck open method for decompress we dont have any type identification. how to identify the chunk type while open/read method is called

https://github.com/fluent/fluentd/blob/78a7972bfe6b421f08472701f04f00515ed24bee/lib/fluent/plugin/buffer/chunk.rb#L201-L226

I see! The current implementation is based on the following assumption.

  • It is a gzip format chunk = It extends Decompressable module

https://github.com/fluent/fluentd/blob/10f0f112eb112219bbe959a3e102102b0d429ce7/lib/fluent/plugin/buffer/chunk.rb#L51-L64

If we add a compression format, we need to change something here. I discussed the direction with @Watson1978 and @kenhys. We believe the following direction seems to be a good one.

  • Create Decompressable module for each compression format
    • GzipDecompressable (Rename the current Decompressable)
    • ZstdDecompressable
  • Switch modules to extend according to the compression format of the chunk

The current implementation of Decompressable looks a little weird to me. I don't see the point of specifying the format in argument compressed I don't want to change the signature for compatibility, but I believe a parameter like raw would be more correct...

Anyway, we only need to consider the possibility that gzip is specified for compressed for GzipDecompressable, and zstd for ZstdDecompressable. (If a different compression format is specified to compressed, the chunk should ignore it and decompress itself.)

daipom avatar Nov 20 '24 02:11 daipom

Oh ok Let me again go through and check it

But what is that parameter compress referring to?

Athishpranav2003 avatar Nov 20 '24 04:11 Athishpranav2003

@daipom seems like buffer is working :) Will go ahead and do other stuff

Athishpranav2003 avatar Nov 21 '24 03:11 Athishpranav2003

@daipom Can you check this out_file once. Also how to go about UTs for it. Which all UTs should we write for zstd.

Athishpranav2003 avatar Nov 21 '24 10:11 Athishpranav2003

@daipom can you check the PR now?

Athishpranav2003 avatar Nov 22 '24 02:11 Athishpranav2003

@daipom added UTs as well. PR seems to be complete now let me know if i missed anything

Athishpranav2003 avatar Nov 24 '24 14:11 Athishpranav2003

Sorry I have been busy this week :cry: Thanks! I will check it out!

daipom avatar Nov 27 '24 03:11 daipom

hey @daipom sorry to ping again. would be good if i can wrap this up soon

Athishpranav2003 avatar Dec 18 '24 14:12 Athishpranav2003

@Athishpranav2003

hey @daipom sorry to ping again. would be good if i can wrap this up soon

Sorry for my late response :cry: I was busy releasing Fluentd v1.18 and fluent-package v5.2, sorry. Now, those are done. I will see this in a few days.

daipom avatar Dec 20 '24 06:12 daipom

Yah, I was seeing ur contributions for a while. Its fine for me

Athishpranav2003 avatar Dec 20 '24 08:12 Athishpranav2003

@Athishpranav2003 Sorry for my late response. Overall, it looks great! Thanks so much!

About out_file and out_forward, it would be better that we do not support setting a different compression format from the buffer. There would be no particular benefit in allowing it. Those plugins should raise ConfigError if a different compression format is set (except :text).

For example, for out_forward, we should check the format as follows.

        if @buffer.compress == :text
          @buffer.compress = @compress unless @compress == :text
        else
          if @compress == :text
            log.info "buffer is compressed.  If you also want to save the bandwidth of a network, Add `compress` configuration in <match>"
          elsif @compress != @buffer.compress
            raise Fluent::ConfigError, "You cannot specify different compression formats for Buffer (Buffer: #{@buffer.compress}, Self: #{@compress})"
          end
        end

As for implementation, there appears to be no other problems. (There are some points where it would be nice to make some things simpler, but not essential.)

If you have any other concerns, please let me know.

We need to update the protocol before we merge this. I will consider how to proceed it.

To judge if we can merge this only with the support of the module and in_forward, we need to reach an agreement on updating Forward Protocol Specification.

It would be necessary to update CompressedPackedForward Mode. By allowing zstd as the value for the key compressed, it would be possible to add Zstd support while keeping compatibility, but I'm not sure yet.

daipom avatar Dec 26 '24 07:12 daipom

@daipom i have addressed ur comments. Feel free to point out the nits as well.

Athishpranav2003 avatar Dec 26 '24 11:12 Athishpranav2003