c-aff4 icon indicating copy to clipboard operation
c-aff4 copied to clipboard

zlib compression is not in the AFF4 standard

Open jtsylve opened this issue 5 years ago • 6 comments

I noticed in commit e5e148960d4e0155df376b376ac7162b97829961 that you re-added the option to aff4imager to support the "zlib" compression (deflate is supported). @blschatz pointed out to me that this form of compression isn't technically part of the AFF4 standard. I don't think we should allow options to create non-standards compliant images, as it's going to make thing worse for tool vendors who want to read the images.

jtsylve avatar Aug 16 '19 18:08 jtsylve

Seems like an omission in the standard? We have been using Zlib forever and the code to do deflate was completely broken and possibly not ever tested (it was raising an exception every time).

I tried to fix the deflate code but I'm not sure we ever produced any images with deflate. Supporting Zlib is required for reading old images too.

From a technical perspective Zlib is the same as deflate but with an additional adler32 checksum. Checksumming is good because it adds another layer of checking to the format at the chunk level.

Thanks Mike.

On Sat, Aug 17, 2019, 04:53 Joe Sylve [email protected] wrote:

I noticed in commit e5e1489 https://github.com/Velocidex/c-aff4/commit/e5e148960d4e0155df376b376ac7162b97829961 that you re-added the option to aff4imager to support the "zlib" compression (deflate is supported). @blschatz https://github.com/blschatz pointed out to me that this form of compression isn't technically part of the AFF4 standard. I don't think we should allow options to create non-standards compliant images, as it's going to make thing worse for tool vendors who want to read the images.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Velocidex/c-aff4/issues/129?email_source=notifications&email_token=AA5NRISGUD52S2ZDXOQQZPDQE3ZTFA5CNFSM4IML5ROKYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4HFWYVJA, or mute the thread https://github.com/notifications/unsubscribe-auth/AA5NRITEN6UF2ADFLKUXL5TQE3ZTFANCNFSM4IML5ROA .

scudette avatar Aug 16 '19 20:08 scudette

The original AFF4 paper used Zlib compression, while in Evimetry and the Wirespeed paper we started using deflate (if I recall correctly, because the Adler32 checksum calculation was a bottleneck at the time).

The configurable compression scheme proposal of the Wirespeed paper was adopted when the google/aff4 C and Python implementations were created in 2015. It looks like both of these implementations continued using Zlib. Thankfully, in both implementations we are using unique identifiers for each (zlib = "https://www.ietf.org/rfc/rfc1950.txt", deflate = "https://tools.ietf.org/html/rfc1951").

At this stage, given that both schemes have been in-use for some time, I'd view it as an omission from the standard. I'm happy to commit to Evimetry supporting both, and think that the C and Python implementations should be modified to allow both schemes (we are happy to fix the latter, if someone wants to put their hand up to fix the former).

-bradley

blschatz avatar Aug 18 '19 23:08 blschatz

Given that both types of images exist in the wild, practically readers need to be able to read both, so I agree that it's probably best just to add the deflate option to the standard, so that it becomes a requirement for support.

On another note, now that the tool supports LZ4, I think we should revisit the default compression option. LZ4 seems to have similar compression ratios to deflate/zlib and is faster for both reading and writing in my testing

jtsylve avatar Aug 19 '19 16:08 jtsylve

Re Lz4 - we have seen some chatter suggesting similar, but haven't been able to replicate it. In all our testing snappy is still winning.

I'd like to get to the bottom of this.

blschatz avatar Aug 20 '19 06:08 blschatz

Does it really matter? On my 24 core server zlib is maxing out the CPU anyway and imaging is generally IO bound on any compression scheme (given enough threads).  IMHO it is fine to add knobs to tune this but we need to make sure the default is robust and works ok - it does not need to be the fastest.

Thanks

Mike

On 8/20/19 4:25 PM, Bradley L Schatz wrote:

Re Lz4 - we have seen some chatter suggesting similar, but haven't been able to replicate it. In all our testing snappy is still winning.

I'd like to get to the bottom of this.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Velocidex/c-aff4/issues/129?email_source=notifications&email_token=AA5NRIVQQ3FQ4TJZKCPODKDQFOE7DA5CNFSM4IML5ROKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4VGOPI#issuecomment-522872637, or mute the thread https://github.com/notifications/unsubscribe-auth/AA5NRIQ3APFEXK6DJCPFZHDQFOE7DANCNFSM4IML5ROA.

--

Mike Cohen Digital Paleontologist, Velocidex Enterprises

M ‭+61 470 238 491‬ <tel:‭+61 470 238 491‬> E [email protected] mailto:[email protected]

scudette avatar Aug 20 '19 07:08 scudette

I don't think we need to be prescriptive about which algorithm to use. The spec is silent on this.

blschatz avatar Aug 20 '19 11:08 blschatz