
What is the source of Mark.Twain-Tom.Sawyer* and other testdata?

Open sten0 opened this issue 3 years ago • 2 comments

Mark.Twain-Tom.Sawyer* appears to be from Project Gutenberg, especially given that Jose Menendez is involved. If this is the case, would you please restore the Project Gutenberg header? Alternatively, please document where these files were downloaded from. The absence of this documentation is another barrier to packaging lz4 v4 in Debian (and all of its derivatives), which currently only has v2. Please do this for all testdata. Also, if you're located in the USA, it would be nice if you would declare that files in the public domain have been relicenced (or dual-licenced) as CC0 or CC0-1.0; both are public domain equivalents that are recognised in jurisdictions where the concept of public domain does not exist.

Unfortunately the implicit BSD-3-clause license for testdata/* isn't valid, especially for those GPL-2-only images noted in the other issue I opened.

Kind regards, Nicholas

P.S. 'hope you're staying cool wherever you are, the heatwave on the east coast is intense!

sten0 avatar May 14 '22 22:05 sten0

I need to investigate as I don't recall. I might need to remove some of the other files (the Linux kernels) as they were provided for bug fixing. Will do so when I get time.

pierrec avatar May 25 '22 07:05 pierrec

Pierre Curto @.***> writes:

> I need to investigate as I don't recall. I might need to remove some of the other files (the Linux kernels) as they were provided for bug fixing. Will do so when I get time.

Thanks. I hope it will be possible to do this before July, because Syncthing >=1.19.2 (possibly as early as 1.19) requires lz4 v4, and we're currently blocked by this issue. For what appears to be a Gutenberg text with missing copyright and license headers, it's probably faster to just re-download a fresh copy, and then leave the headers intact.

#178 would need to be fixed in a hurry if the FSF got wind of what appears to be a GPL noncompliance issue... I won't tell, of course, 'just saying it's a risk.

I'll plan to gently ping you in a few weeks ;)

Oh, this seems like another option for CI too:

  1. Download (or bundle) the lz4 reference implementation (probably build the lz4 reference implementation)
  2. Make it generate its test data (probably make it run its tests to establish a control)
  3. Use that test data for your own tests

This would also advance progress towards resolving #151, because it solves the prerequisites for testing interoperability with the reference implementation. E.g., use the reference implementation to produce this lz4's input, and call the fresh copy of the reference implementation to check this lz4's output.

sten0 avatar May 25 '22 10:05 sten0