Feature request: Generate data using markov chains

Open · cfcs opened this issue 9 years ago · 1 comment

It would be nice to use Markov chains (or a similar model) to generate data with different patterns and degrees of "similarity", for use when benchmarking compression and deduplication.

Candidates include:

file/directory names
data blocks / "segments"
directory depth / structure
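As a rough illustration of the idea for the first two candidates, here is a minimal pure-Python sketch of an order-k Markov chain. It is trained on a sample (characters for names, bytes for data blocks) and then samples new sequences that follow the same local statistics. This is not an existing feature of the project: the class name `MarkovChain`, the sample corpora, and the chosen orders are all hypothetical. The `order` parameter is the assumed "similarity" knob. A higher order copies longer runs of the training data, which should make the output more compressible and more repetitive, while a lower order makes it noisier.

```python
import random
from collections import defaultdict


class MarkovChain:
    """Order-k Markov chain over any sequence of hashable symbols.

    Works on str (characters) and bytes (ints 0-255) alike, because every
    sample is converted to a tuple before training. Hypothetical helper.
    """

    def __init__(self, order: int = 2):
        if order < 1:
            raise ValueError("order must be at least 1")
        self.order = order
        # Maps each k-symbol context to every symbol seen right after it;
        # duplicates are kept so rng.choice() samples by observed frequency.
        self.transitions = defaultdict(list)

    def train(self, sample) -> None:
        symbols = tuple(sample)
        for i in range(len(symbols) - self.order):
            context = symbols[i:i + self.order]
            self.transitions[context].append(symbols[i + self.order])

    def generate(self, length: int, rng: random.Random) -> list:
        contexts = list(self.transitions)
        if not contexts:
            raise ValueError("train() needs a sample longer than the order")
        context = rng.choice(contexts)
        out = list(context)
        while len(out) < length:
            followers = self.transitions.get(context)
            if not followers:
                # Dead end (context never seen with a successor): jump to a
                # random known context and continue from there.
                context = rng.choice(contexts)
                followers = self.transitions[context]
            symbol = rng.choice(followers)
            out.append(symbol)
            context = context[1:] + (symbol,)
        return out[:length]


rng = random.Random(2016)

# File/directory names: train on newline-separated example names (hypothetical).
NAME_SAMPLE = "report_2016.txt\ninvoice_final.pdf\nbackup_old.tar\nnotes_draft.md\n"
name_model = MarkovChain(order=2)
name_model.train(NAME_SAMPLE)
names = [n for n in "".join(name_model.generate(200, rng)).split("\n") if n]

# Data blocks: train on a byte sample and emit one 4 KiB block.
BLOCK_SAMPLE = b"The quick brown fox jumps over the lazy dog. " * 4
block_model = MarkovChain(order=3)
block_model.train(BLOCK_SAMPLE)
block = bytes(block_model.generate(4096, rng))

print(names[:5])
print(block[:60])
```

Feeding blocks produced at several orders into a compressor or deduplicator would then give a range of inputs from "noisy" to "highly repetitive". Directory depth could be modelled the same way, with symbols such as "descend", "stay", and "ascend", though that is left out here.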

Mar 07 '16 21:03 cfcs

Speaking of which, there should be test cases for weird path elements such as long names, unusual character encodings, and broken (invalid) character encodings. I suspect some of the rsync-based tools may have a hard time with files and folders whose names contain special characters.
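A minimal sketch of such fixtures in Python follows. It assumes a Unix-like system (it uses `os.pathconf`, which is not available on Windows) and a filesystem that stores names as raw bytes, as ext4 does on Linux. The functions `weird_names` and `create_fixtures` and the specific names are hypothetical choices. Some filesystems reject names that are not valid UTF-8, so a name the filesystem refuses is skipped rather than treated as a failure.

```python
import os
import tempfile
import unicodedata


def weird_names(name_max: int) -> list[bytes]:
    """Hypothetical set of awkward path elements for testing sync tools."""
    cafe = "café"
    return [
        b"a" * name_max,                               # longest allowed name
        unicodedata.normalize("NFC", cafe).encode(),   # é as one code point
        unicodedata.normalize("NFD", cafe).encode(),   # e + combining accent
        "日本語のファイル".encode("utf-8"),              # non-ASCII, valid UTF-8
        "naïve".encode("latin-1"),                     # Latin-1 bytes, invalid UTF-8
        b"\xff\xfe broken utf8",                       # never valid UTF-8
        b"-n",                                         # looks like an option (rsync -n is --dry-run)
        b"name with\nnewline",
        b" leading and trailing space ",
        b"back\\slash",
    ]


def create_fixtures(root: bytes) -> list[bytes]:
    """Create one file per weird name under root; return names that worked."""
    created = []
    name_max = os.pathconf(root, "PC_NAME_MAX")
    for name in weird_names(name_max):
        path = os.path.join(root, name)
        try:
            with open(path, "wb") as f:
                f.write(name)  # content = its own name, for later comparison
        except OSError:
            continue  # the filesystem rejected this name
        created.append(name)
    return created


root = os.fsencode(tempfile.mkdtemp())
created = create_fixtures(root)
for name in created:
    print(repr(name[:40]))
```

The directory is deliberately left in place so that an rsync-based tool can be pointed at it. Afterwards, calling `os.listdir()` with a bytes path on the source and the destination returns bytes names, which makes it straightforward to see which names survived the round trip unchanged.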

Mar 07 '16 21:03 cfcs