fakedatafs
Feature request: Generate data using Markov chains
It would be nice to use Markov chains or a similar technique to generate data with different patterns and degrees of "similarity", for use in benchmarking compression and deduplication.
Candidates include:
- file/directory names
- data blocks / "segments"
- directory depth / structure
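To make the idea concrete for the first candidate (file/directory names), here is a minimal sketch of a character-level Markov chain in Python. It is only an illustration, not how fakedatafs works or should work: `build_chain`, `generate_name`, the sample word list, and the `^`/`$` start/end markers are all made up for this example, and it assumes the samples themselves contain neither marker.

```python
import random
from collections import defaultdict


def build_chain(samples, order=2):
    """Map each `order`-character context to the characters that followed it."""
    chain = defaultdict(list)
    for word in samples:
        padded = "^" * order + word + "$"  # ^ marks the start, $ marks the end
        for i in range(len(padded) - order):
            chain[padded[i:i + order]].append(padded[i + order])
    return chain


def generate_name(chain, rng, order=2, max_len=32):
    """Walk the chain from the start context until the end marker or max_len."""
    context = "^" * order
    out = []
    while len(out) < max_len:
        nxt = rng.choice(chain[context])
        if nxt == "$":
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out)


# Hypothetical training data; a real run would use a larger corpus of names.
samples = ["report", "readme", "results", "backup", "config", "archive"]
chain = build_chain(samples)
rng = random.Random(42)  # fixed seed so a benchmark run is reproducible
names = [generate_name(chain, rng) for _ in range(5)]
print(names)
```

The same approach should carry over to data blocks by treating bytes instead of characters as states. A lower `order` gives more random-looking output, a higher one stays closer to the training data, which is one possible knob for "similarity".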
Speaking of which, there should be test cases for weird path elements: very long names, unusual charset encodings, broken (invalid) encodings, etc. I suspect some of the rsync-based tools might have a hard time with files and folders containing special characters.
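A rough sketch of what such a list of test names could look like, in Python. The function name `weird_path_elements` and the specific cases are just suggestions; the 255-byte length assumes the common per-name limit (`NAME_MAX` on Linux), which may differ on other file systems. Names are kept as bytes because some of them are deliberately not valid text.

```python
def weird_path_elements():
    """Return (label, name) pairs; names are bytes since some are not valid text."""
    return [
        ("max length", b"a" * 255),  # assumed limit: 255 bytes per name
        ("utf-8", "füße-日本語".encode("utf-8")),
        ("latin-1", "füße".encode("latin-1")),  # valid Latin-1, invalid UTF-8
        ("invalid utf-8", b"broken-\xff\xfe"),
        ("truncated utf-8", "é".encode("utf-8")[:1] + b"x"),
        ("newline", b"line1\nline2"),
        ("leading dash", b"-rf"),  # may be mistaken for a command-line option
        ("whitespace", b" leading and trailing "),
    ]


def is_valid_utf8(name):
    """Report whether the raw name decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for label, name in weird_path_elements():
    print(f"{label:16} valid_utf8={is_valid_utf8(name)} {name!r}")
```

None of these names contain `/` or a NUL byte, so each one should be usable as a single path element on a typical Unix file system.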