
Usability Improvements

wvaske opened this issue • 0 comments

After the 1.0 submission round, we found that the usability of the benchmark can be greatly improved. This issue will track the sub-issues we intend to address for the 2.0 release.

Please add any items in the comments and I will update this top-level comment. Feel free to attend the sub-working-group meeting (bi-weekly on Wednesday mornings, starting Nov 20th). Join the MLPerf Storage working group for the invite, or message me.

Tasks

Rules Document

  • [ ] Define filesystem caching rules in detail
  • [ ] Define the system.json schema and creation process
  • [ ] Define allowed time between runs
  • [ ] Define rules for using a local SSD to cache data
  • [ ] Define rules for hyperconverged and local cache

benchmark[.py | .sh] script

  • [ ] Unique names for files and directories, with a structure encoding benchmark, accelerator, count, run sequence, and run number
  • [ ] Better installer that manages dependencies
  • [ ] Containerization
    • [ ] Ease of Deployment of Benchmark (just get it working)
    • [ ] Cgroups and resource limits (better cache management)
  • [ ] Flush Cache before a run
  • [ ] Validate inputs for `--closed` runs (e.g., don't allow runs against datasets that are too small)
  • [ ] Reportgen should run validation against outputs
  • [ ] Add better system.json creation to automate the system description for consistency
    • [ ] Add json schema checker for system documents that submitters create
  • [ ] Automate execution of multiple runs
  • [ ] Add support for code changes in CLOSED submissions for supported categories (data loader, S3 connector, etc.)
    • [ ] Add patches directory that gets applied before execution
  • [ ] Add runtime estimation and --what-if or --dry-run flag
  • [ ] Automate selection of minimum required dataset
  • [ ] Determine if batch sizes in MLPerf Training are representative of batch sizes for realistically sized datasets
  • [ ] Split system.json into automatically capturable (clients) and manual (storage)
  • [ ] Define system.json schema and add schema checker to the tool for reportgen
  • [ ] Add a report-dir CSV of results from tests as they are run
  • [ ] Collect versions of all prerequisite packages for the storage benchmark and DLIO
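
As a sketch of what the system.json schema checker for reportgen could look like: the actual schema is still to be defined (see the Rules Document tasks above), so the field names and types below are purely hypothetical placeholders, and a real implementation would likely use a proper JSON Schema library rather than this minimal stdlib-only check.

```python
import json

# Hypothetical minimal schema: required fields and their expected types.
# The real system.json schema is still to be defined by the working group.
SYSTEM_SCHEMA = {
    "system_name": str,
    "num_client_nodes": int,
    "accelerator_model": str,
    "storage_description": str,
}

def validate_system_json(raw: str) -> list[str]:
    """Return a list of human-readable schema violations (empty if valid)."""
    errors = []
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    for field, expected_type in SYSTEM_SCHEMA.items():
        if field not in doc:
            errors.append(f"missing required field: {field}")
        elif not isinstance(doc[field], expected_type):
            errors.append(
                f"field {field!r} should be {expected_type.__name__}, "
                f"got {type(doc[field]).__name__}"
            )
    return errors
```

Reportgen could then refuse to package a submission whenever the returned list is non-empty, printing each violation so the submitter can fix their system description before submitting.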

DLIO Improvements

  • [ ] Reduce verbosity of logging
  • [ ] Add callback handler for custom monitoring
    • [ ] SPECStorage uses a "PRIME_MON_SCRIPT" environment variable whose script is executed at different points in a run
    • [ ] Checkpoint_bench uses RPC to trigger execution, which can be wrapped externally
  • [ ] Add support for DIRECTIO
  • [ ] Add a seed for dataset creation so that the distribution of file sizes is the same for all submitters (file 1 = mean + x bytes, file 2 = mean + y bytes, etc.)
  • [ ] Determine if global barrier for each batch matches industry behavior
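
The seeded-dataset item above could be sketched roughly as follows. This assumes a Gaussian spread around the mean file size, which may not match DLIO's actual size distribution; the point is only that a fixed seed makes file i's size identical for every submitter.

```python
import random

def seeded_file_sizes(num_files: int, mean_bytes: int, stdev_bytes: int,
                      seed: int = 42) -> list[int]:
    """Deterministically generate per-file sizes around a mean.

    With a fixed seed, every submitter gets the same size for file i
    (file 1 = mean + x bytes, file 2 = mean + y bytes, ...).
    The Gaussian spread here is an assumption for illustration.
    """
    rng = random.Random(seed)  # private generator; unaffected by global state
    return [
        max(1, round(rng.gauss(mean_bytes, stdev_bytes)))  # clamp to >= 1 byte
        for _ in range(num_files)
    ]
```

Using a dedicated `random.Random(seed)` instance (rather than the module-level generator) keeps the sequence reproducible even if other parts of the benchmark also draw random numbers.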

Results Presentation

  • [ ] Better linking and presentation of system diagrams (add working links to system diagrams in the supplementals)
  • [ ] Define presentation and rules for hyperconverged or systems with local cache

wvaske • Nov 04 '24 17:11