specs
specs copied to clipboard
Research/quantify performance envelopes of multiple CDC algorighms
- [ ]
oI
95%
Assemble corpuses of data from various prior performance research initiatives ( both within and outside of PL )- [x]
💯
Enumerate/obtain test datasets - [ ]
90%
Document rationales for the test datasets - [ ]
95%
Publish all of the above as plain HTTP + IPFS pinned download
- [x]
- [ ]
oI
85%
Document prior art, motivation and precise scope and types of sought metrics- [x]
💯
Solicit/assemble feedback from various stakeholders - [x]
💯
Collect/determine relevance of existing academic research into chunking ( 14 distinct papers selected for evaluation ) - [x]
💯
Convert the pre-PL chunk-tester to proper multi-streaming, to dramatically lower the cost of experiments ~( aiming at about 500 megabyte/s stream processing )~ with the correct implementation and hardware about 3.5GiB/s standard ingestion 🎉 - [ ]
80%
Generate few preliminary datapoints to aid understanding the goal/scope - [ ]
90%
In depth study/evaluation/application of findings from above works - [x]
💯
Understand and reuse existing go-ipfs implementations of CDCs ( Rabin + Buzzhash ) in a simpler go-ipfs independent utility, allowing rapid retries of different parameters - [x]
💯
Same as above but pertaining to linking strategies ( trickle-dag etc ), as ignoring the link-layer of streams skews the results disproportionately - [ ]
98%
( subsumes a large portion of points belowv0.1
ETA: DEMO AT TEAM-WEEK ) Fully implement a standalone CLI utility re-implementing/converging withgo-ipfs
on all above algorithms. The distinguishing feature of said tool is the exposure of each chunker/linker as an atomic, composable primitive. The UX is similar to that offfmpeg
whereby an input stream is processed via multiple "filters", with the result being a stream of blocks with a statistic on their counts/sizes plus a valid IPFS CID. Current remaining tasks:- [x]
💯
Profile/optimize baseline stream ingestion, ensure there is no penalty from applying a "null-filter", which allows one to benchmark a particular hardware setup's theoritcal maximum throughput - [x]
💯
Finalize the "stackable chunkers" UI/UX, allowing effortless demonstration of impact of such chunker chains on the - [x]
💯
Adjust statistics compilation/output for the above ( it currently looks like this, ignoring various "filter-levels" ) - [x]
💯
Make final pass on memory allocation profile and fixup obvious low hanging fruit beforev0.1
- [ ]
80%
README / godoc / stuffz
- [x]
- [ ]
80%
Rewrite previously utilized plotly.js-based visualiser to aid with the above point
- [x]
- [ ]
oI
Open document to a short discussion soliciting feedback from workgroups - [ ]
oII
Perform a number of "brute force" tests aiming at reproducible results ( utilizing https://github.com/ipfs/testground ) ~for the purposes of what we are trying to quantifyiptb
will be sufficient~ - [ ]
oII
( half-covered by initial writeup ) Convert raw results into multi-dimensional scatter-plot visualizations ( plotly.js ) - [ ]
oIII
Combine all available results into a "compromise chunking settings" RFC document - [ ]
oIV
Publish the results for discussion and decision of the level of incorporation into IPFS implementations ( default parameters, use of selected algorithm by default, etc )