celestia-node
Shwap hardening and optimizations tracking issue
Implementation ideas
This issue collects small improvements that remain after merging Shwap features. Most have low priority; others are planned for subsequent PRs and are listed here to keep them tracked.
ODS file:
- [ ] rename square to quadrant. Planned for Q1Q4 file PR
- [ ] readODS benchmarking. Need to find the optimal size for the buffer used to read the full ODS. Compare a buffered read vs. reading into a single array allocated for the full ODS. @Wondertan
- [ ] computeAxisHalf: Add more comments on the actual algorithm (selection of opposite axes, usage of reconstructSome, etc.). Support it with a visual diagram of the computed axis.
- [ ] fill ods field with data from eds. Planned for improving memory alloc in ODS file PR
- [ ] codec should be removed. Planned after support for reconstructSome is added to rsmt2d.
- [x] header de/serialization funcs should be fuzzed.
- [ ] cache top level subtree roots to improve proof calculation speed
- [ ] Need a test for the trimmed ODS file to ensure that the stored data was actually trimmed and does not contain padding.
Q1Q4:
- [ ] cache proofs computed when recomputing the EDS received over shrex and use them to populate the cache file. This will make proofs available in the recent cache.
- [ ] Add an option to the AxisHalf method of Accessor to read Q4 from the file store. Rework the Put() method on eds.Store to use eds.Accessor instead of rsmt2d.EDS.
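Reading Q4 in AxisHalf follows from the geometry of the extended data square: every row or column whose index is below the ODS width crosses Q1 (where it has its first half), and every other one crosses Q4 (where it has its second half). A minimal sketch of that selection, with `Quadrant` and `axisHalfSource` as hypothetical names rather than existing eds package API:

```go
package main

import "fmt"

// Quadrant identifies one quarter of the extended data square (EDS).
type Quadrant int

const (
	Q1 Quadrant = iota + 1 // top-left: the original data square (ODS)
	Q2                     // top-right
	Q3                     // bottom-left
	Q4                     // bottom-right
)

// axisHalfSource returns which quadrant stored in a Q1Q4 file holds a
// readable half of the given row or column, and whether that half is the
// first (left/top) half of the axis. axisIdx ranges over [0, 2*odsWidth).
// Rows and columns behave the same way because the square is symmetric.
func axisHalfSource(axisIdx, odsWidth int) (q Quadrant, firstHalf bool) {
	if axisIdx < odsWidth {
		// Axes in the first half cross Q1, which holds their first half.
		return Q1, true
	}
	// Axes in the second half cross Q4, which holds their second half.
	return Q4, false
}

func main() {
	const odsWidth = 4
	for _, idx := range []int{0, 3, 4, 7} {
		q, first := axisHalfSource(idx, odsWidth)
		fmt.Printf("axis %d -> Q%d (first half: %v)\n", idx, q, first)
	}
}
```

The other half of the axis would then be recovered with Reed-Solomon decoding, which is why a Q1Q4 file can serve any axis half.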
CacheFile:
- [ ] Consider adding sync read / calculate for Accessor in the cache file. Could be done by adding cached private fields to AxisHalf: `shares` for the full axis and `proofs` for calculated proofs. AxisHalf would need a concurrency-safety field like a mutex or sync.Once.
Store:
- [ ] Load/Put method for cache that does not return the Accessor
- [ ] Storage should use eds.Accessor as input for Put().
- [ ] Add support for reducing Q1Q4 files to ODS files when they move outside of the sampling window.
- [ ] Disallow serving samples outside of the sampling window. Can be done by implementing a pruned AccessorGetter inside the pruner pkg: a wrapper over AccessorGetter that wraps every Accessor with one that does not allow getting samples from archival heights.
- [ ] Add store statistics: number of stored files, sizes, types. Persist statistics. Add a List() method to list all files.
- [ ] Add write lock or tx log, to detect interrupted writes
- [ ] Add corruption detection and file recovery. Try to recover from the EDS; if unsuccessful, use shrex.
- [ ] maintain an in-memory missing-files index / bloom filter for fast returns.
- [ ] Add tracing
- [ ] Remove striped locking. The OS synchronizes FS access itself, so we don't need another layer of synchronization, but we need to double-check for edge cases.
- [ ] Compression file
- [ ] Add handling for partially written data in store. Write process might be interrupted at any time and we need to detect such cases and recover corrupted files.
- [ ] Add a Verify method to Store. Verify must ensure file integrity. It should be used in Availability instead of the Has pre-check before requesting the EDS from the network.
- [ ] Wait for all writes to finish during Close. Prevent operations after the store is closed.
Bitswap:
- [ ] Notify Bitswap about newly stored Shwap containers.
- [ ] Ensure Blockstore GetSize calls followed by Get reuse the same allocated Blocks
- [ ] Protection from requests exceeding server limits
- Relevant https://github.com/ipfs/boxo/issues/527 and https://github.com/ipfs/boxo/pull/629#discussion_r1653362485
- [ ] Ensure Bitswap punishes peers for incorrect data
- [ ] #3632 (for LNs only)
- [ ] Trace down the reason behind the `no unmarshallers found` error even when syncing seems to be working correctly.
- Confirmed to be coming from duplicates. Investigate why Bitswap requests more than one peer.
- [ ] Debug why FN sync sometimes times out with Bitswap
- [ ] Don't request padded rows
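For the item on reusing Blocks between GetSize and Get, one option is to hold the block loaded by GetSize and hand it to the next Get for the same key. This is a minimal sketch: boxo's `blockstore.Blockstore` takes a context and a `cid.Cid`, while here keys are plain strings and `sizeCachingStore` and `load` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// sizeCachingStore remembers blocks loaded by GetSize so that the Get that
// usually follows reuses them instead of loading and allocating again.
type sizeCachingStore struct {
	load func(key string) ([]byte, error) // expensive load, e.g. reading a Shwap container

	mu      sync.Mutex
	pending map[string][]byte
}

func newSizeCachingStore(load func(string) ([]byte, error)) *sizeCachingStore {
	return &sizeCachingStore{load: load, pending: make(map[string][]byte)}
}

func (s *sizeCachingStore) GetSize(key string) (int, error) {
	blk, err := s.load(key)
	if err != nil {
		return 0, err
	}
	s.mu.Lock()
	s.pending[key] = blk
	s.mu.Unlock()
	return len(blk), nil
}

func (s *sizeCachingStore) Get(key string) ([]byte, error) {
	s.mu.Lock()
	blk, ok := s.pending[key]
	delete(s.pending, key) // hand the block over once
	s.mu.Unlock()
	if ok {
		return blk, nil
	}
	return s.load(key)
}

func main() {
	loads := 0
	store := newSizeCachingStore(func(key string) ([]byte, error) {
		loads++
		return []byte("block:" + key), nil
	})
	size, _ := store.GetSize("row-1")
	blk, _ := store.Get("row-1")
	fmt.Println(size, string(blk), "loads:", loads)
}
```

A production version would need to bound `pending` (for example with an LRU or TTL), since a GetSize that is never followed by a Get would otherwise leak memory.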
Shrex:
- [ ] Refactor: make shrex work over general shwap ID
- [ ] Don't stream padding shares
- [ ] update docs+adr
- [ ] refactor shrex metrics. Fix the rate-limiting metric, which is not working
Misc:
- [ ] Migrate off Getter in favor of Accessor
- [ ] Accessor.Sample to be plural, allowing control over partial results, likely through channels
- [ ] CacheAccessor to deduplicate cached shares across perpendicular axes.
- Not a priority because Cols are only read during reconstruction
- [ ] Use NMT/rsmt2d native caching once implemented
- Currently we rely on Blockstore and CIDs for that
- Allows us to drop IPLD pkg entirely
- [ ] Accessor test suite should test for empty block retrieval
- [ ] Row should cache decoded shares
- Helpful during Bitswap verification
Availability:
- [ ] Don't sample padding rows
NodeBuilder:
- [x] move bitswap components construction from p2p pkg to share pkg
- [x] make blockstore cache size configurable
Pruning:
- [ ] Convert Q1Q4 to Q1 for archival FN
- [ ] Delete Q1 for pruned FN
- [ ] Disallow samples from outside pruning window from Blockstore(server side) and bitswap getter (client)
- [ ] Rework LN pruning for shwap
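The full-node pruning items above amount to a small decision per height. This is a minimal sketch covering only FNs (LN pruning is a separate item); `NodeKind`, `Action`, `pruneAction` and the `windowStart` boundary are hypothetical, and the pruned-FN case assumes Q4 is removed together with Q1.

```go
package main

import "fmt"

// NodeKind and Action are hypothetical types for illustration.
type NodeKind int

const (
	ArchivalFN NodeKind = iota
	PrunedFN
)

type Action string

const (
	Keep       Action = "keep Q1Q4"
	ReduceToQ1 Action = "reduce Q1Q4 to Q1 (ODS)"
	DeleteAll  Action = "delete Q1 and Q4"
)

// pruneAction decides what to do with a block's files given the lowest
// height that is still inside the sampling window.
func pruneAction(kind NodeKind, height, windowStart uint64) Action {
	if height >= windowStart {
		return Keep // still sampleable: keep Q4 so any axis half can be served
	}
	if kind == ArchivalFN {
		return ReduceToQ1 // archival FNs keep the ODS but drop Q4
	}
	return DeleteAll
}

func main() {
	const windowStart = 100
	for _, kind := range []NodeKind{ArchivalFN, PrunedFN} {
		for _, h := range []uint64{50, 150} {
			fmt.Printf("kind=%d height=%d -> %s\n", kind, h, pruneAction(kind, h, windowStart))
		}
	}
}
```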
DASer:
- [ ] (unrelated to shwap) refactor daser_test.go. Add height param to waitForCatchup method
Spec:
- [ ] Define NamespaceData container and ID
- [ ] Proof-only mode for RowNamespaceData, NamespaceData, maybe Sample?
- [ ] Define protocol IDs for bitswap composition
- [ ] Define Shwap ID Names as constant version strings
- [ ] Add Shrex as composition with protocol IDs derived from shwap ID names
- [ ] Vlad as co-author
- [ ] Update ref impl
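Defining Shwap ID names as versioned constants and deriving transport protocol IDs from them could look like the sketch below. The name strings and the `/shwap/<transport>/<name>` layout are illustrative assumptions, not values from the spec; `protocolID` is a hypothetical helper.

```go
package main

import "fmt"

// Hypothetical versioned Shwap ID names; the exact strings are illustrative.
const (
	RowName              = "row_v0"
	SampleName           = "sample_v0"
	RowNamespaceDataName = "rnd_v0"
	NamespaceDataName    = "nd_v0"
)

// protocolID derives a transport-specific protocol ID from a Shwap ID name,
// so Bitswap and Shrex compositions stay in lockstep with the spec names.
func protocolID(transport, name string) string {
	return "/shwap/" + transport + "/" + name
}

func main() {
	for _, name := range []string{RowName, SampleName, RowNamespaceDataName, NamespaceDataName} {
		fmt.Println(protocolID("shrex", name))
	}
}
```

Bumping a name's version suffix then changes every derived protocol ID at once.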