Open Grant Proposal: Proof of Storage on Large Dynamic Datasets
Name of Project: Proof of Storage on Large Dynamic Datasets
Proposal Category: research
Proposer: qizhou
Do you agree to open source all work you do on behalf of this RFP and dual-license under MIT, APACHE2, or GPL licenses?: Yes
Project Description
A bedrock of the future Web3 is decentralized storage (dStorage), where users can store large amounts of data in the network without worrying that the data will be withheld or even discarded by a centralized organization. The essential part of dStorage is proof of storage: the network can prove that the data uploaded by users is stored by data providers in the network. Several solutions, such as Filecoin and Arweave, have been developed to solve the proof-of-storage problem, and they work well for static files. However, if users' data can be frequently modified or deleted (i.e., it is dynamic), we currently have no ideal solution.
In this proposal, we focus on proof of storage on large dynamic datasets, where
- A dataset is a list of binary large objects (BLOBs) that are uploaded and owned by multiple users.
- Dynamic means that users can perform create/read/update/delete (CRUD) operations on the dataset. Note that most existing dStorage solutions support only static files (or BLOBs) with limited operations such as create/read.
- Large means that a single dataset may fit comfortably on one data node (e.g., 4 TB of storage capacity), but the total size of all datasets can be very large, ranging from 100+ TB to a petabyte or more, far exceeding the capacity of a single node.
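The dataset model above can be sketched as follows. This is a minimal Python illustration, not the proposed design: it assumes (our assumption, not stated in this proposal) that the dataset is committed with a SHA-256 Merkle root over BLOB hashes, and that a deleted BLOB leaves an empty tombstone so other indices stay stable. `DynamicDataset`, `EMPTY`, `h`, and `verify` are hypothetical names.

```python
import hashlib
from typing import List

EMPTY = b""  # hypothetical tombstone stored in a deleted slot


def h(data: bytes) -> bytes:
    """SHA-256 digest used for leaves and inner nodes (an illustrative choice)."""
    return hashlib.sha256(data).digest()


class DynamicDataset:
    """A list of BLOB slots supporting CRUD, committed by a Merkle root."""

    def __init__(self) -> None:
        self.blobs: List[bytes] = []

    def create(self, blob: bytes) -> int:
        self.blobs.append(blob)
        return len(self.blobs) - 1

    def read(self, index: int) -> bytes:
        return self.blobs[index]

    def update(self, index: int, blob: bytes) -> None:
        self.blobs[index] = blob

    def delete(self, index: int) -> None:
        # Keep the slot so that other BLOB indices do not shift.
        self.blobs[index] = EMPTY

    def _levels(self) -> List[List[bytes]]:
        # Rebuilds the whole tree for clarity; an efficient version would
        # recompute only the O(log n) hashes on the modified leaf's path.
        level = [h(b) for b in self.blobs] or [h(EMPTY)]
        levels = [level]
        while len(level) > 1:
            parents = []
            for i in range(0, len(level), 2):
                left = level[i]
                # On odd-length levels, the last node is paired with itself.
                right = level[i + 1] if i + 1 < len(level) else left
                parents.append(h(left + right))
            level = parents
            levels.append(level)
        return levels

    def root(self) -> bytes:
        return self._levels()[-1][0]

    def prove(self, index: int) -> List[bytes]:
        """Sibling hashes from the leaf at `index` up to the root."""
        proof = []
        for level in self._levels()[:-1]:
            sibling = index ^ 1
            proof.append(level[sibling] if sibling < len(level) else level[index])
            index //= 2
        return proof


def verify(root: bytes, index: int, blob: bytes, proof: List[bytes]) -> bool:
    """Check that `blob` sits at `index` in the dataset committed by `root`."""
    node = h(blob)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root
```

In this sketch, any update or delete changes the root, and `prove`/`verify` let a verifier check a single BLOB against the root without downloading the rest of the dataset.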
The solution we are working on relaxes Filecoin's exact replication to approximate replication. Approximate replication stems from the idea of proof of random access: we estimate the number of replicas of a dataset (e.g., 4 TB) from its I/O rate (e.g., 4K random-read IOPS), so that, given the characteristics of different storage devices, we can estimate a range for the number of replicas. The benefit of approximate replication is that updating or deleting part of a dataset becomes much cheaper and more efficient.
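The replica estimate can be illustrated with a back-of-the-envelope model. The Python sketch below is our own simplification, not the final scheme. It hypothetically assumes that a verifier derives 4 KiB-aligned read offsets from a public random seed, that each sample costs one random read served by a single replica, and that each replica sits on one device. Under those assumptions, sustaining `required_iops` with devices rated between `device_iops_min` and `device_iops_max` implies the replica range returned by `estimate_replica_range`. All names and parameters here are hypothetical.

```python
import hashlib
import math
from typing import List, Tuple

CHUNK = 4096  # 4 KiB random-read unit, matching the "4K random read IOPS" example


def challenge_offsets(seed: bytes, dataset_size: int, samples: int) -> List[int]:
    """Derive pseudo-random, chunk-aligned byte offsets from a public seed.

    A prover must read the data at these offsets, so answering quickly
    requires the dataset to be stored where it can be randomly accessed.
    """
    n_chunks = dataset_size // CHUNK
    if n_chunks == 0:
        raise ValueError("dataset smaller than one chunk")
    offsets = []
    for i in range(samples):
        digest = hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
        offsets.append(int.from_bytes(digest, "big") % n_chunks * CHUNK)
    return offsets


def estimate_replica_range(required_iops: float,
                           device_iops_min: float,
                           device_iops_max: float) -> Tuple[int, int]:
    """Range of replicas implied by sustaining `required_iops` random reads.

    Each replica is assumed to live on one device delivering between
    `device_iops_min` and `device_iops_max` random 4 KiB reads per second.
    Fast devices need fewer replicas to keep up; slow devices need more.
    """
    if not 0 < device_iops_min <= device_iops_max:
        raise ValueError("require 0 < device_iops_min <= device_iops_max")
    fewest = math.ceil(required_iops / device_iops_max)
    most = math.ceil(required_iops / device_iops_min)
    return fewest, most
```

For example, with the purely illustrative figures of 1,000,000 sampled reads per second and devices rated between 100,000 and 500,000 IOPS, `estimate_replica_range(1_000_000, 100_000, 500_000)` returns `(2, 10)`.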
Value
In the future Web3, we expect great demand for decentralized applications that perform create/read/update/delete (CRUD) operations on a large number of datasets, such as dynamic NFTs, decentralized social networks, and file-sharing systems. However, current decentralized storage systems (e.g., Filecoin) are mainly designed for static files and lack native support for update/delete operations.
This research proposal aims to address proof of storage on large dynamic datasets. With this technology, we can enable a decentralized KV store in which each value can be large (from kilobytes to hundreds of kilobytes) and the total value size can reach petabytes, enabling new decentralized applications!
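One way such a KV store could sit on top of a dynamic dataset is sketched below in Python. The design is hypothetical and not taken from this proposal: each key maps to a BLOB slot capped at `max_value_size`, updates overwrite the slot in place, and deleted slots are recycled so the dataset does not grow with churn. `BlobKV` and `slot_capacity` are illustrative names.

```python
from typing import Dict, List, Optional


class BlobKV:
    """Hypothetical KV layer mapping keys to fixed-capacity BLOB slots."""

    def __init__(self, max_value_size: int) -> None:
        self.max_value_size = max_value_size
        self.slots: List[Optional[bytes]] = []  # the underlying dataset
        self.index: Dict[bytes, int] = {}       # key -> slot number
        self.free: List[int] = []               # slots released by deletes

    def put(self, key: bytes, value: bytes) -> int:
        if len(value) > self.max_value_size:
            raise ValueError("value exceeds slot size")
        slot = self.index.get(key)
        if slot is None:
            if self.free:
                slot = self.free.pop()      # reuse a deleted slot
            else:
                slot = len(self.slots)      # grow the dataset by one slot
                self.slots.append(None)
            self.index[key] = slot
        self.slots[slot] = value            # create or update in place
        return slot

    def get(self, key: bytes) -> Optional[bytes]:
        slot = self.index.get(key)
        return None if slot is None else self.slots[slot]

    def remove(self, key: bytes) -> None:
        slot = self.index.pop(key)
        self.slots[slot] = None
        self.free.append(slot)


def slot_capacity(total_bytes: int, slot_size: int) -> int:
    """Number of fixed-size slots that fit in `total_bytes`."""
    return total_bytes // slot_size
```

At petabyte scale, `slot_capacity(2**50, 128 * 1024)` gives 8,589,934,592 slots of 128 KiB in 1 PiB; the 128 KiB slot size is only an illustrative choice.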
Proof of storage on large dynamic datasets is a new area in decentralized storage. Many new attacks may emerge that need careful study and prevention. We invite Filecoin researchers to collaborate with us in this area by exploring new ideas, attacking the system, and, we hope, arriving at a joint final result.
Deliverables
- Papers and articles describing the proposed method
- Prototype code in Python to demonstrate the proposed method
- Golang code with near-production quality for proof generation and verification
Development Roadmap
Q4 2022
- Problem formulation
- Initial proposal
- Attack vector exploration
Q1 2023
- Prototype Python code for both proof generation and verification
- Smart contract code for verification
Q2 2023
- Golang code/libraries for proof and verification
- Concluding papers
Total Budget Requested
$60,000
Maintenance and Upgrade Plans
The research does not require maintenance. As an upgrade, we may explore more efficient verification using zero-knowledge proofs (ZKPs).
Team
Team Members
qizhou
Team Member LinkedIn Profiles
https://www.linkedin.com/in/qi-zhou-9a668715/
Team Website
https://ethstorage.io
Relevant Experience
Qi Zhou
- Former software engineer at Facebook, EMC/Dell, and Google.
- Core developer of the proprietary flash translation layer (including RAID) of an enterprise storage solution
- Experienced with centralized distributed file systems such as HDFS
Team code repositories
https://github.com/ethstorage
Additional Information
Contact: [email protected]