Storage / Retrieval Deals With Partial Content
Checklist
- [X] This is not a new feature or an enhancement to the Filecoin protocol. If it is, please open an FIP issue.
- [X] This is not brainstorming ideas. If you have an idea you'd like to discuss, please open a new discussion on the lotus forum and select the category as `Ideas`.
- [X] I have a specific, actionable, and well motivated feature request to propose.
Lotus component
- [ ] lotus daemon - chain sync
- [ ] lotus miner - mining and block production
- [ ] lotus miner/worker - sealing
- [ ] lotus miner - proving(WindowPoSt)
- [X] lotus miner/market - storage deal
- [X] lotus miner/market - retrieval deal
- [X] lotus miner/market - data transfer
- [ ] lotus client
- [ ] lotus JSON-RPC API
- [ ] lotus message management (mpool)
- [ ] Other
What is the motivation behind this feature request? Is your feature request related to a problem? Please describe.
Let's say I want to store an existing IPLD dataset that is larger than a sector on Filecoin. Currently, we face several obstacles:
- From a storage standpoint, the only way to store anything but a whole DAG is an offline deal.
- From a retrieval standpoint, we can retrieve a partial DAG by expressing a selector other than "give me the entire DAG". But there are various problems here for our large dataset:
  - We can't do this at the CLI level currently because we lack a command line syntax for selectors.
  - Even if we could, the syntax for selectors is limited at the moment: we lack a "give me the whole DAG except the part below this CID, because I know it's in another piece" selector.
  - Even if we had more powerful selectors, selectors require the retrieval client to know a priori what the right selector is to get the part of the DAG contained in a single sector.
Let's consider what we'd like to be possible:
- The person storing should be able to break up their very large DAG in arbitrary ways into a set of partial DAGs
- The person retrieving should be able to just start at the root, make a retrieval, see what they get back, and then plan to make retrievals from there.
We also already have alternate storage clients like Estuary whose proposed deals are failing because they are trying to send partial DAG data to miners.
Describe the solution you'd like
Fortunately, our underlying transport protocol for data transfer, Graphsync, can serve requests where the peer sending the data only has part of the DAG expressed by the requested CID+Selector. The Graphsync responder knows how to communicate to the requestor what it served and what it didn't, and the requestor knows how to process this information and still verify the response.
Currently, the go-data-transfer library fails all transfers where the entire request root + IPLD selector is not served.
I propose that we allow a data transfer to complete successfully even when it has only served a partial response.
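To make the partial-response idea concrete, here is a minimal sketch of a responder that holds only part of a DAG. It does not use Graphsync or go-data-transfer: `Node`, `Result`, `Traverse`, and `FinalStatus` are hypothetical names for illustration, string IDs stand in for CIDs, and the traversal always walks the whole DAG rather than evaluating a selector. Treating "nothing served at all" as a failure is also an assumption, not something the proposal specifies.

```go
package main

import "fmt"

// Node is a toy stand-in for an IPLD block: it only carries links to children.
// The string IDs used as map keys play the role of CIDs.
type Node struct {
	Links []string
}

// Result records what a responder holding part of a DAG was able to serve.
type Result struct {
	Served  []string // blocks sent, in traversal order
	Skipped []string // links that could not be followed because the block is not local
}

// Traverse walks the DAG depth-first from root, using only blocks in the local store.
// A missing block is recorded as skipped and the walk continues with its siblings.
func Traverse(store map[string]Node, root string) Result {
	var res Result
	seen := map[string]bool{}
	var walk func(id string)
	walk = func(id string) {
		if seen[id] {
			return
		}
		seen[id] = true
		node, ok := store[id]
		if !ok {
			res.Skipped = append(res.Skipped, id)
			return
		}
		res.Served = append(res.Served, id)
		for _, link := range node.Links {
			walk(link)
		}
	}
	walk(root)
	return res
}

// FinalStatus maps a traversal result to a final transfer status.
// "PartiallyCompleted" is the status proposed in this issue; the "Failed"
// case for an empty response is an assumption of this sketch.
func FinalStatus(r Result) string {
	switch {
	case len(r.Served) == 0:
		return "Failed"
	case len(r.Skipped) > 0:
		return "PartiallyCompleted"
	default:
		return "Completed"
	}
}

func main() {
	// This piece holds the root and the "a" subtree; "b" lives in another piece.
	store := map[string]Node{
		"root": {Links: []string{"a", "b"}},
		"a":    {Links: []string{"a1"}},
		"a1":   {},
	}
	res := Traverse(store, "root")
	fmt.Println("served:", res.Served)
	fmt.Println("skipped:", res.Skipped)
	fmt.Println("status:", FinalStatus(res))
}
```

The skipped list is what a requestor would use to plan follow-up retrievals from other pieces, starting at each skipped CID.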
My proposed bubbling up to Lotus is as follows:
- go-data-transfer should emit an event on both sides to notify the calling library of a CID that was not served and was skipped over
- go-data-transfer should have a new final status of `PartiallyCompleted` for when a transfer is done sending/receiving but the entire DAG was not served (plus possibly some additional events that put it in this state)
- go-fil-markets storage client will fire a `ClientEventDataTransferComplete` when go-data-transfer ends in `PartiallyCompleted` (the same event emitted when data transfer ends in `Completed`) and otherwise be unchanged
- go-fil-markets storage provider will fire a `ProviderEventDataTransferCompleted` when go-data-transfer ends in `PartiallyCompleted` (the same event emitted when data transfer ends in `Completed`) and otherwise be unchanged. The CommP calculation will be run on the received CAR file for the partial DAG and as long as it matches the Storage Proposal, the deal will continue as planned
- go-fil-markets retrieval client will fire `ClientEventPartiallyComplete` when data transfer ends with the `PartiallyCompleted` status. This will trigger analogous "Partial" states for `DealStatusCheckComplete` and `DealStatusFinalizingBlockstore`, which will transition to `DealStatusPartiallyComplete` as the retrieval client's final status
- go-fil-markets retrieval provider will fire ProviderEventPartiallyComplete when a datatransfer ends with the PartiallyCompleted status. This will move the deal to `DealStatusPartiallyCompleting` and then `DealStatusPartiallyCompleted` when CleanupDeal is finished.
- at the Lotus API level, ClientRetrieve is unchanged -- it just returns statuses from retrieval client
- at the CLI level, ClientRetrieve will output all retrieval statuses and a final message indicating that only a partial transfer was completed
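The list above can be summarized as a small lookup from market role to what happens when a data transfer ends in `PartiallyCompleted`. This is only a sketch: the event and status names are the ones proposed in this issue, while `Role`, `Outcome`, `PartialOutcome`, and `CommPMatches` are hypothetical helpers that do not exist in go-fil-markets. CommP values are treated as opaque strings here; a real provider would compute CommP from the received CAR file.

```go
package main

import "fmt"

// Role identifies which go-fil-markets state machine observes the transfer.
type Role int

const (
	StorageClient Role = iota
	StorageProvider
	RetrievalClient
	RetrievalProvider
)

var roleNames = []string{"storage client", "storage provider", "retrieval client", "retrieval provider"}

// Outcome describes the reaction to a PartiallyCompleted transfer.
// A nil States slice means the role's state machine is otherwise unchanged.
type Outcome struct {
	Event  string   // event fired by the role
	States []string // new deal states entered, in order
}

// PartialOutcome returns the proposed reaction for each role.
func PartialOutcome(role Role) Outcome {
	switch role {
	case StorageClient:
		// Same event as when the transfer ends in Completed.
		return Outcome{Event: "ClientEventDataTransferComplete"}
	case StorageProvider:
		// Same event as when the transfer ends in Completed; the CommP check still gates the deal.
		return Outcome{Event: "ProviderEventDataTransferCompleted"}
	case RetrievalClient:
		return Outcome{
			Event: "ClientEventPartiallyComplete",
			States: []string{
				"partial analogue of DealStatusCheckComplete",
				"partial analogue of DealStatusFinalizingBlockstore",
				"DealStatusPartiallyComplete",
			},
		}
	case RetrievalProvider:
		return Outcome{
			Event:  "ProviderEventPartiallyComplete",
			States: []string{"DealStatusPartiallyCompleting", "DealStatusPartiallyCompleted"},
		}
	}
	return Outcome{}
}

// CommPMatches models the storage provider's gate: the deal continues only if the
// CommP computed over the received partial-DAG CAR matches the Storage Proposal.
func CommPMatches(computed, proposed string) bool {
	return computed == proposed
}

func main() {
	for r := StorageClient; r <= RetrievalProvider; r++ {
		o := PartialOutcome(r)
		fmt.Printf("%s: fires %s, new states %v\n", roleNames[r], o.Event, o.States)
	}
}
```

Keeping the storage side on its existing events means only the retrieval state machines gain new states, which matches the "otherwise be unchanged" wording for the storage client and provider.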
Describe alternatives you've considered
See above. While selectors are potentially a path forward, they have several limitations, and the path to achieving a desirable result through them is long.
Additional context
- I am specifically suggesting leaving the Lotus import process unchanged for now -- we are not trying to solve importing partial DAGs into Lotus at the moment.
- Rather, the client that already has a need for this functionality is Estuary, so what's ultimately most important is for Lotus to support this on the miner side and on the retrieval client side.