Universal contract for recording/storing/retrieving NFT bigdata on the Flow blockchain
Instructions
The idea of storing NFT metadata (as part of bigdata, i.e. more than 2 MB) onchain arose from looking at the evolution of the NFT standard through the prism of Web3.0, where one of the main ideas is "the interaction of the internet with the physical world". For example: a piece of art painted by your favorite painter hangs on the wall of your house, and you know that:
- this art is yours, because its frame is mounted on your wall;
- the canvas with the creation itself is also yours, because the canvas is inserted into the frame of the picture.
In the virtual world (which is closely integrated with the physical one) the understanding is the same: the frame of the art is the NFT, and the canvas of the art is the bigdata. It is therefore logical to store all of the information in one place, onchain. You would not keep the frame of a picture on your wall at home and the canvas, say, in a safe deposit box at a bank; if you did, the bank could be robbed (IPFS or HTTPS resources that store bigdata depend on third-party suppliers and can become unreachable or be shut down, or anyone could copy the canvas of your original art without your permission). In short, we need the ability to store both the NFT and its bigdata in one unified space (onchain); this is a correct approach and should be supported.
Issue To Be Solved
To have a standard for storing NFT bigdata onchain on Flow.
Possible Solution
The proposal is to create a universal contract for recording, storing, and retrieving NFT bigdata on the Flow blockchain.
Functions:
- Create NFT bigdata for one NFT
  a. Create technical account (public key). Note: the first function; it creates one technical account that stores the bigdata for exactly one NFT.
  b. Check balance for technical account (NFT bigdata + NFT bigdata preview sizes). Note: the function returns the FLOW deposit required; calling it before trying to store NFT bigdata in the account storage lets you be sure the store will succeed.
  c. Store (NFT id, NFT chunk). Note: the function should be executed in a loop until all bigdata chunks have been sent onchain.
  d. Unique NFT bigdata (NFT id, NFT bigdata preview). Note: the final function of the store flow; it stores the NFT bigdata preview in the technical account and returns the unique ID of the NFT bigdata.
- Burn NFT bigdata
  a. Burn (unique NFT bigdata ID). Note: the function should be executed in a loop until all bigdata chunks have been received from the chain, so they can be aggregated on the client side. After that, all NFT bigdata chunks and the preview are deleted from the technical account.
- Transfer NFT bigdata from the current owner to another (unique NFT bigdata ID, current public key, new public key). Note: the new owner's key is attached to the technical account and the current owner's key is revoked; the key acknowledges ownership of the NFT bigdata.
- Receive the NFT bigdata preview (unique NFT bigdata ID). Note: anyone can retrieve the bigdata preview for an NFT.
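To make the shape of the API more concrete, here is a rough Cadence sketch of what such a contract interface could look like. It only illustrates the functions listed above; the contract name, function names, parameter types, and return types are all assumptions, not a finalized design.

```cadence
// Hypothetical sketch only; names and types are assumptions, not a finalized design.
pub contract interface NFTBigData {

    // a. Create a technical account that holds the bigdata for exactly one NFT.
    //    Returns the address of the newly created technical account.
    pub fun createTechnicalAccount(ownerPublicKey: [UInt8]): Address

    // b. Check balance: estimate the FLOW deposit required to store the bigdata
    //    plus its preview, before any chunk is uploaded.
    pub fun requiredDeposit(bigDataSize: UInt64, previewSize: UInt64): UFix64

    // c. Store one chunk; the client calls this in a loop until all chunks are onchain.
    pub fun storeChunk(nftID: UInt64, chunk: String)

    // d. "Unique NFT bigdata": store the preview and return the unique bigdata ID.
    pub fun finalize(nftID: UInt64, preview: String): UInt64

    // Burn: chunks are read back in a loop and aggregated client-side,
    // then all chunks and the preview are deleted from the technical account.
    pub fun burn(bigDataID: UInt64)

    // Transfer: attach the new owner's key to the technical account and revoke
    // the current owner's key; the key acknowledges ownership of the bigdata.
    pub fun transfer(bigDataID: UInt64, currentKey: [UInt8], newKey: [UInt8])

    // Receive the preview: anyone may read it.
    pub fun getPreview(bigDataID: UInt64): String?
}
```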
Properties:
- The original NFT bigdata, accessible only with the key to the technical account
- The NFT bigdata preview (not bigger than 2 MB; accessible to everyone)
- A link to the technical account used for storage
- On the technical account all NFT bigdata is stored as 2 MB chunks (the Flow transaction limit)
- NFT bigdata is stored either as binary data (see the new binary feature proposal) or as a String (Base64 encoded, to support the current approach to storing data onchain)
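For illustration, a single chunk-upload step of this flow might look like the transaction below. This is a hedged sketch only: it assumes a concrete contract implementing the interface above, deployed under the hypothetical name NFTBigDataStore at a placeholder address, and it assumes each chunk arrives as a Base64-encoded String until a binary/Blob type is available.

```cadence
// Hypothetical sketch: NFTBigDataStore and its address are placeholders.
import NFTBigDataStore from 0x01

// The client splits the raw bytes into <= 2 MB pieces off-chain (Base64 encoded)
// and submits one of these transactions per piece, in order.
transaction(nftID: UInt64, chunkBase64: String) {

    prepare(signer: AuthAccount) {
        // The signer is the technical account created for this NFT;
        // holding its key is what proves the right to write the bigdata.
    }

    execute {
        NFTBigDataStore.storeChunk(nftID: nftID, chunk: chunkBase64)
    }
}
```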
Cons:
- Additional load on the blockchain in the form of occupied space
- Additional load on the NFT owner's wallet (forced to deposit funds for NFT bigdata storage)
Pros:
- NFT owners can be absolutely sure they own not only the NFT but also its bigdata, all in one space on Flow (no dependencies on other sources and no risk of losing the NFT bigdata)
- Received positively by institutional collectors
- Complies with and enhances the emerging Web3.0 standard
- Universal interface for storing data on the Flow blockchain for existing and new NFT projects
- Increases the capitalization of Flow (see point 2 of the Cons)
Note: this approach was preliminarily discussed with @bjartek and @MaxStalker.
Nice post to start the discussion, @mrakus7. My take on this is that it would be a huge opportunity for the Flow blockchain if we could have a StorageProvider API.
Personally I am not sure doing this in Cadence will work, since it will hit the limits of scripts/transactions pretty quickly. IMHO it needs to be an API that is part of Flow, sits on the outside, and is integrated into the different client implementations.
For Versus we managed to get this to work for content under 5 MB, since we chunk the upload and then read the chunks back in a single script to get the content. However, we soon figured out that reading a lot of this did not scale that well, so we had to introduce a thumbnail service to make it scale.
What I would really want is an efficient way to upload a blob and get a reference to it from the client libraries. This might even just be a slim wrapper around IPFS or something. To me this is more of an issue of "I want it but do not know how to implement it".
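For illustration, the read side of that chunked pattern could look roughly like the script below. This is a generic sketch of the approach, not the actual Versus code; the NFTBigDataStore contract, its placeholder address, and the getChunks helper are assumptions.

```cadence
// Hypothetical sketch of the "read all chunks in one script" pattern.
import NFTBigDataStore from 0x01

pub fun main(bigDataID: UInt64): String {
    // Assumed helper returning the Base64-encoded chunks in upload order.
    let chunks: [String] = NFTBigDataStore.getChunks(bigDataID: bigDataID)

    // Concatenate the chunks and hand the full Base64 payload back to the client.
    // This is the part that stops scaling as content and read volume grow,
    // which is why a thumbnail/preview service ends up being necessary.
    var content = ""
    for chunk in chunks {
        content = content.concat(chunk)
    }
    return content
}
```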
I think we should get some feedback here from @AlexHentschel and @ramtinms since they've thought more about the Flow storage and execution model than most.
That said, I don't think Flow has a particularly good storage model for large assets.
- From speaking with Alex, it appears the storage layer for Flow uses a register-based model with a small size cap. Each register needs proofs associated with it, and so to store large data the data sets need to be split across multiple registers. This leads to a significant increase in proof generation and storage size and cost.
- Read-based loads are of particular interest to asset storage, and the current Flow storage model isn't well-optimized for this case. Basically, when reading asset data it's more efficient if you can stream all chunks from the filesystem (or memory) without having to jump around a lot; this improves read throughput dramatically for large objects. I would expect asset storage systems to be designed with this in mind, whereas Flow's registers are not laid out efficiently for large scans of sequential data. Add to this the need to verify proofs for relatively small chunks of data and I think we'd see a big performance bottleneck.
- Flow isn't really optimized for the "data availability" case. The incentive structures are currently built around transaction inclusion, execution, and verification. I think if data storage at scale became a goal, it would require incentive-alignment work.
There are of course engineering solutions for working around the issues above; for example, big data could be stored externally to the normal state trie and referenced via pointers to that data. This would be similar to how BLOBs are managed in other DBMSs. But at this point it would be a large engineering lift to provide this sort of functionality natively in Flow.
Given the existence of purpose-built systems for blob storage and the work around data-availability layers that continues in the space, I would not concentrate on this use case.
I think storing large amounts of data in the execution state requires careful consideration. From my perspective, there are 2 different technical angles to this question:
- Is it conceptually possible? TLDR: yes it is possible.
- While conceptually possible, the current system is not designed with a strong focus on larger data sets.
- We could probably store a few larger blobs, but if there were a larger number of those blobs, we would probably get performance problems or even outages (for example, the verification pipeline is designed for many small data blobs, but not for large ones).
- How hard is it? TLDR: it will very likely require a substantial engineering investment
- At an architectural level, the Flow protocol can work with very large data blobs. But I think our implementation cannot. To the best of my knowledge, registers have an upper limit on the data they can hold. Larger blobs will be broken up over several registers. For each register, you need a proof to pass verification. So if one large blob is broken up over 1000 registers, this would already entail 1000 proofs. (@ramtinms is this correct?)
- We could work around that by introducing an additional storage mode. For example, if Cadence were to differentiate between
  - (a) an array, where each element is individually addressable and changeable, vs
  - (b) a large data blob that is only ever going to be moved or modified in its entirety
To the best of my understanding, Cadence only knows mode (a) (it always breaks up data), but I am not entirely sure about that. So we would need to add mode (b), so Cadence can handle data according to its storage semantics (see the sketch after this list for a purely hypothetical illustration of the two modes).
- Second is the metering. Obviously a single proof is going to consume far fewer resources than 1000 proofs. So when we write 2 MB to the state, the metering would also need to take into account the number of registers these 2 MB are going to be stored in: is it one register (i.e. 1 proof) or 1000 registers (i.e. 1000 proofs)?
- @turbolent when we talk about "large data" to be stored in the execution state, I think it is super important to distinguish between arrays, where elements are individually addressable (and updatable), vs blob data, where only the blob in its entirety can be updated. From my perspective, "mutability" vs "immutability" is secondary for the storage system. The key differentiating factor is the addressability of individual data pieces, which we cannot allow for large data due to exploding proof sizes. Blob data needs to be stored in a single register, otherwise storing it will not be tractable.
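To make the distinction between the two modes concrete, here is a purely hypothetical Cadence snippet: the Blob type below does not exist today and is only meant to illustrate what a mode (b) value could look like next to a mode (a) array.

```cadence
// Purely hypothetical illustration; Blob is not an existing Cadence type.
pub fun illustrateStorageModes() {
    // Mode (a): an array. Each element is individually addressable and updatable,
    // so the storage layer may spread it over many registers, each needing a proof.
    let chunked: [UInt8] = [1, 2, 3]

    // Mode (b): an opaque blob that is only ever read, moved, or replaced as a whole.
    // It could live in a single register (one proof), because Cadence would never
    // need to address its interior.
    let blob: Blob = Blob(bytes: [1, 2, 3])
}
```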
Hi @AlexHentschel
Thank you for getting involved in this issue.
- Is it conceptually possible? TLDR: yes it is possible.
Yes, from our team's perspective it is possible to do.
- How hard is it? TLDR: it will very likely require a substantial engineering investment
There are two ways to approach the enhancement:
- Short way: we can do it based on the current architecture, so that Flow can store bigdata onchain without impacting performance.
- Strategic way: yes, you are right that for this purpose we need a separate onchain storage for bigdata, supported by the Flow team in the same way as the rest of Flow onchain.
Regarding the "substantial engineering investment": for the short way it is not a big investment to get a solution that brings Flow closer to Web3.0 (currently no other onchain solution has this).
Could we think about it again?
Is this related to or a duplicate of https://github.com/onflow/cadence/issues/1402?
@turbolent
Yes, this is related; it is recommended as a new feature as part of getting closer to Web3.0, but it is not required:
https://github.com/onflow/cadence/issues/1402 - this task would allow a Blob type instead of the current String (Base64) approach to storing binary data, for all users including this planned contract.
Question: @mrakus7 -- Do you have a use case for storing BLOB data that is accessible by contracts? It seems to me that NFT bigdata as blobs would often not need to be accessed by contracts, since computation of and display of the actual information can all happen off chain.
One of the use cases was designed by @bjartek and is already live.
From a technical perspective it is not needed, but viewed through Web3.0 it has meaning (please see the example earlier in the topic with the real art on the wall).
Do we really need to raise a FLIP to continue the discussion, or can other next steps be defined?