ref-fvm icon indicating copy to clipboard operation
ref-fvm copied to clipboard

ipld block syscalls naming

Open raulk opened this issue 3 years ago • 7 comments

Context

The syscalls bound to the invocation container are:

  • ipld::open
  • ipld::create
  • ipld::read
  • ipld::stat
  • ipld::cid

The syscalls as named in the kernel are the following, under the BlockOps trait:

  • block_open
  • block_create
  • block_read
  • block_stat
  • block_link

There are two problems here.

Problem 1: Naming/categorization

We should name these equally on both sides. Two proposals:

  1. Name the system ipld and namespace block operations under the block_ prefix, i.e.: ipld::block_open.
  2. Consider that IPLD is a foundational component of the stack, and as such it permeates everything. Having a dedicated ipld subsystem sounds odd from those lens. Instead, drop ipld:: and name this subsystem block::, i.e. block::open. Block operations use IPLD under the hood, like everything else.

Problem 2: ipld::cid vs. block_link

Both these names are confusing.

  • cid implies that only a CID is being computed, when in reality the block is also being added to the reachable set.
  • link in the context of blocks suggests that a block is being linked to another block, which is not the case.

Proposals and rationale:

  • block::promote or ipld::block_promote: promotes a block to the reachable set and returns its CID.
  • block::export or ipld::block_export: exports a block outside the actor context and returns its CID.
  • block::publish or ipld::block_publish: publishes a block outside the actor context and returns its CID.

raulk avatar May 02 '22 12:05 raulk

Problem 1.1:

We have the block prefix in the Kernel because, e.g., open, by itself, is ambiguous. This mirrors MessageOps (msg_) and NetworkOps (network_). Honestly, I'd go the other way and merge all of the kernel traits into a single trait.

Problem 1.2:

I wouldn't just call it block::, as:

  1. At the moment, this is the only IPLD API exposed to actors.
  2. Block is ambiguous (usually refers to "blocks" as in blockchain).

We could rename create to write_block and read to read_block as we will likely introduce non-block-based versions (e.g., an ipld::build and, maybe, an ipld::decode). But I'm not convinced it's worth it.

Problem 2

For some context:

  1. I called this cid first (because it computes the cid).
  2. Then I called it link, to mirror the unix "link" syscall and to make it clear that it's an action. But I never renamed the syscall, just the kernel function name.

link was supposed to mirror the "link" syscall. You're giving a "name" to a block. It's not quite the same thing, but it fits with the filesystem theme (open, read, stat, link) where create is the exception.

I'm not totally happy with link, but I much prefer it to promote, export, and publish. The user's mental model should be "I've created a block, now I'm 'linking' it into my state, so I call 'link' and put that CID into my state".

Stebalien avatar May 02 '22 20:05 Stebalien

I'd go the other way and merge all of the kernel traits into a single trait.

This is really just cosmetics. I much prefer the trait-per-family approach as it provides organization and makes code easier to navigate. It also enables us to mock/stub operation families. If anything, I'd vote to make them even more granular going forward (e.g. IpldBlockOps, IpldDagOps, etc.)

However, traits by themselves don't provide function namespacing, so we should include the family as a prefix in the function names. For the IPLD ops, I'd clean things up in this manner:

  • Syscall names: ipld::block_open, ipld::block_write...
  • Kernel side: for now I'd go with a single trait for all IPLD ops:
trait IpldOps {
    fn ipld_block_open();
    fn ipld_block_write();
    ...
}

Syscall nomenclature

It would be good to standardise on a nomenclature on both the invocation container and the kernel. On the container side, I'd lean towards <namespace>::[group_]<name>, where group is optional and refers to a subfamily of operations.

## ipld namespace, naturally lends itself to grouping
ipld::block_write
ipld::block_stat
ipld::dag_build
ipld::dag_destroy
ipld::selector_prepare
ipld::selector_apply
ipld::selector_delete

## crypto namespace
crypto::seal_verify
crypto::seal_verify_batch
crypto::seal_verify_aggregate
crypto::hash_compute
...

ipld::link

The user's mental model should be "I've created a block, now I'm 'linking' it into my state, so I call 'link' and put that CID into my state".

I seriously think that link is a confusing choice. And I don't think this mental model is correct. This op in itself does not link a block to my state. Referencing this block from another that eventually connects to the root is what links the block to my state. To put it simply, I could call link on a block, get a CID back and never use it. When the buffered blockstore commits the state to disk, it will discard my block. Therefore calling link never truly linked my block.

Apart from that, the term link is ambiguous when DAGs are involved. It can easily be interpreted as "this operation will create a link from one block to other." (which is actually more functionally equivalent to the Linux link syscall...)

In my head, this operation today just marks a block as "elegible" for writing (because now I have a CID I can use elsewhere). It kinda feels like Git's staging area. Maybe block_stage is another option?

raulk avatar May 03 '22 10:05 raulk

It also enables us to mock/stub operation families

Not unless we have some form of meta "composed kernel" type that lets us layer a mock over a kernel. But I guess we could have that. And I agree that the grouping is nice.

Syscall names: ipld::block_open, ipld::block_write...

Yeah, SGTM. That makes it clear that this is all block-level.

Stebalien avatar May 03 '22 19:05 Stebalien

I seriously think that link is a confusing choice. And I don't think this mental model is correct. This op in itself does not link a block to my state. Referencing this block from another that eventually connects to the root is what links the block to my state. To put it simply, I could call link on a block, get a CID back and never use it. When the buffered blockstore commits the state to disk, it will discard my block. Therefore calling link never truly linked my block.

It's not great, but, well, this is the operation that creates a "link" to the block. All the other proposals make it sound like it does something super special and complicate matters. Yes, this is where we treat the block as "reachable" as well, but I really don't want users to think about that. They should focus on the idea of creating IPLD blocks and linking them together, not "promoting", storing, exporting, staging, etc.

Stebalien avatar May 03 '22 20:05 Stebalien

My main gripe is still ambiguity. When I think of IPLD I naturally think of DAGs, and in that context link means something entirely different. So if it's confusing for me as an FVM core engineer, I'm suspect it'll be confusing for users too.

How about ipld::block_commit()?

raulk avatar May 04 '22 16:05 raulk

ipld::block_save()?

Kubuxu avatar May 06 '22 17:05 Kubuxu

ipld::block_save() works for me.

raulk avatar May 12 '22 12:05 raulk