graph-node
First working implementation of Solana
Next steps around this PR and Solana integration in general:
- Rebased this PR on top of master
- Update protobuf definition using latest one (https://github.com/streamingfast/proto-solana/blob/develop/sf/solana/codec/v1/codec.proto)
- Remove `account_changes` up to the `graph-ts` bindings and handler, because we will get rid of it for version 1
- Correctly update the generated `AscIndexId` (use the newly defined pattern around segmented ids per chain, see https://github.com/graphprotocol/graph-node/pull/3446)
- Update `block_pointer_from_number` (like https://github.com/graphprotocol/graph-node/blob/d0d0cc1da0ca95398536cacf4bd47bf84031a2f7/chain/near/src/chain.rs#L200)
- Start the block ingestor like other chains (https://github.com/graphprotocol/graph-node/blob/60cd12ce7dd8ef796f8c8e93002cc7624fd6346c/node/src/main.rs#L348)
- Implement header-only transform and support in `graph-node` (Firehose work will be required to add a specialized transform that returns a `Block` with the `transactions`, `transaction_count`, `has_split_account_changes` and `account_changes_file_ref` fields removed).
- Add `TriggerFilter::InstructionFilter`, where users can match specific instructions based on criteria (see the sketch after this list):
  - Filter based on the account key at a specific position in the instruction's account inputs (`accountInput { atIndex: 0, onKey: 3AWPyBF4CcoFY3zsxAbc3PRTA8JK227FSsfkYVYVFuxJ }`)
  - Filter based on the first byte of the instruction's data, which enables matching by instruction name (add Anchor support here to accept a name from the IDL definition and auto-map it to the instruction's ID?)
- Review filters to be sure they are correct for initial Solana support
- Improve the `triggers_in_block` implementation: instructions should be filtered upfront into triggers, and as many `clone()` calls as possible should be removed; the structures are big, and the more cloning we can remove, the better `graph-node` will love us
- Implement Firehose filtering support in `graph-node` (StreamingFast will be doing the adjustment to Firehose to support that, based on the filters reviewed and approved)
- Instruction tree serialization de-risking (see Risk: Instructions Tree Serialization)
- Create initial Solana subgraph which would collect SPL token holders (I think it's achievable)
- Deploy to staging
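To make the `TriggerFilter::InstructionFilter` item above more concrete, here is a minimal sketch of what such an instruction filter could look like. The names (`InstructionFilter`, `AccountInputMatch`, `matches`) and the field layout are illustrative assumptions, not the actual graph-node types.

```rust
// Hypothetical sketch of an instruction filter; names and fields are
// illustrative and do not reflect actual graph-node types.
pub struct InstructionFilter {
    /// Match when the account key at `at_index` in the instruction's
    /// account inputs equals `on_key` (the base58 key decoded to 32 bytes).
    pub account_inputs: Vec<AccountInputMatch>,
    /// Match when the first byte of the instruction's data equals this
    /// value, i.e. match by instruction "name"/ID.
    pub first_data_byte: Option<u8>,
}

pub struct AccountInputMatch {
    pub at_index: usize,
    pub on_key: [u8; 32],
}

impl InstructionFilter {
    /// Returns true when the given instruction satisfies every criterion.
    /// `account_inputs` are the instruction's resolved account keys and
    /// `data` is its raw instruction data.
    pub fn matches(&self, account_inputs: &[[u8; 32]], data: &[u8]) -> bool {
        let accounts_ok = self
            .account_inputs
            .iter()
            .all(|m| account_inputs.get(m.at_index) == Some(&m.on_key));

        let data_ok = match self.first_data_byte {
            Some(byte) => data.first() == Some(&byte),
            None => true,
        };

        accounts_ok && data_ok
    }
}
```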
Risk: Instructions Tree Serialization
We think having contextual information about the other instructions in the tree rooted at the matched instruction is a requirement. Indeed, we expect that the transaction matched by the manifest will actually perform other important child instructions that dapp developers will require access to. For example, performing an NFT mint on Solana generates a bunch of child instructions that are important to have access to, like the SPL token created.
We think that serializing the whole tree to the WASM handler will give dapp developers a good DX. However, it's a risk because the instruction tree could be quite big in terms of AssemblyScript-serialized bytes, and the serialization could take a long time.
We should explore this question as soon as possible to understand the consequences of always fully serializing the instruction tree. Other possibilities:
- Serialize the tree only on demand (an intrinsic would perform the work only when the subgraph calls it)
- Serialize a specific child instruction of the tree only on demand (an intrinsic would perform the work only when the subgraph calls it); this raises questions about how we "index" them and how users would "find" them
- De-contextualize and send specific instructions to different handlers, so we would keep the one instruction == one handler paradigm; this raises questions about what this would look like in the manifest
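For reference while weighing these options, here is a rough sketch of the kind of tree shape under discussion. The types are assumptions for illustration only, not the actual Firehose/graph-node Solana codec; the on-demand options above would amount to deferring the conversion of `children` until the handler asks for it.

```rust
// Assumed shape of the instruction tree discussed above; not the actual
// Solana codec types used by graph-node or Firehose.
pub struct InstructionNode {
    pub program_id: [u8; 32],
    pub account_inputs: Vec<[u8; 32]>,
    pub data: Vec<u8>,
    /// Inner (child) instructions produced while executing this one,
    /// e.g. the SPL token instructions emitted during an NFT mint.
    pub children: Vec<InstructionNode>,
}

impl InstructionNode {
    /// Total number of instructions in the tree rooted at `self`; a first
    /// proxy for how much data an eager, full serialization would move
    /// across the WASM boundary.
    pub fn instruction_count(&self) -> usize {
        1 + self
            .children
            .iter()
            .map(|child| child.instruction_count())
            .sum::<usize>()
    }
}
```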
Thanks @maoueh (& @jubeless) for this update.
> add Anchor support here to accept a name from the IDL definition
How complex is this proposal, and what would the Developer Experience look like (would this require changes to upload the IDL as part of the manifest?)
> Create initial Solana subgraph
Is this initial subgraph definition quite well understood & is any further research required here?
> Risk: Instructions Tree Serialization
Is the primary concern here feasibility (i.e. will it be too big in terms of AssemblyScript serialized bytes) or performance (regardless, the serialisation will be too slow)?
Seems like there are still a few outstanding pieces on both the Firehose & Graph Node side here?
> How complex is this proposal, and what would the Developer Experience look like (would this require changes to upload the IDL as part of the manifest?)
I should have expanded on this further: this is indeed a risk factor if we decide to have it right now.
So, good question. We checked the JSON output format, and it seems rather "easy" to work with. Now the question is how easy "reading" the data is. For the actual account inputs, I think it will actually be quite easy to do something.
After that, it's a matter of having low-level decoding of each Anchor base type. We are not sure which encoding Anchor uses (its own, Borsh, or the default Rust C encoding that was used before); we would need to check that. I don't think it's utterly complex to have a working version, but we need the manpower to dedicate at least a week to that specific task.
I envision generating the struct types from the IDL, as well as the decoding code, directly; it would all be in AssemblyScript.
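To illustrate the kind of generated decoding code being described (sketched here in Rust for brevity, even though the generated code would be AssemblyScript), here is a minimal example assuming the Borsh encoding plus the 8-byte sha256-derived discriminator that Anchor uses by convention, which still needs to be confirmed as noted above. `TransferArgs` and `decode_transfer` are hypothetical names that a code generator would derive from the IDL.

```rust
// Requires the `borsh` crate (with its derive feature) and `sha2`.
use borsh::BorshDeserialize;
use sha2::{Digest, Sha256};

/// Hypothetical argument struct that code generation would derive from an
/// IDL definition; in practice this would be emitted as AssemblyScript.
#[derive(BorshDeserialize, Debug)]
pub struct TransferArgs {
    pub amount: u64,
}

/// Anchor, by convention, prefixes instruction data with the first 8 bytes
/// of sha256("global:<method_name>") as a discriminator (to be confirmed).
fn discriminator(method_name: &str) -> [u8; 8] {
    let hash = Sha256::digest(format!("global:{method_name}").as_bytes());
    hash[..8].try_into().unwrap()
}

/// Decode the instruction data if it is the hypothetical `transfer`
/// instruction; returns None when the discriminator does not match or the
/// Borsh payload is malformed.
pub fn decode_transfer(data: &[u8]) -> Option<TransferArgs> {
    if data.len() < 8 || data[..8] != discriminator("transfer") {
        return None;
    }
    TransferArgs::try_from_slice(&data[8..]).ok()
}
```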
> Is this initial subgraph definition quite well understood & is any further research required here?
Would be good to sync on that to be sure we are on the same page. Julien told me you discussed with him, possibly about SPL Token Holders; if that's the case, we have already thought about how to have the right "matching" to get the correct one. I will try to sit with Julien, and together we will write the manifest and "pseudo-code" for this subgraph.
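As a discussion aid only, here is a rough sketch (in Rust, for consistency with the other sketches) of the kind of matching an SPL token holders subgraph would need, assuming the filter targets the SPL Token program and its `Transfer`/`TransferChecked` instruction variants (the instruction indexes below should be double-checked against the spl-token layout). The actual handler would be written in AssemblyScript against the agreed manifest.

```rust
// Discussion-only sketch; not the actual subgraph handler.

/// SPL Token program id (base58).
const SPL_TOKEN_PROGRAM: &str = "TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA";

/// First data byte of the SPL Token instruction enum: 3 = Transfer,
/// 12 = TransferChecked (per the spl-token layout; to be double-checked).
const TRANSFER: u8 = 3;
const TRANSFER_CHECKED: u8 = 12;

/// Returns true when an instruction looks like an SPL token transfer that
/// a "token holders" subgraph would want to handle, i.e. the kind of
/// matching the manifest filter would have to express.
pub fn is_spl_transfer(program_id_base58: &str, data: &[u8]) -> bool {
    program_id_base58 == SPL_TOKEN_PROGRAM
        && matches!(data.first(), Some(&TRANSFER) | Some(&TRANSFER_CHECKED))
}
```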
> Is the primary concern here feasibility (i.e. will it be too big in terms of AssemblyScript serialized bytes) or performance (regardless, the serialisation will be too slow)?
Both, in fact. I think it's really something that needs to be checked first: mainly how long it takes, and also how many bytes it consumes, which will give an indication of how many subgraphs we would be able to run on N GB of RAM (worst case/average case, something around that).
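A minimal sketch of how a first-order measurement could be taken, assuming a prost-generated codec type; note this measures protobuf-encoded size and time, which is only a proxy for the AssemblyScript-side serialization cost that actually matters here.

```rust
use std::time::{Duration, Instant};

use prost::Message; // prost-generated codec types implement this trait

/// Rough first-order measurement: how many bytes a block (or instruction
/// tree) occupies once protobuf-encoded, and how long encoding takes.
/// The generic `M` stands in for the prost-generated codec type.
pub fn measure_encoded_size<M: Message>(block: &M) -> (usize, Duration) {
    let start = Instant::now();
    let bytes = block.encode_to_vec();
    (bytes.len(), start.elapsed())
}
```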
> Seems like there are still a few outstanding pieces on both the Firehose & Graph Node side here?
Yep, there is work on both sides. Right now we are spinning up the whole Firehose infrastructure and ensuring it works with how big the blocks are (even with account data removed).
For filtering, I don't think it will take much time; we have everything needed to easily create filter(s) and indexes. We just need to decide which filters to support, so the review of the filters available in the manifest is a prerequisite to having them in Firehose.
Hi, while looking for a way to use subgraphs with Solana, I bumped into this PR. How is the PR going? I see there has been no update since February, and the last comment is from April. Is this PR still being worked on, or is it stale?
I also saw in the chain folder that a substreams source has been merged. Will it work as a source for Solana? I thought substreams already supports Solana.
So this branch is stalled, and I'm not 100% sure how it will behave. This was before substreams was a thing.
With substreams now, and how it's currently integrated within graph-node, any chain that has substreams will be able to integrate directly. Let's continue in Discord if you have further questions.
@maoueh thank you. I just bumped into this and wondered about its status. If substreams is the alternative, I think this PR could be closed.
Since the focus is currently on substreams support for Solana, I'm closing this for now.