gitcoin-grants-data-portal
Pull data directly from chain
Currently, we rely on the Allo Indexer API data. We should add an option to pull data straight from chains using something like cryo or Subsquid. This way, we don't need to trust the Allo API data, if that's what we want.
Can Gitcoin Data Portal rely on Indexed data?
Probably not, because Indexed is missing many chains in which GC rounds are running.
We need something like cryo.
This works!
import cryo

cryo.collect(
    "transactions",
    blocks=["18.9M"],
    rpc="https://eth.merkle.io",
    reorg_buffer=1000,
    max_concurrent_chunks=15,
    inner_request_size=10000,
    output_dir="data",
    contract=["0x03506eD3f57892C85DB20C36846e9c808aFe9ef4"],
    hex=True,
)
Don't forget to pip install cryo-python polars though!
Made a small Colab notebook for people to play around.
From a quick test, it'll take around 52 hours to fully index that contract, 0x03506eD3f57892C85DB20C36846e9c808aFe9ef4, on Ethereum mainnet.
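For anyone who wants to redo this kind of estimate, here is a minimal sketch of a linear extrapolation from a timed sample. The block range is an assumption (deployment at block 16071515 up to roughly 18.9M, the numbers used in the snippets in this thread), `estimate_hours` is a hypothetical helper, and the sample timing is a placeholder rather than a measurement.

```python
def estimate_hours(total_blocks: int, sample_blocks: int, sample_seconds: float) -> float:
    """Linearly extrapolate a timed sample to the full block range."""
    blocks_per_second = sample_blocks / sample_seconds
    return total_blocks / blocks_per_second / 3600


# Assumed range: contract deployment block up to roughly 18.9M.
START_BLOCK = 16_071_515
END_BLOCK = 18_900_000
total = END_BLOCK - START_BLOCK

# Placeholder timing, NOT a measurement: 1_000 blocks fetched in 60 seconds.
hours = estimate_hours(total, sample_blocks=1_000, sample_seconds=60.0)
print(f"{total:,} blocks -> about {hours:.1f} hours")
```

This is crude: transaction density varies a lot across the range, so a sample taken from a busy period will overestimate and a quiet one will underestimate.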
- Got a low-effort 4x speedup while fetching events by raising max_concurrent_chunks to 100.
- Inside Colab, fetching all (undecoded) logs from Project Registry took 14 seconds (from deployment 400 days ago until now).
- While TXs need some thinking, if performance inside the CI runner is comparable, event-based assets seem feasible now.
import cryo

cryo.freeze(
    "events",
    blocks=["16071515:"],
    rpc="https://eth.merkle.io",
    reorg_buffer=1000,
    max_concurrent_chunks=100,
    inner_request_size=10_000,
    output_dir="data_fast",
    contract=["0x03506eD3f57892C85DB20C36846e9c808aFe9ef4"],
    hex=True,
)
Woah! I did try with higher max_concurrent_chunks but didn't get any speedup locally... interesting!
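Since the speedup seems to depend on where the code runs, a small timing harness could make the comparison reproducible across Colab, local machines, and CI. This is only a sketch: `time_settings` is a hypothetical helper, and the callable passed to it is assumed to wrap whatever fetch is being measured.

```python
import time
from typing import Callable, Dict, Iterable


def time_settings(run: Callable[[int], None], settings: Iterable[int]) -> Dict[int, float]:
    """Call run(value) once per setting and record the wall-clock seconds."""
    timings: Dict[int, float] = {}
    for value in settings:
        start = time.perf_counter()
        run(value)
        timings[value] = time.perf_counter() - start
    return timings


# Example wiring (commented out so the sketch runs without cryo or network access):
#
# import cryo
#
# def fetch_events(chunks: int) -> None:
#     cryo.freeze(
#         "events",
#         blocks=["16071515:"],
#         rpc="https://eth.merkle.io",
#         max_concurrent_chunks=chunks,
#         output_dir=f"data_chunks_{chunks}",
#         contract=["0x03506eD3f57892C85DB20C36846e9c808aFe9ef4"],
#         hex=True,
#     )
#
# print(time_settings(fetch_events, [15, 50, 100]))
```

A separate output directory per setting keeps one run from reusing files written by another, which would otherwise skew the timings.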
> while TXs need some thinking, if performance inside CI-runner is comparable, event-based assets seem feasible now
:rocket:
Just leaving a note that tx data from Covalent is quite neat for analyzing the cost side, as it already has dollarized amounts for the actual gas cost.
- I think the total gas cost of mainnet transactions dealing with grants stack project profiles was $23k for about 2.3k operations.
Unfortunately, the fetch is a bit on the longer side. Figuring out the incremental part could help save a lot of time and API credits (that we still have aplenty).
- Free API key request limit of 4/second => need to limit parallel runs for assets of that type
- 3 minutes to pull 2.3k events in pages of 100 isn't that impressive
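To put those two numbers side by side: 2.3k events in pages of 100 is 23 requests, which at 4 requests/second could in principle finish in about 6 seconds, so the 3 minutes works out to roughly 8 seconds per page and the rate limit is not the bottleneck for a single run. It does matter once several assets fetch in parallel. Below is a minimal sketch of a shared client-side limiter; `RateLimiter` and `pages_needed` are hypothetical helpers, not part of any Covalent client.

```python
import math
import time


def pages_needed(total_items: int, page_size: int) -> int:
    """Number of paginated requests required to fetch total_items."""
    return math.ceil(total_items / page_size)


class RateLimiter:
    """Space calls so that at most max_per_second requests are started per second."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


pages = pages_needed(2_300, 100)   # requests for one full pull
floor_seconds = pages / 4          # lower bound imposed by the 4 req/s limit
seconds_per_page = 180 / pages     # observed: 3 minutes for the whole pull
```

Calling `limiter.wait()` before each request, with one limiter shared by all assets of that type, would keep parallel runs under the free-tier limit. It is not thread-safe as written, so it would need a lock if the assets run in threads.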
> I think the total gas cost of mainnet transactions dealing with grants stack project profiles was $23k for about 2.3k operations.
Nice! Would be awesome to publish a report inside Quarto analyzing the new data and showing the process to derive these numbers.
> Unfortunately, the fetch is a bit on the longer side. Figuring out the incremental part could help save a lot of time and API credits (that we still have aplenty).
> - Free API key request limit of 4/second => need to limit parallel runs for assets of that type
> - 3 minutes to pull 2.3k events in pages of 100 isn't that impressive
Understandable. Really need to think harder about #28. Meanwhile, we can always do it slowly. GitHub Actions errors out after... 6 hours I think. :man_shrugging:
I'm keeping an eye on mesc and its integration with Cryo. I think there might be a simple approach to get data from multiple chains easily. Probably slower than Covalent, except if we do partitions + incremental!
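A rough sketch of what "partitions + incremental" could look like, independent of the fetcher: split the block range into fixed-size partitions and persist a checkpoint after each one, so a rerun (or a run that hit the CI time limit) picks up where the last one stopped. All names here are hypothetical, and `fetch` stands in for whatever actually pulls the data (cryo, Covalent, ...).

```python
import json
from pathlib import Path
from typing import Callable, Iterator, Tuple


def block_partitions(start: int, end: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) block ranges covering start..end."""
    for first in range(start, end + 1, size):
        yield first, min(first + size - 1, end)


def load_checkpoint(path: str, default: int) -> int:
    """Return the next block to fetch, or default if no checkpoint exists yet."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())["next_block"]
    return default


def save_checkpoint(path: str, next_block: int) -> None:
    Path(path).write_text(json.dumps({"next_block": next_block}))


def run_incremental(fetch: Callable[[int, int], None], path: str,
                    start: int, latest: int, size: int) -> None:
    """Fetch only partitions not yet covered by the checkpoint at path."""
    next_block = load_checkpoint(path, start)
    for first, last in block_partitions(next_block, latest, size):
        fetch(first, last)
        # Saved after each partition so an interrupted run loses at most one.
        save_checkpoint(path, last + 1)
```

With one checkpoint file per chain and contract, the same loop would cover multiple chains. Reorgs are ignored here; a real version would probably stop some blocks short of the chain head, similar to what `reorg_buffer` does in the cryo calls above.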