David Gasquez
The _ideal_ approach I can think of would be to rely on Dagster partitions and sensors.

1. Read the data from IPFS (or the GitHub Actions cache!).
2. Run Dagster sensors...
Thinking about [relying on external assets](https://docs.dagster.io/concepts/assets/external-assets). Make the previous run the external assets and compute the diff using sensors?
We could also attach to the previous database and use it as the current state. Run sensors and then the remaining partitions.
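A minimal sketch of the diff step described above, in plain Python (no Dagster API; the state shape and names are hypothetical): compare the previous run's state, restored from IPFS or the GitHub Actions cache, against the current source data, and return only the partitions that need recomputation.

```python
def partitions_to_refresh(previous_state, current_state):
    """Return partition keys that are new or whose content changed.

    Both arguments map partition key -> content hash. This shape is an
    assumption for illustration; the real state would come from the
    previous run's database/export.
    """
    stale = []
    for key, digest in current_state.items():
        if previous_state.get(key) != digest:
            stale.append(key)
    return sorted(stale)


# Example: "2023-09" is new and "2023-08" changed upstream.
previous = {"2023-07": "aaa", "2023-08": "bbb"}
current = {"2023-07": "aaa", "2023-08": "ccc", "2023-09": "ddd"}
print(partitions_to_refresh(previous, current))  # ['2023-08', '2023-09']
```

A sensor would then emit one run request per stale key instead of rebuilding everything.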
Love it! Let's start with this diagram and put it under the `README.md`. Can you share the Excalidraw URL? You can click `share` and it'll create a Read Only link...
@DistributedDoge mentioned publishing a notebook that loops over `tables` (or whatever the export dir is), fetches the schema from the `.parquet` files, and surfaces that.
> The code has landed, catalog will be built next time you update GitHub Pages by doing `make publish`.

:facepalm: I thought the website was published with each push! :shrug:...
Working! :tada: https://davidgasquez.github.io/gitcoin-grants-data-portal/catalog.html
Sharing it here so I remember in the future. Would be awesome to aim for something like this: https://py-code.org/datasets Nice UX and UI!
Cool find! I think we can do something similar with Dagster assets `metadata`. Similar to what [Subsets](https://github.com/subsetsio/subsets-connectors/blob/main/integrations/assets/fmp/balance_sheet_statements.py#L6-L67) does. Not sure how to deal with dbt models though! Perhaps we can...
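A rough sketch of what a catalog entry built from per-asset metadata could look like, in plain Python (function name and metadata shape are hypothetical, not the Dagster or Subsets API): render one markdown section per table from a list of columns, similar to what the catalog page already shows.

```python
def catalog_entry(table_name, columns, source_url=None):
    """Render a markdown snippet for one table.

    `columns` is a list of (name, type) pairs -- an assumed shape for the
    kind of schema info a Dagster asset's `metadata` dict could carry.
    """
    lines = [f"## {table_name}", "", "| column | type |", "| --- | --- |"]
    for name, dtype in columns:
        lines.append(f"| {name} | {dtype} |")
    if source_url:
        lines.append(f"\nSource: {source_url}")
    return "\n".join(lines)


print(catalog_entry("allo_donations", [("round_id", "VARCHAR"), ("amount", "DOUBLE")]))
```

For dbt models the same renderer could be fed from the manifest instead of asset metadata.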
```sql
COPY (
    SELECT *
    FROM 'data/allo_donations.parquet'
    ORDER BY round_id, donor_address, project_id, token_address, recipient_address
) TO 'data/allo_comp_rs_sorted.parquet' (COMPRESSION 'zstd', ROW_GROUP_SIZE 10000000);
```