David Gasquez
The _ideal_ approach I can think of would be to rely on Dagster partitions and sensors.

1. Read the data from IPFS (or the GitHub Actions cache!).
2. Run Dagster sensors...
Thinking about [relying on external assets](https://docs.dagster.io/concepts/assets/external-assets). Make the previous run the external assets and compute the diff using sensors?
We could also attach to the previous database and use it as the current state. Run sensors and then the remaining partitions.
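A minimal sketch of the diff step described above, in plain Python (no Dagster API; the state shape and names are hypothetical): compare the previous run's state, restored from IPFS or the GitHub Actions cache, against the current source data, and return only the partitions that need recomputation.

```python
def partitions_to_refresh(previous_state, current_state):
    """Return partition keys that are new or whose content changed.

    Both arguments map partition key -> content hash. This shape is an
    assumption for illustration; the real state would come from the
    previous run's database/export.
    """
    stale = []
    for key, digest in current_state.items():
        if previous_state.get(key) != digest:
            stale.append(key)
    return sorted(stale)


# Example: "2023-09" is new and "2023-08" changed upstream.
previous = {"2023-07": "aaa", "2023-08": "bbb"}
current = {"2023-07": "aaa", "2023-08": "ccc", "2023-09": "ddd"}
print(partitions_to_refresh(previous, current))  # ['2023-08', '2023-09']
```

A sensor would then emit one run request per stale key instead of rebuilding everything.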
Love it! Let's start with this diagram and put it under the `README.md`. Can you share the Excalidraw URL? You can click `share` and it'll create a Read Only link...
@DistributedDoge mentioned publishing a notebook that loops over `tables` (or whatever the export dir is), fetches the schema from the `.parquet` files, and surfaces that.
> The code has landed, catalog will be built next time you update GitHub Pages by doing `make publish`.

:facepalm: I thought the website was published with each push! :shrug:...
Working! :tada: https://davidgasquez.github.io/gitcoin-grants-data-portal/catalog.html
Sharing it here so I remember in the future. Would be awesome to aim for something like this: https://py-code.org/datasets Nice UX and UI!
Cool find! I think we can do something similar with Dagster assets `metadata`. Similar to what [Subsets](https://github.com/subsetsio/subsets-connectors/blob/main/integrations/assets/fmp/balance_sheet_statements.py#L6-L67) does. Not sure how to deal with dbt models though! Perhaps we can...
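A rough sketch of what a catalog entry built from per-asset metadata could look like, in plain Python (function name and metadata shape are hypothetical, not the Dagster or Subsets API): render one markdown section per table from a list of columns, similar to what the catalog page already shows.

```python
def catalog_entry(table_name, columns, source_url=None):
    """Render a markdown snippet for one table.

    `columns` is a list of (name, type) pairs -- an assumed shape for the
    kind of schema info a Dagster asset's `metadata` dict could carry.
    """
    lines = [f"## {table_name}", "", "| column | type |", "| --- | --- |"]
    for name, dtype in columns:
        lines.append(f"| {name} | {dtype} |")
    if source_url:
        lines.append(f"\nSource: {source_url}")
    return "\n".join(lines)


print(catalog_entry("allo_donations", [("round_id", "VARCHAR"), ("amount", "DOUBLE")]))
```

For dbt models the same renderer could be fed from the manifest instead of asset metadata.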
```sql
COPY (
    SELECT *
    FROM 'data/allo_donations.parquet'
    ORDER BY round_id, donor_address, project_id, token_address, recipient_address
) TO 'data/allo_comp_rs_sorted.parquet' (COMPRESSION 'zstd', ROW_GROUP_SIZE 10000000);
```