pdr-backend

[Lake][Fetching vs Joining] Fetch available data from subgraph rather than joining w/ sql.

Open idiom-bytes opened this issue 1 year ago • 3 comments

Background / motivation

  1. It's easier, and cheaper, to fetch data directly from the subgraph than to reconstruct it with a join. These fields shouldn't have to be obtained through a join with the predictions table: {etl_bronze_pdr_predictions_table_name}.pair as pair, {etl_bronze_pdr_predictions_table_name}.timeframe as timeframe, {etl_bronze_pdr_predictions_table_name}.source as source.

  2. A further issue is that, in SQL, 1 slot event is being joined against up to N prediction events. Again, costly. This is being done as a left join, but it would be good to check the results of bronze_pdr_slots.py as a way to verify this.
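To make the fan-out in item 2 concrete, here is a minimal sketch using an in-memory SQLite database. The table names, columns, and sample rows are hypothetical and much simpler than the real bronze tables; this is not the query in bronze_pdr_slots.py, only an illustration of the 1 slot to N predictions shape.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a toy database: 1 slot and 3 predictions for that slot (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE slots (slot_id TEXT PRIMARY KEY, contract TEXT, slot INTEGER);
        CREATE TABLE predictions (
            prediction_id TEXT PRIMARY KEY,
            contract TEXT,
            slot INTEGER,
            pair TEXT,
            timeframe TEXT,
            source TEXT
        );
        INSERT INTO slots VALUES ('0xabc-100', '0xabc', 100);
        INSERT INTO predictions VALUES
            ('0xabc-100-u1', '0xabc', 100, 'BTC/USDT', '5m', 'binance'),
            ('0xabc-100-u2', '0xabc', 100, 'BTC/USDT', '5m', 'binance'),
            ('0xabc-100-u3', '0xabc', 100, 'BTC/USDT', '5m', 'binance');
        """
    )
    return conn


def slots_via_join(conn: sqlite3.Connection) -> list:
    """Recover pair/timeframe/source for each slot by joining against predictions."""
    return conn.execute(
        """
        SELECT s.slot_id, p.pair, p.timeframe, p.source
        FROM slots AS s
        LEFT JOIN predictions AS p
          ON p.contract = s.contract AND p.slot = s.slot
        """
    ).fetchall()


if __name__ == "__main__":
    rows = slots_via_join(build_demo_db())
    # One slot comes back once per matching prediction, all carrying the same config.
    print(len(rows), "rows,", len(set(rows)), "distinct")
```

In this toy case the join returns one row per prediction, and every row carries the same pair/timeframe/source, so the result still has to be de-duplicated. If the slot record fetched from the subgraph already carried those three fields, neither the join nor the de-duplication would be needed.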

TODOs / DoD

  1. Review this query and its current results/accuracy.
  2. Simplify this query by just getting this data from subgraph.
  3. Review other subgraph queries & etl joins where this could be simplified and fix them

Tasks

  • [ ] update slots and other tables to get pair/timeframe/source info from subgraph
  • [ ] deprecate implementing this in SQL joins
  • [ ] verify that queries are generating correct/expected data
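One possible way to approach the verification task is to compare, per slot, the pair/timeframe/source values produced by the existing join against the values fetched from the subgraph. The sketch below assumes both sides have already been loaded into plain dicts keyed by slot ID; the function name and data shapes are hypothetical and not part of pdr-backend.

```python
from typing import Dict, List, Tuple

# (pair, timeframe, source) for one slot; this shape is an assumption for illustration.
Config = Tuple[str, str, str]


def find_mismatches(joined: Dict[str, Config], fetched: Dict[str, Config]) -> List[str]:
    """Return sorted slot IDs whose config differs, or that exist on only one side."""
    all_ids = set(joined) | set(fetched)
    # dict.get returns None for a missing ID, which never equals a config tuple.
    return sorted(sid for sid in all_ids if joined.get(sid) != fetched.get(sid))


if __name__ == "__main__":
    joined = {
        "0xabc-100": ("BTC/USDT", "5m", "binance"),
        "0xabc-101": ("BTC/USDT", "5m", "binance"),
    }
    fetched = {
        "0xabc-100": ("BTC/USDT", "5m", "binance"),
        "0xabc-101": ("BTC/USDT", "1h", "binance"),  # deliberately different timeframe
        "0xabc-102": ("ETH/USDT", "5m", "binance"),  # missing from the joined side
    }
    print(find_mismatches(joined, fetched))
```

An empty result would suggest the subgraph values can replace the joined ones for the slots checked; any IDs returned are the ones to inspect before deprecating the join.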

idiom-bytes avatar May 02 '24 16:05 idiom-bytes

Actually, I don't quite understand the main motivation. It can process millions of rows in just a second. Why should we update the raw tables now, and what is the cost for us?

kdetry avatar May 03 '24 11:05 kdetry

Because it feels like the previous work wasn't quite complete. We can get the data from subgraph and clean this up.

I tagged it as low priority.

idiom-bytes avatar May 03 '24 15:05 idiom-bytes

I am now reviewing this in issue #1000 and it's coming up again.

The bronze_slots query is taking a very long time to complete, and although it may be due to other reasons, I can't help but stare at this super expensive join that we can basically get for free.

This exists in a few different places, and it gets run every time we process an event. Notice how silly a join it is, too: it's basically configuration, not some special, unique data.

Screenshot from 2024-05-14 13-34-33

idiom-bytes avatar May 14 '24 20:05 idiom-bytes

This is now being tracked in #1299 and this will be closed. Please reopen as we address backlog.

idiom-bytes avatar Jun 25 '24 22:06 idiom-bytes