pdr-backend
                                
[Lake][Fetching vs Joining] Fetch available data from subgraph rather than joining w/ SQL.
Background / motivation
- It's easier to fetch this data directly from the subgraph than to perform a join, which is also more costly. These columns shouldn't have to be obtained through a join with the predictions table: `{etl_bronze_pdr_predictions_table_name}.pair as pair`, `{etl_bronze_pdr_predictions_table_name}.timeframe as timeframe`, `{etl_bronze_pdr_predictions_table_name}.source as source`.
- The other issue is that, in SQL, 1 slot event is being joined against <= N prediction events. Again, costly. This is being done as a left join, but it would be good to check the results of bronze_pdr_slots.py as a way to verify this.
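To make the fan-out concrete, here is a minimal sketch of the 1-slot-to-N-predictions join. It uses Python's built-in `sqlite3` as a stand-in for the lake's SQL engine, and the table names, column names, and sample values (`slots`, `predictions`, `0xabc`, `BTC/USDT`) are hypothetical, not the actual pdr-backend schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory DB with one slot and three predictions on it.

    All table and column names here are illustrative assumptions.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE slots (slot_id TEXT, contract TEXT, slot INTEGER);
        CREATE TABLE predictions (
            prediction_id TEXT, contract TEXT, slot INTEGER,
            pair TEXT, timeframe TEXT, source TEXT
        );
        """
    )
    con.execute("INSERT INTO slots VALUES ('0xabc-300', '0xabc', 300)")
    con.executemany(
        "INSERT INTO predictions VALUES (?, ?, ?, ?, ?, ?)",
        [
            (f"0xabc-300-user{i}", "0xabc", 300, "BTC/USDT", "5m", "binance")
            for i in range(3)
        ],
    )
    return con


# Left join of slots onto predictions, only to recover pair/timeframe/source.
JOIN_QUERY = """
    SELECT s.slot_id, p.pair, p.timeframe, p.source
    FROM slots s
    LEFT JOIN predictions p
      ON p.contract = s.contract AND p.slot = s.slot
"""

con = build_demo_db()
joined_rows = con.execute(JOIN_QUERY).fetchall()

# Every prediction on the slot yields one joined row, yet the metadata
# is identical across them, so the extra rows carry no new information.
distinct_rows = set(joined_rows)
print(len(joined_rows), len(distinct_rows))
```

In this toy setup the join produces one row per prediction, all carrying the same pair/timeframe/source, which is the redundancy this issue proposes to avoid by reading those fields from the subgraph directly.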
TODOs / DoD
- Review this query and its current results/accuracy.
- Simplify this query by just getting this data from subgraph.
- Review other subgraph queries & etl joins where this could be simplified and fix them
Tasks
- [ ] update slots and other tables to get pair/timeframe/source info from subgraph
- [ ] deprecate implementing this in SQL joins
- [ ] verify that queries are generating correct/expected data
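One possible way to approach the verification task is to diff the metadata produced by the existing join against the metadata fetched directly. The sketch below assumes both sides have already been loaded into plain dicts keyed by slot ID; the function name `find_mismatches`, the dict shape, and the sample values are hypothetical and not part of pdr-backend.

```python
def find_mismatches(joined, fetched, fields=("pair", "timeframe", "source")):
    """Compare join-derived metadata with directly fetched metadata.

    Both arguments map slot_id -> {"pair": ..., "timeframe": ..., "source": ...}.
    Returns a list of (slot_id, field, joined_value, fetched_value) tuples;
    field is "missing_in_join" or "missing_in_fetch" when a slot is absent.
    """
    mismatches = []
    for slot_id, fetched_row in fetched.items():
        joined_row = joined.get(slot_id)
        if joined_row is None:
            mismatches.append((slot_id, "missing_in_join", None, None))
            continue
        for field in fields:
            if joined_row.get(field) != fetched_row.get(field):
                mismatches.append(
                    (slot_id, field, joined_row.get(field), fetched_row.get(field))
                )
    # Slots that only the join produced are reported as well.
    for slot_id in joined:
        if slot_id not in fetched:
            mismatches.append((slot_id, "missing_in_fetch", None, None))
    return mismatches


# Illustrative data only: one matching slot, one with a differing timeframe.
joined_demo = {
    "0xabc-300": {"pair": "BTC/USDT", "timeframe": "5m", "source": "binance"},
    "0xabc-600": {"pair": "BTC/USDT", "timeframe": "1h", "source": "binance"},
}
fetched_demo = {
    "0xabc-300": {"pair": "BTC/USDT", "timeframe": "5m", "source": "binance"},
    "0xabc-600": {"pair": "BTC/USDT", "timeframe": "5m", "source": "binance"},
}
demo_mismatches = find_mismatches(joined_demo, fetched_demo)
```

An empty result on real data would suggest the join can be dropped without changing the output; any reported rows point at slots worth inspecting first.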
Actually, I don't understand the main motivation. It can process millions of rows in just a second. Why should we update the raw tables now? What is the cost for us?
Because it feels like the previous work wasn't quite complete. We can get the data from subgraph and clean this up.
I tagged it as low priority.
I am now reviewing this in issue #1000 and it's coming up again.
The bronze_slots query is taking a very long time to complete, and although it may be due to other reasons, I can't help but stare at this super expensive join for data that we can basically get for free.
This exists in a few different places, and it gets run every time we process an event. Notice how silly a join it is, too: it's basically configuration, not some special, unique data.
This is now being tracked in #1299 and this will be closed. Please reopen as we address backlog.