pdr-backend [Lake][Subgraph] Improve query structure (and maybe performance)

Motivation

There are various queries inside subgraph/ that may perform less than optimal due to their structure. Nested where clauses may improve the performance or not.

Queries such as the one inside subgraph_payouts.py could perhaps be improved.

Examples

How to restructure predictPayouts such that we can implement an _in array[] where clause.

query{
  predictPayouts{
    payout
    prediction{
      slot{
        predictContract(where: {id_in:["0x18f54cc21b7a2fdd011bea06bba7801b280e3151"]}){
          token {
            id
          }
        }
      }
    }
  }
}

DoD:

[ ] Improved query structure and performance for the goals of updating lake/analytics

Jan 25 '24 16:01 idiom-bytes

Notes

Overall, we have low conviction around problems/improvements as a result of this update. This is therefore low priority.

Jan 25 '24 16:01 idiom-bytes

We can improve the fetch step by returning more of the expected records from predictPredictions. By doing this, we can reduce the need to fetch the predictPayouts table. We would still need to handle the predctTruevals.

If we can improve the number of fetches and compute further, consider increasing priority. https://github.com/oceanprotocol/ocean-subgraph/issues/760

query{
  predictPredictions(where:{ or: [{timestamp_gt:100},{payout_:{timestamp_gt:100}}]}){
    slot{
      id
    }
    user {
      id
    }
    stake
    payout{
      
      timestamp
      id
    }
  }
}

Jan 25 '24 20:01 idiom-bytes

I spent some time and let me explain why the first example won't work.

Based on the provided GraphQL query, it seems that you want to filter predictPayouts based on a specific predictContract value. Tha query is not convenient for this goal


query {
  predictPayouts {
    payout
    prediction {
      slot {
        predictContract(where: {id_in: ["0x18f54cc21b7a2fdd011bea06bba7801b280e3151"]}) {
          token {
            id
          }
        }
      }
    }
  }
}

In this query:

The predictPayouts are being retrieved without any filtering at their level.
The filtering is applied at the predictContract level using where: {id_in: [...]}.
This means all predictPayouts are retrieved, but only the predictContract with the specified ID is filtered within each predictPayout.

The limitation here is that the filtering for predictContract does not directly filter the predictPayouts.

Instead, it only filters the predictContract within each predictPayout result.
the goal is to retrieve only those predictPayouts that have a specific predictContract, so, the query needs to be different, we have to put the parameter to the main "where" query.

I feel like we did a short talk about it but I couldn't be sure and I explained with text

Jan 31 '24 10:01 kdetry

There are still low performance queries and other things that aren't optimal but i'm closing this issue and moving everything to #1299 for further review

Jun 25 '24 16:06 idiom-bytes

pdr-backend pdr-backend copied to clipboard

[Lake][Subgraph] Improve query structure (and maybe performance)

Motivation

Examples

DoD:

Notes

pdr-backend
pdr-backend copied to clipboard