pdr-backend icon indicating copy to clipboard operation
pdr-backend copied to clipboard

[Lake][Subgraph] Improve query structure (and maybe performance)

Open idiom-bytes opened this issue 1 year ago • 3 comments

Motivation

There are various queries inside subgraph/ that may perform less than optimal due to their structure. Nested where clauses may improve the performance or not.

Queries such as the one inside subgraph_payouts.py could perhaps be improved.

Examples

How to restructure predictPayouts such that we can implement an _in array[] where clause.

query{
  predictPayouts{
    payout
    prediction{
      slot{
        predictContract(where: {id_in:["0x18f54cc21b7a2fdd011bea06bba7801b280e3151"]}){
          token {
            id
          }
        }
      }
    }
  }
}

DoD:

  • [ ] Improved query structure and performance for the goals of updating lake/analytics

idiom-bytes avatar Jan 25 '24 16:01 idiom-bytes

Notes

Overall, we have low conviction around problems/improvements as a result of this update. This is therefore low priority.

idiom-bytes avatar Jan 25 '24 16:01 idiom-bytes

We can improve the fetch step by returning more of the expected records from predictPredictions. By doing this, we can reduce the need to fetch the predictPayouts table. We would still need to handle the predctTruevals.

If we can improve the number of fetches and compute further, consider increasing priority. https://github.com/oceanprotocol/ocean-subgraph/issues/760

query{
  predictPredictions(where:{ or: [{timestamp_gt:100},{payout_:{timestamp_gt:100}}]}){
    slot{
      id
    }
    user {
      id
    }
    stake
    payout{
      
      timestamp
      id
    }
  }
}

idiom-bytes avatar Jan 25 '24 20:01 idiom-bytes

I spent some time and let me explain why the first example won't work.

Based on the provided GraphQL query, it seems that you want to filter predictPayouts based on a specific predictContract value. Tha query is not convenient for this goal


query {
  predictPayouts {
    payout
    prediction {
      slot {
        predictContract(where: {id_in: ["0x18f54cc21b7a2fdd011bea06bba7801b280e3151"]}) {
          token {
            id
          }
        }
      }
    }
  }
}

In this query:

  • The predictPayouts are being retrieved without any filtering at their level.
  • The filtering is applied at the predictContract level using where: {id_in: [...]}.
  • This means all predictPayouts are retrieved, but only the predictContract with the specified ID is filtered within each predictPayout.

The limitation here is that the filtering for predictContract does not directly filter the predictPayouts.

  • Instead, it only filters the predictContract within each predictPayout result.
  • the goal is to retrieve only those predictPayouts that have a specific predictContract, so, the query needs to be different, we have to put the parameter to the main "where" query.

I feel like we did a short talk about it but I couldn't be sure and I explained with text

kdetry avatar Jan 31 '24 10:01 kdetry

There are still low performance queries and other things that aren't optimal but i'm closing this issue and moving everything to #1299 for further review

idiom-bytes avatar Jun 25 '24 16:06 idiom-bytes