FSharp.Data.GraphQL

Are Dataloaders available?

Open inputfalken opened this issue 6 years ago • 7 comments

Hi, I'm wondering whether data-loaders or something similar are available.

inputfalken avatar Oct 31 '19 18:10 inputfalken

I am also wondering what the best way to resolve the N+1 problem is.

Kurren123 avatar Mar 26 '20 22:03 Kurren123

The short answer is this library does not provide data-loaders. However, it is fairly easy to bring your own.

Here is my answer from elsewhere...


There are a few "data-loader" type libraries for F# and .NET. However, if you are also using FSharp.Data.GraphQL, fewer of them integrate well.

Note that the "Haxl" approach will not work (easily) with FSharp.Data.GraphQL. This is because the Haxl types must be integrated into GraphQL query models, but FSharp.Data.GraphQL only understands sync and async.

The most suitable implementation that I could find is in FSharp.Core.Extensions. This is a fairly new library, but it's high quality and Apache 2.0 licensed.

I'm sure there are many ways it can be integrated into FSharp.Data.GraphQL; however, my preferred approach was to put the data-loaders into the root value of the schema. This allows all GraphQL resolvers down the tree to access them.

I think the best way to explain it is to show an example.

Here we have a domain of "People" who can have zero or more "followers", who are also "People". Each person has a globally unique ID. There is significant overlap in the followers between people, so a naive solution may re-fetch the same data repeatedly. Our database layer can fetch many person records in one query, so we would like to leverage that where possible.

The example consists of two files: paket.dependencies, which lets Paket fetch the dependencies, and DataLoader.fsx, which you can run as a script.

paket.dependencies

generate_load_scripts: true

source https://www.nuget.org/api/v2
source https://api.nuget.org/v3/index.json

storage: none
framework: net5.0, netstandard2.1

nuget FSharp.Core 5.0.0
nuget FSharp.Data.GraphQL.Server 1.0.7

github Horusiath/fsharp.core.extensions:0ff5753bb6f232e0ef3c446ddcc72345b74174ca

DataLoader.fsx

#load ".paket/load/net50/FSharp.Data.GraphQL.Server.fsx"

#load "paket-files/Horusiath/fsharp.core.extensions/src/FSharp.Core.Extensions/Prolog.fs"
#load "paket-files/Horusiath/fsharp.core.extensions/src/FSharp.Core.Extensions/AsyncExtensions.fs"

type Person =
  {
    ID : string
    Name : string
  }

// Mocks a real database access layer
module DB =

  // Used to avoid interleaving of printfn calls during async execution
  let private logger = MailboxProcessor.Start (fun inbox -> async {
    while true do
      let! message = inbox.Receive()

      printfn "DB: %s" message
  })

  let private log x =
    logger.Post(x)

  // Our data-set
  let private people =
    [
      { ID = "alice"; Name = "Alice" }, [ "bob"; "charlie"; "david"; "fred" ]
      { ID = "bob"; Name = "Bob" }, [ "charlie"; "david"; "emily" ]
      { ID = "charlie"; Name = "Charlie" }, [ "david" ]
      { ID = "david"; Name = "David" }, [ "emily"; "fred" ]
      { ID = "emily"; Name = "Emily" }, [ "fred" ]
      { ID = "fred"; Name = "Fred" }, []
    ]
    |> Seq.map (fun (p, fs) -> p.ID, (p, fs))
    |> Map.ofSeq

  let fetchPerson id =
    async {
      log $"fetchPerson {id}"

      match people |> Map.find id with
      | (x, _) -> return x
    }

  let fetchPersonBatch ids =
    async {
      let idsString = String.concat "; " ids
      log $"fetchPersonBatch [ {idsString} ]"

      return
        people
        |> Map.filter (fun k _ -> Set.contains k ids)
        |> Map.toSeq
        |> Seq.map (snd >> fst)
        |> Seq.toList
    }

  let fetchFollowers id =
    async {
      log $"fetchFollowers {id}"

      match people |> Map.tryFind id with
      | Some (_, followerIDs) -> return followerIDs
      | _ -> return []
    }





// GraphQL type definitions

open FSharp.Core
open FSharp.Data.GraphQL
open FSharp.Data.GraphQL.Types

#nowarn "40"

[<NoComparison>]
type Root =
  {
    FetchPerson : string -> Async<Person>
    FetchFollowers : string -> Async<string list>
  }

let rec personType =
  Define.Object(
    "Person",
    fun () -> [
      Define.Field("id", ID, fun ctx p -> p.ID)
      Define.Field("name", String, fun ctx p -> p.Name)
      Define.AsyncField("followers", ListOf personType, fun ctx p -> async {
        let root = ctx.Context.RootValue :?> Root

        let! followerIDs = root.FetchFollowers p.ID

        let! followers =
          followerIDs
          |> List.map root.FetchPerson
          |> Async.Parallel

        return Seq.toList followers
      })
    ])

let queryRoot = Define.Object("Query", [
  Define.AsyncField(
    "person",
    personType,
    "Fetches a person by ID",
    [
      Define.Input("id", ID)
    ],
    fun ctx root -> async {
      let id = ctx.Arg("id")

      return! root.FetchPerson id
    })
])

// Construct the schema once to cache it
let schema = Schema(queryRoot)




// Run an example query...
// Here we fetch the followers of the followers of the followers of `alice`
// This query offers many optimization opportunities to the data-loader

let query = """
  query Example {
    person(id: "alice") {
      id
      name
      followers {
        id
        name
        followers {
          id
          name
          followers {
            id
            name
          }
        }
      }
    }
  }
  """

let executor = Executor(schema)

async {
  // Construct a data-loader for fetch person requests
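  let fetchPersonBatchFn (requests : Set<string>) =
    async {
      let! people =
        requests
        |> DB.fetchPersonBatch

      // Key each result by its own ID rather than zipping with the request set,
      // so a missing ID cannot shift people onto the wrong keys
      let responses =
        people
        |> Seq.map (fun p -> p.ID, p)
        |> Map.ofSeq

      return responses
    }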

  let fetchPersonContext = DataLoader.context ()
  let fetchPersonLoader = DataLoader.create fetchPersonContext fetchPersonBatchFn

  // Construct a data-loader for fetch follower requests
  let fetchFollowersBatchFn (requests : Set<string>) =
    async {
      let! responses =
        requests
        |> Seq.map (fun id ->
          async {
            let! followerIDs = DB.fetchFollowers id

            return id, followerIDs
          })
        |> Async.Parallel

      return Map.ofSeq responses
    }

  let fetchFollowersContext = DataLoader.context ()
  let fetchFollowersLoader = 
    DataLoader.create fetchFollowersContext fetchFollowersBatchFn

  let root =
    {
      FetchPerson = fun id -> fetchPersonLoader.GetAsync(id)
      FetchFollowers = fun id -> fetchFollowersLoader.GetAsync(id)
    }

  // Uncomment this to see how sub-optimal the query is without the data-loader
  // let root =
  //   {
  //     FetchPerson = DB.fetchPerson
  //     FetchFollowers = DB.fetchFollowers
  //   }

  // See https://bartoszsypytkowski.com/data-loaders/
  do! Async.SwitchToContext fetchPersonContext
  do! Async.SwitchToContext fetchFollowersContext

  // Execute the query
  let! response = executor.AsyncExecute(query, root)

  printfn "%A" response
}
|> Async.RunSynchronously
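
As a rough, dependency-free illustration of how much duplicate work the example query creates, the sketch below walks the same follower graph and counts how many person records a naive resolver would fetch versus how many distinct people are involved. The helper `requestedIDs` is made up for this illustration and is not part of FSharp.Data.GraphQL or FSharp.Core.Extensions.

```fsharp
// Follower graph copied from the example data-set (IDs only)
let followers =
  Map.ofList [
    "alice", [ "bob"; "charlie"; "david"; "fred" ]
    "bob", [ "charlie"; "david"; "emily" ]
    "charlie", [ "david" ]
    "david", [ "emily"; "fred" ]
    "emily", [ "fred" ]
    "fred", []
  ]

// Hypothetical helper: lists every person ID a naive resolver would fetch,
// level by level, when `followers` is nested `depth` times below `ids`
let rec requestedIDs depth (ids : string list) =
  if depth = 0 then
    ids
  else
    let next = ids |> List.collect (fun id -> Map.find id followers)
    ids @ requestedIDs (depth - 1) next

// The example query nests `followers` three times below `alice`
let requests = requestedIDs 3 [ "alice" ]

let naiveFetches = List.length requests
let distinctPeople = requests |> List.distinct |> List.length

printfn "naive fetchPerson calls: %d, distinct people: %d" naiveFetches distinctPeople
```

For the three-level query this gives 18 naive fetches against 6 distinct people, which is the gap a deduplicating loader can close.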

njlr avatar Aug 30 '22 17:08 njlr

@xperiandri @nikhedonia Should data-loaders be part of this library?

njlr avatar Aug 30 '22 17:08 njlr

@jberzy told me to use a custom middleware. That middleware can intercept a request and get data from the request cache. I do something similar in the AuthorizationMiddleware: https://github.com/xperiandri/FSharp.Data.GraphQL/commit/88dd4be97c368bc3ada5833bcab95b6203b402ef

xperiandri avatar Aug 30 '22 17:08 xperiandri

Another option for people stumbling on this is https://github.com/cmeeren/BatchIt

Basically the pattern is to create a BatchIt instance and pass it into the GraphQL resolvers via the root object.
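
To make the shape of that pattern concrete without pulling in any package, here is a minimal sketch. It does not use BatchIt itself, which additionally collects individual calls over a time window. It only shows a batched function returning (key, value) pairs and a root record that hands the loader to resolvers. The names `fetchNamesBatch`, `loadMany`, `LoadNames` and `resolveNames` are all hypothetical.

```fsharp
// Counts how often the "database" is actually hit
let batchCalls = ref 0

// Hypothetical batched lookup: one call for many IDs, returning (key, value) pairs
let fetchNamesBatch (ids : string array) : Async<(string * string) array> =
  async {
    batchCalls.Value <- batchCalls.Value + 1
    return ids |> Array.map (fun key -> key, key.ToUpperInvariant())
  }

// Deduplicates the keys, makes a single batched call,
// then maps the results back onto the originally requested order
let loadMany (batchFn : 'k array -> Async<('k * 'v) array>) (keys : 'k array) : Async<'v option array> =
  async {
    let! pairs = batchFn (Array.distinct keys)
    let lookup = Map.ofArray pairs
    return keys |> Array.map (fun k -> Map.tryFind k lookup)
  }

// The root object carries the loader so that any resolver can reach it
type Root =
  {
    LoadNames : string array -> Async<string option array>
  }

let root = { LoadNames = loadMany fetchNamesBatch }

// A resolver-like function that only knows about the root
let resolveNames (root : Root) (ids : string array) =
  root.LoadNames ids

let names =
  resolveNames root [| "alice"; "bob"; "alice" |]
  |> Async.RunSynchronously

printfn "%A (batch calls: %d)" names batchCalls.Value
```

In a real schema the resolver would obtain the root via `ctx.Context.RootValue`, as in the other examples in this thread.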

njlr avatar Jul 18 '23 09:07 njlr

Looks interesting. So if I have such a type:

let AssessmentsStatisticsType =

    let getRoot (ctx: ResolveFieldContext) = ctx.Context.RootValue :?> Root

    // TODO: Optimize
    let getTotalChecksCount (ctx: ResolveFieldContext) struct (tenantId, dateRange) = async {
        let root = getRoot ctx
        let query: GetTotalChecksCount.Query = {
            TenantId = tenantId
            DateRange = dateRange
        }
        let handle = root.GetRequiredService<GetTotalChecksCount.Handler>()
        return! handle root.RequestAborted query |> Task.map int64
    }

    let getChecksCount (ctx: ResolveFieldContext) struct (tenantId, dateRange) = async {
        let root = getRoot ctx
        let query: GetChecksCount.Query = {
            TenantId = tenantId
            DateRange = dateRange
        }
        let handle = root.GetRequiredService<GetChecksCount.Handler>()
        return! handle root.RequestAborted query
    }

    let getChecksCountForStatus status (ctx: ResolveFieldContext) struct (tenantId, dateRange) = async {
        let root = getRoot ctx
        let query: GetChecksCountForStatus.Query = {
            TenantId = tenantId
            Status = status
            DateRange = dateRange
        }
        let handle = root.GetRequiredService<GetChecksCountForStatus.Handler>()
        return! handle root.RequestAborted query
    }

    // TODO: Optimize
    let getTotalChecksCountForStatus status (ctx: ResolveFieldContext) dateRange = async {
        let! checksCount = getChecksCountForStatus status ctx dateRange
        return checksCount.CountAll |> int64
    }

    Define.Object<struct(TenantId * DateTimeRange)>(
        name = "AssessmentsStatistics",
        fields = [
            Define.AsyncField("totalChecks", LongType, getTotalChecksCount)
            Define.AsyncField("checks", ChecksCountType, getChecksCount)
            Define.AsyncField("totalCompletedChecks", LongType, getTotalChecksCountForStatus Completed)
            Define.AsyncField("completedChecks", ChecksCountType, getChecksCountForStatus Completed)
        ]
    )

How would I need to change it so that the totalCompletedChecks and completedChecks fields reuse the same result?

xperiandri avatar Jul 18 '23 09:07 xperiandri

Sorry, I'm not quite able to figure out what the intent is behind that code!

However, here is a self-contained example showing how BatchIt can optimize data-fetching. Paste it into an .fsx file and run it.

You can see from the logging that it only makes 2 calls to the "database", despite resolving several objects.

#r "nuget: BatchIt, 1.2.0"
#r "nuget: FSharp.Data.GraphQL.Server, 1.0.7"

open BatchIt
open FSharp.Data.GraphQL
open FSharp.Data.GraphQL.Types




// Utils

let logger =
  MailboxProcessor.Start
    (fun inbox ->
      let rec loop () = async {
        let! msg = inbox.Receive()

        printfn "%s" msg

        return! loop ()
      }

      loop ())

let logln (s : string) =
  logger.Post s




// Domain model

type AuthorID = 
  | AuthorID of string

type BookID = 
  | BookID of string

type Author =
  {
    FirstName : string
    LastName : string
  }

type Book =
  {
    Title : string
    AuthorID : AuthorID
  }





// Data access layer

module Author =

  let fetchBatch (requests : AuthorID array) : Async<(AuthorID * Author option) array> =
    async {
      logln $"Author.fetchBatch %A{requests}"

      return!
        [
          for request in Array.distinct requests do
            async {
              return
                match request with
                | AuthorID "ian-m-banks" ->
                  request, Some { FirstName = "Iain M."; LastName = "Banks" }
                | AuthorID "john-brunner" ->
                  request, Some { FirstName = "John"; LastName = "Brunner" }
                | AuthorID "frederik-pohl" ->
                  request, Some { FirstName = "Frederik"; LastName = "Pohl" }
                | _ ->
                  request, None
            }
        ]
        |> Async.Parallel
    }

module Book =

  let fetchBatch (requests : BookID array) : Async<(BookID * Book option) array> =
    async {
      logln $"Book.fetchBatch %A{requests}"

      return!
        [
          for request in Array.distinct requests do
            async {
              return
                match request with
                | BookID "gateway" ->
                  request, Some { Title = "Gateway"; AuthorID = AuthorID "frederik-pohl" }
                | BookID "the-sheep-look-up" ->
                  request, Some { Title = "The Sheep Look Up"; AuthorID = AuthorID "john-brunner" }
                | BookID "consider-phlebas" ->
                  request, Some { Title = "Consider Phlebas"; AuthorID = AuthorID "ian-m-banks" }
                | BookID "the-player-of-games" ->
                  request, Some { Title = "The Player of Games"; AuthorID = AuthorID "ian-m-banks" }
                | _ ->
                  request, None
            }
        ]
        |> Async.Parallel
    }




// Schema definition

[<NoComparison>]
type Root =
  {
    FetchBooks : Async<BookID list>
    FetchBook : BookID -> Async<Book option>
    FetchAuthor : AuthorID -> Async<Author option>
  }

let authorType =
  Define.Object<Author>(
    name = "Author",
    fields =
      [
        Define.Field("firstName", String, fun _ author -> author.FirstName)
        Define.Field("lastName", String, fun _ author -> author.LastName)
      ]
  )

let bookType =
  let resolveAuthor (ctx : ResolveFieldContext) (book : Book) =
    let root = ctx.Context.RootValue :?> Root

    root.FetchAuthor book.AuthorID

  Define.Object<Book>(
    name = "Book",
    fields =
      [
        Define.Field("title", String, fun _ book -> book.Title)
        Define.AsyncField("author", Nullable authorType, resolveAuthor)
      ]
  )

let queryType =
  let resolveBooks (ctx : ResolveFieldContext) (root : Root) =
    async {
      let! bookIDs = root.FetchBooks

      let! books =
        bookIDs
        |> Seq.map root.FetchBook
        |> Async.Parallel

      let books =
        books
        |> Seq.choose id
        |> Seq.toList

      return books
    }

  Define.Object<Root>(
    name = "Query",
    fields =
      [
        Define.AsyncField("books", ListOf bookType, resolveBooks)
      ]
  )

let schema = Schema(queryType)




// Demo

let executor = Executor(schema)

let fetchBooks =
  async {
    return
      [
        BookID "gateway"
        BookID "the-sheep-look-up"
        BookID "consider-phlebas"
        BookID "the-player-of-games"
      ]
  }

let fetchBook =
  Batch.Create(Book.fetchBatch, 50, 100, 1000)
  // To disable batching, replace the Batch.Create line above with:
  // (fun bookID ->
  //   async {
  //     let! results = Book.fetchBatch [| bookID |]

  //     return snd results.[0]
  //   })

let fetchAuthor =
  Batch.Create(Author.fetchBatch, 50, 100, 1000)
  // To disable batching, replace the Batch.Create line above with:
  // (fun authorID ->
  //   async {
  //     let! results = Author.fetchBatch [| authorID |]

  //     return snd results.[0]
  //   })

let root =
  {
    FetchBooks = fetchBooks
    FetchBook = fetchBook
    FetchAuthor = fetchAuthor
  }

let query =
  """
  query {
    books {
      title
      author {
        firstName
        lastName
      }
    }
  }
  """

let response =
  executor.AsyncExecute(query, root)
  |> Async.RunSynchronously

logln $"%A{response}"

I think in your case you would have the totalCompletedChecks and completedChecks fields both call the same Batch, which would be made available on the GraphQL root object.
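
If the goal is only for two fields to share one result within a request, a simpler alternative may be plain memoization rather than batching. The sketch below is an assumption-laden illustration and not BatchIt's API: `memoizeAsync`, `ChecksCount` and `fetchChecksCount` are hypothetical stand-ins, and cancellation (such as `RequestAborted`) is not handled.

```fsharp
open System.Collections.Concurrent
open System.Threading.Tasks

// Hypothetical stand-in for the result of the checks-count query
type ChecksCount = { CountAll : int; CountFailed : int }

// Wraps an async function so that repeated or concurrent calls
// with the same key share a single underlying computation
let memoizeAsync (f : 'k -> Async<'v>) : 'k -> Async<'v> =
  let cache = ConcurrentDictionary<'k, Lazy<Task<'v>>>()
  fun key ->
    async {
      // Only one Lazy is stored per key, and it starts the work at most once
      let entry =
        cache.GetOrAdd(key, System.Func<_, _>(fun k -> lazy (Async.StartAsTask (f k))))
      return! Async.AwaitTask entry.Value
    }

// Counts how often the expensive query really runs
let calls = ref 0

// Hypothetical expensive query
let fetchChecksCount (status : string) =
  async {
    lock calls (fun () -> calls.Value <- calls.Value + 1)
    do! Async.Sleep 50
    return { CountAll = 42; CountFailed = 1 }
  }

// Create one memoized function per request, for example as a field on the root object
let getChecksCountForStatus = memoizeAsync fetchChecksCount

// Two "fields" resolving in parallel against the same key
let totalCompleted, completed =
  async {
    let! results =
      [ getChecksCountForStatus "Completed"
        getChecksCountForStatus "Completed" ]
      |> Async.Parallel
    return int64 results.[0].CountAll, results.[1]
  }
  |> Async.RunSynchronously

printfn "total = %d, underlying calls = %d" totalCompleted calls.Value
```

Creating the memoized function per request keeps cached values from leaking between requests; a single shared instance would behave like a process-wide cache.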

njlr avatar Jul 18 '23 17:07 njlr