prisma-client-go
Add CreateMany
Hi, is this a good first issue? I would love to give it a try, do you have some directions on where to get started?
Kind of, yeah. The generator-related code is not the easiest to grasp though, and there are no docs for the Prisma AST. There are some tricks to inspect the info, and I could write up some docs, but it definitely takes some time to understand the concepts and how the Prisma internals work (which are optimized for JS rather than Go). Also, before starting, it would be important to align on what the syntax would look like.
I'm also not sure about the priority of this, as you can create many docs using transactions right now:
createUserA := client.User.CreateOne(
	User.Email.Set("a"),
	User.ID.Set("a"),
).Tx()
createUserB := client.User.CreateOne(
	User.Email.Set("b"),
	User.ID.Set("b"),
).Tx()
if err := client.Prisma.Transaction(createUserA, createUserB).Exec(ctx); err != nil {
	t.Fatal(err)
}
This would also work with a loop over X docs:
var txs []transaction.Param
for _, item := range someItems {
	// item.Email and item.ID are placeholder field names for your own data
	tx := client.User.CreateOne(
		User.Email.Set(item.Email),
		User.ID.Set(item.ID),
	).Tx()
	txs = append(txs, tx)
}
if err := client.Prisma.Transaction(txs...).Exec(ctx); err != nil {
	t.Fatal(err)
}
Hmm yeah, that's what I am doing right now. The problem is the memory usage: a transaction is probably quite big, and when it is passed to the Prisma intermediate service, memory usage is horrendous.
I am hoping CreateMany could reduce the memory usage. Or is it just an issue with the Prisma service handling the use case of inserting ~70k records many times without memory usage going crazy?
Even in my own process, holding the transactions in batches of 20k uses about 200MB of memory on that alone. That would be okay, but it is far from what CreateMany could achieve if, for example, we could add models to a list instead of adding transactions to a list.
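For reference, the batching described here can be sketched with a small generic helper. This is only a minimal sketch, assuming Go 1.18+ for generics; `chunk` is a hypothetical helper name and not part of prisma-client-go. Each batch would be sent as its own transaction, so atomicity only holds within a batch, not across all records.

```go
package main

import "fmt"

// chunk splits items into consecutive batches of at most size elements.
// The batches reuse the backing array of items, so nothing is copied.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	batches := make([][]T, 0, (len(items)+size-1)/size)
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		// The three-index slice caps capacity so appending to one batch
		// cannot overwrite the start of the next one.
		batches = append(batches, items[start:end:end])
	}
	return batches
}

func main() {
	ids := []string{"a", "b", "c", "d", "e"}
	for i, batch := range chunk(ids, 2) {
		// In real code, build the CreateOne(...).Tx() params for this batch
		// here and run one client.Prisma.Transaction(...) call per batch.
		fmt.Println(i, batch)
	}
}
```

Building the transaction params per batch, rather than for all records up front, keeps only one batch of params alive at a time in the Go process. Whether that helps with the memory usage on the Prisma service side is not something this sketch can show.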
Syntax wise I imagined it to be something like this:
userA := db.User{ID: "a", Email: "a"}
userB := db.User{ID: "b", Email: "b"}
if err := client.User.CreateMany(userA, userB).Exec(ctx); err != nil {
	log.Fatal(err)
}
Well, internally a CreateMany would also run in a transaction. It's potentially more optimized, but I'm not sure. Since you are saying "what CreateMany could achieve": did you actually test this? You could do so with the JS client and see if CreateMany is faster than CreateOne with transactions.
Regarding the syntax, this would not really work, as the syntax would be different from CreateOne and it would not benefit from the extra type-safety (think about also linking records).
Hmm okay, will try to check it out in the JS client, this was just an assumption from me for now
@MaximilianGaedig Did you get a chance to see if it was faster for transactions?
Not yet, but it's on my TODO list. I needed more control over the database for the project I used this in anyway, so I switched to pgx for that.
I see, no worries
Also looked for this. Would it make sense to just spin up a bunch of goroutines, each doing a CreateOne, in the meantime? Something like:
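g, ctx := errgroup.WithContext(ctx)
// The limit has to cover the producer and the reducer as well as the workers.
// With a limit of only maxWorkerGoroutines, a later g.Go call blocks before the
// reducer starts, nothing drains queue, and the pipeline deadlocks.
g.SetLimit(maxWorkerGoroutines + 2)

// Producer: feeds indices to the workers and stops early once ctx is cancelled.
nodeIds := make(chan int)
g.Go(func() error {
	defer close(nodeIds)
	for i := 0; i < v.Len(); i++ {
		select {
		case nodeIds <- i:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
})

type ProcessedNode struct {
	idx int
	res *<result>
}

// Mapper: each worker runs one CreateOne per index.
queue := make(chan ProcessedNode)
workers := int32(maxWorkerGoroutines)
for i := 0; i < maxWorkerGoroutines; i++ {
	g.Go(func() error {
		defer func() {
			// The last worker to exit closes the queue so the reducer can finish.
			if atomic.AddInt32(&workers, -1) == 0 {
				close(queue)
			}
		}()
		for idx := range nodeIds {
			result, err := <createOne>(<params>)
			if err != nil {
				return err
			}
			queue <- ProcessedNode{idx: idx, res: result}
		}
		return nil
	})
}

// Reducer: stores each result at its original index.
results := make([]*<result>, v.Len())
g.Go(func() error {
	for nodeRes := range queue {
		results[nodeRes.idx] = nodeRes.res
	}
	return nil
})
return results, g.Wait()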
@nettrino Yes this would work, with the disadvantage that it doesn't run in a transaction. So it might work depending on the use-case.
@steebchen my understanding is that a transaction is an all-or-nothing operation, where statements are executed in order as passed, whereas my above suggestion is just for records that could be created asynchronously as needed without depending on one another - is that assumption correct?
Yes, correct. Feel free to ask over Discord as well.