Implement scanning iterator `AllRowsScanned`
This pull request introduces an iterator that scans rows on the go.
The motivation for this implementation arose from the need to scan rows into a map. The current built-in functions are sub-optimal for this purpose, as they require scanning into a slice first and then iterating again to copy data into the result map.
4ex:
type Data struct {
ID int32
Name string
}
type Result map[int32]Data
func preIter() (Result, error) {
var rows pgx.Rows
scannedRows, err := pgx.CollectRows[Data](rows, pgx.RowToStructByName)
if err != nil {
return nil, err
}
result := make(Result, len(scannedRows))
for _, row := range scannedRows {
result[row.ID] = row
}
return result, nil
}
func postIter() (Result, error) {
var rows pgx.Rows
result := Result{}
for row, err := range pgx.AllRowsScanned[Data](rows, pgx.RowToStructByName) {
if err != nil {
return nil, err
}
result[row.ID] = row
}
return result, nil
}
This PR is a proposal and currently lacks examples and thorough testing of the function. Further documentation and tests will be provided if the proposal is considered for merging.
There are several points for discussion:
- Since iterators were introduced in Go 1.23, this pull request increases the Go version in go.mod to 1.23. This change breaks the convention of supporting "two most recent major releases."
- The iterator yields one more time in case of an error to ensure there are no errors in the rows after iteration. While this approach aligns with the overall error-check flow of the solution, as it can be considered an "error during scan," I am open to discussions regarding this approach.
Yielding one more time will result in runtime panics, if an error occurs like here https://go.dev/play/p/S6VGunr5iZT.
@felix-roehrich, you are correct. There should be a return statement instead of break -- I've had a feeling that something is off regarding iteration stop, but, as said, not much tests yet.
We have two options here:
- Leave things as-is (with break replaced by return), and require the developer to perform a trailing
rows.Err()check if the iteration stops prematurely. This will require explicit highlight within method documentation. Personally, I lean towards this option, as in my experience with the internal implementation, we have never stopped iteration prematurely, thus never faced the panic. - Expect the developers to always manually check for errors after iteration.
Or, actually, not.. If i dont miss anything -- there is no reason for user of iterator to check rows.Err() in any case.
- In case everything wen good -- there will be only
nyields (n - number of rows in the result). - In case iteration was stopped because there are no more rows -- there will be
nyields. - In case iteration was stopped because there is an error during read -- there will be
n+1yields (n - number of successful row reads). - In case iteration was stopped by developer -- there will be
nyields, because there is basically no way forrows.Err()return an error asrows.Next()won't be called anyways.
So there was single error of mine putting break instead of return.
Yes, lgtm now. I just feel the name is not appropriate, especially 'Scanned' being past tense, something like pgx.Iterate/pgx.ScanRows would be more intuitive.
I do not believe that breaking the support promise will or should happen, so imo this will have to wait until 1.24 is released. Anyway, thanks for looking at this and let's see what @ jackc says.
You didn't mention him BTW, there is space between "at" and nicknameš¤
Regarding the name, GO proposes the name All for functions that returns iterator over all entries of collection.
It brings us to pgx.AllRows and the call signature will be for row, err := range pgx.AllRows(rows), but i would expect it to return rows itself, not scanned.
But in my case, since developer passing scanner function and getting already scanned rows i decided to call it with past tense, since whole process can be described as "iteration over all scanned rows"
The docs also have a lot of examples without All(), e.g. Cities() and not AllCities() or Backward() and not AllBackward(). Thus, I am not so fixated on this. I understand your point about 'Scanned', though.
True, agree.
Regarding the Go version, 1.24 is due any day now so that wouldn't be a problem. It would also would have been possible to use build tags to provide backwards compatibility.
As far as an iterator in general, perhaps it might be sense to support eventually, but I'm not sold on iterators right now.
In this particular implementation, error handling still seems awkward to me. Maybe I'm simply very familiar with the current style of process all the rows and check for errors at the end. But it seems weird to check for error inside of the row handler and then not check after the rows are processed.
Also, I'm not sure how this iterator approach is better than a variant of ForEachRow that used templates and RowToFunc. If anything, I would expect that to be a more ergonomic interface for your original use case.
Ultimately, before integrating iterator support into core pgx, I'd like to see a use case that has significantly better ergonomics than can be done without it.
I personally don't understand how to use ForEachRow to scan rows into structures of.. say 20 fields, without introducing templating engine and codegeneration (which sound odd to me by definition). For that matter - there are no examples of usage ForEachRow in situation of handing more that 1-2 fields throughout documentation whatsoever.
Regarding RowToFunc approach - it is in the description. pgx's methods allows to easily scan only into a slice, which is understandable and obvious, but not true for all the cases of userland, sometimes (in my experience almost half the times) we want to get maps and other weird structures as the result.
And to do so we have to indulge additional iteration over scanned rows and copy data one more time, which is unnecessary in case of iterators usage.
It is not only more code to support, but also less performant. I'll push some benchmarks in a moment showcasing more than 30% uplift in performance, in the situation when we need map as a result. I understand that it is only one of cases but it is there.
I personally don't understand how to use ForEachRow to scan rows into structures of.. say 20 fields, without introducing templating engine and codegeneration (which sound odd to me by definition).
I personally don't understand how to use ForEachRow to scan rows into structures of.. say 20 fields, without introducing templating engine and codegeneration (which sound odd to me by definition).
I was referring to making a new variant of ForEachRow that used generics and RowToFunc to solve this.
Aaah..
Actually i have implemented similar function for our internal lib at the moment pgx gained generic scannersš¤
func ForScannedRows[T any](rows pgx.Rows, scanner pgx.RowToFunc[T], fn ScannedRowIteratorFn[T]) error {
defer rows.Close()
for rows.Next() {
row, err := scanner(rows)
if err != nil {
return err
}
cont, err := fn(&row)
if err != nil {
return err
}
if !cont {
break
}
}
return rows.Err() //nolint:wrapcheck
}
If to repeat functionality from the PR description with that function:
func functorIter() (Result, error) {
var rows pgx.Rows
result := Result{}
err := ForScannedRows(rows, pgx.RowToStructByName, func(row *Data) (bool, error) {
result[row.ID] = row
return true, nil
})
if err != nil {
return nil, err
}
return result, nil
}
It is "almost" the the same, except the fact that it is not possible to return from iteration handler, as it is different scope. We have to stop iteration and only then return from outer function.
Overall i like that approach less.
Your ForScannedRows is the idea of what I was thinking of.
Returning from the outer scope does seem to be the major functional difference.
But the error handling inside the loop just feels so ugly...
I think I'd like something like this to get some real world use outside of pgx before committing to any particular interface. Especially, given that the implementation is a standalone function.
Yes, the upside of this approach is separation of concerns and ability to share logic.
@jackc if it somehow can ease your perception regarding errors handling within loop.
As it figures (I've been searching for same usages of iter.Seq2[T, error]), google itself has same approach for iterators that can return errors: https://cs.opensource.google/search?q=iter.Seq2
Yes, their iterator have exact same signature (it returns next item and error), but still.
Disregarding iterators -- i can prepare PR introducing generic version of foreach, maybe call it ForEachRowScanned?
I've reviewed this again, and I'm starting to lean toward we will want iterators in pgx eventually, but I'm not ready to commit to any particular interface yet.
My reluctance is probably due to my lack of experience with the new custom iterator system. I just don't have enough experience to make a good decision. I'd like to defer this until either I have more confidence in my ability to make a good decision or we have additional feedback of this being used successfully. We can always add it later, but once it is in a pgx release, we can never take it out in the current major version. And since the implementation is small and can be easily copied into any application that needs it, it doesn't actually stop anyone from using it immediately.
As it figures (I've been searching for same usages of iter.Seq2[T, error]), google itself has same approach for iterators that can return errors: https://cs.opensource.google/search?q=iter.Seq2 Yes, their iterator have exact same signature (it returns next item and error), but still.
I looked at the first example:
// All returns an iterator. If an error is returned by the iterator, the
// iterator will stop after that iteration.
func (it *AttackPathIterator) All() iter.Seq2[*securitycenterpb.AttackPath, error] {
return iterator.RangeAdapter(it.Next)
}
This is actually a little scary to me. I'm looking at Go code that is hard to understand. While I am in favor of Go having gained generics and iterators, the naysayers did have a point. Go code can be a lot more opaque than it used to be.
Disregarding iterators -- i can prepare PR introducing generic version of foreach, maybe call it ForEachRowScanned?
Let's hold off on this too. While I'm not ready to merge iterators just yet, they probably are the idiomatic solution going forward. So if we add something now it will probably be more or less deprecated in a year or less.
Understood, I feel that.
I guess PR would be better kept open, so others would not make duplicated PRs and join the discussion?
Yes, let's leave this open and review in the future.