Ability to return an iterator on rows
Example of generated query:
const iterCities = `-- name: IterCities :iter
SELECT slug, name
FROM city
ORDER BY name
`
func (q *Queries) IterCities(ctx context.Context) IterCitiesRows {
rows, err := q.query(ctx, q.iterCitiesStmt, iterCities)
if err != nil {
return IterCitiesRows{err: err}
}
return IterCitiesRows{rows: rows}
}
type IterCitiesRows struct {
rows *sql.Rows
err error
}
func (r *IterCitiesRows) Iterate() iter.Seq[City] {
if r.rows == nil {
return func(yield func(City) bool) {}
}
return func(yield func(City) bool) {
defer r.rows.Close()
for r.rows.Next() {
var i City
err := r.rows.Scan(&i.Slug, &i.Name)
if err != nil {
r.err = err
return
}
if !yield(i) {
r.err = r.rows.Close()
return
}
}
}
}
func (r *IterCitiesRows) Close() error {
return r.rows.Close()
}
func (r *IterCitiesRows) Err() error {
if r.err != nil {
return r.err
}
return r.rows.Err()
}
Example of use:
func example(ctx context.Context, q *Queries) {
rows := q.IterCities(ctx)
for city := range rows.Iterate() {
// do something with city
}
if err := rows.Err(); err != nil {
// handle error
}
}
UPD: updated the code according to the latest changes
Thank you so much for the first pass on iterator support. Could you update your PR description with an example of how to use the Iterate() method? I looked over the changes and didn't see an end-to-end test of the new method.
Fixes #720
@kyleconroy I've placed end-to-end tests here https://github.com/sqlc-dev/sqlc/pull/3631/files#diff-ef71fe84cdfc32f51a1cf70a80a17506a422b5ccc42e714a56a6a2334e6a4bc5 Do you need some other kind of tests? Please give an example
@kyleconroy What do you think about the Close method? I still haven't decided if it's necessary.
I found some example tests here https://github.com/sqlc-dev/sqlc/tree/main/examples and added iterator tests there
@kyleconroy, hi I updated my pull request, check it please
I see that the stdlib sometimes uses the suffix Seq() to indicate methods that returns iterators. I suggest to think a bit more if Iterate() is the appropriate name for this functionality. I'm not saying it should be changed to Seq(), but I just want to make sure that some more thought gets put into the naming selection.
I see that the stdlib sometimes uses the suffix Seq() to indicate methods that returns iterators. I suggest to think a bit more if Iterate() is the appropriate name for this functionality. I'm not saying it should be changed to Seq(), but I just want to make sure that some more thought gets put into the naming selection.
iter.Seq and iter.Seq2 are the names of the generic types for iterators, but methods or functions in the standard library that return iterators have many different names. So I think Iterate would be a good method name.
@kyleconroy What do you think about the Close method? I still haven't decided if it's necessary.
I do not think the Close method is needed, but the func returned by Iterate needs to have defer r.rows.Close() added. Without it a recovered panic would leave the connection in the middle of a fetch
I do not think the
Closemethod is needed, but thefuncreturned byIterateneeds to havedefer r.rows.Close()added. Without it a recovered panic would leave the connection in the middle of a fetch
@gbarr, thanks for your comment
- I left Close() because it might be needed if you call a method like IterCities and don't call Iterate(). I can't imagine why it might be needed, but I don't see the point in disallowing this behavior
- I added defer r.rows.Close() as per your recommendation
Instead of having an Err() method, should IterCities just be something like this?
IterCities(ctx context.Context) (seq iter.Seq2[City, error], stop func())
Usage would be:
func example(ctx context.Context, q *Queries) error {
rows, stop := q.IterCities(ctx)
defer stop()
for city, err := range rows {
if err != nil {
return err
}
// do something with city
}
return nil
}
I think this is a bit cleaner than the above generated code, saving a whole new type for the iterator.
Instead of having an
Err()method, shouldIterCitiesjust be something like this?IterCities(ctx context.Context) (seq iter.Seq2[City, error], stop func())
I thought about such an interface, but I'm afraid that if the user uses range with one variable, the behavior will not match the expected one
I found these in the release notes for Go 1.24. The bytes package adds several functions that work with iterators:
- Lines returns an iterator over the newline-terminated lines in the byte slice s.
- SplitSeq returns an iterator over all substrings of s separated by sep.
- SplitAfterSeq returns an iterator over substrings of s split after each instance of sep.
- FieldsSeq returns an iterator over substrings of s split around runs of whitespace characters, as defined by unicode.IsSpace.
- FieldsFuncSeq returns an iterator over substrings of s split around runs of Unicode code points satisfying f(c).
I don't like the Seq suffixes, but I do understand why they picked them.
I found these in the release notes for Go 1.24. The bytes package adds several functions that work with iterators
Following this example, it seems more appropriate to rename Iterate() to Rows() / Items() / Values(). Because the Seq suffix is used when there was already a Foo() or FooBar() function, and we want a similar FooSec() function that returns a sequence
I found these in the release notes for Go 1.24. The bytes package adds several functions that work with iterators
Following this example, it seems more appropriate to rename Iterate() to Rows() / Items() / Values(). Because the Seq suffix is used when there was already a Foo() or FooBar() function, and we want a similar FooSec() function that returns a sequence
I agree that the Seq suffix is only being used when an iterator implementation of an existing function is being added.
My preference would be for Iterate() to be renamed as Rows(). Second preference of Items(). Values() to me implies each iteration returns a single value and not a collection of values.
I would prefer Items(). Rows() is associated with sql.Rows and can be treated as raw data.
It's great to see iterators coming to sqlc. Adding memory pooling as an option on top of this would make this truly a game-changer for large result sets. I would imagine saving the pool in a public variable, and let the callside have the responsibility to put the object back. Thoughts? Maybe out of scope for this one?
I'm up for this one. The only thing that blocks us from migrating from pure SQLx is the absence of an iterator.
Not to blow up our existing design, but I just read this post (https://blog.thibaut-rousseau.com/blog/writing-testing-a-paginated-api-iterator/) which makes me think we may want to consider the following:
func example(ctx context.Context, q *Queries) {
for city, err := range rows.IterCities(ctx) {
// do something with city
}
}
The upside to this design is that it's difficult to use incorrectly. We don't have to decide on a name for the Iter function, since it's removed. You also can't forget to call Close() on the underlying rows resource.
The problem with using Seq2 that returns err is that if the code breaks out of the loop, any error returned from the call r.Close() is lost as there is no way to get it
Just to add my 2cts to the discussion: it seems weird to me to have the error, and have to check it, in every loop iteration. That’s an advantage of the initial proposal.
Since we can start the query in the first iteration, and break inside the loop, we don’t need Close. We can use Err() to determine if the query or any of the row fetches failed.
I personally prefer:
func example(ctx context.Context, q *Queries) error {
cities := q.IterCities(ctx)
for city := range cities.Rows() {
// do something with city
}
return cities.Err()
}
Over:
func example(ctx context.Context, q *Queries) error {
for city, err := range q.IterCities(ctx) {
if err != nil {
return err
}
// do something with city
}
}
Simply because of the lack of err clutter inside the loop.
This does assume that:
- when you call IterCities(), we don’t perform the query yet, so just in case you never call Iterate() we don’t have a dangling query. (Also the added benefit that IterCities doesn’t return an err itself, which is something pgx regrets in their API.)
- Iterate() performs the query and starts streaming the results. If anything fails, it sets the Err() result and returns. After the last row we close the query, set the Err() result to nil and return.
- If the loop body breaks (the yield func returns false) we close the query and set Err() to the result of that close operation, then return.
After an error happens in the iterator, if it is never possible to continue it would make more sense to use iter.Seq[T]. However if there is a possibility that the user can safely continue on some errors then it would make more sense to use iter.Seq2[T, error].
Is there ever a case when the user could potentially continue streaming rows from the server after an error? Maybe some kind of DB driver data parse error could be ignored?
Also, would it make sense to stop iterating and then continue iterating later? I'm thinking something like a CURSOR, where I have some process that reads in 100 rows, then does something else, then reads another 100, and so on.
I think we should instead have a function that returns:
- a simple iterator
- a function that returns an error
Code:
func example(ctx context.Context, q *Queries) error {
cities, getCitiesErr := q.IterCities(ctx)
for city := range cities {
// do something with city
}
return getCitiesErr()
}
q.IterCities() should not initiate any query to the DB.
When the iterator starts to iterate, it should do the query to the DB, and start to yield the objects.
If the yield function returns false (the loop is interrupted), it should release all DB resources.
If an error occurs in the iterator, it should stop iterating, and the "error getter" function should return the error.
I prefer this design, because it solves several problems:
- If the iterator is never iterated, no query is done
- If the iterator is interrupted, it release automatically all resources
- The "error getter" function must be assigned to a variable, so it forces the user to handle it. (Nothing forces the user to call
cities.Err()in the previous code)
@viewsharp did you close this PR intentionally ?
I apologize. I wanted to sync my repository and didn't know that it could lead to closing the PR. I will restore it soon