sqlc icon indicating copy to clipboard operation
sqlc copied to clipboard

Ability to return an iterator on rows

Open viewsharp opened this issue 1 year ago • 23 comments

Example of generated query:

const iterCities = `-- name: IterCities :iter
SELECT slug, name
FROM city
ORDER BY name
`

func (q *Queries) IterCities(ctx context.Context) IterCitiesRows {
	rows, err := q.query(ctx, q.iterCitiesStmt, iterCities)
	if err != nil {
		return IterCitiesRows{err: err}
	}
	return IterCitiesRows{rows: rows}
}

type IterCitiesRows struct {
	rows *sql.Rows
	err  error
}

func (r *IterCitiesRows) Iterate() iter.Seq[City] {
	if r.rows == nil {
		return func(yield func(City) bool) {}
	}

	return func(yield func(City) bool) {
		defer r.rows.Close()

		for r.rows.Next() {
			var i City
			err := r.rows.Scan(&i.Slug, &i.Name)
			if err != nil {
				r.err = err
				return
			}

			if !yield(i) {
				r.err = r.rows.Close()
				return
			}
		}
	}
}

func (r *IterCitiesRows) Close() error {
	return r.rows.Close()
}

func (r *IterCitiesRows) Err() error {
	if r.err != nil {
		return r.err
	}
	return r.rows.Err()
}

Example of use:

func example(ctx context.Context, q *Queries) {
	rows := q.IterCities(ctx)
	for city := range rows.Iterate() {
		// do something with city
	}
	if err := rows.Err(); err != nil {
		// handle error
	}
}

UPD: updated the code according to the latest changes

viewsharp avatar Oct 03 '24 22:10 viewsharp

Thank you so much for the first pass on iterator support. Could you update your PR description with an example of how to use the Iterate() method? I looked over the changes and didn't see an end-to-end test of the new method.

kyleconroy avatar Oct 07 '24 16:10 kyleconroy

Fixes #720

kyleconroy avatar Oct 07 '24 16:10 kyleconroy

@kyleconroy I've placed end-to-end tests here https://github.com/sqlc-dev/sqlc/pull/3631/files#diff-ef71fe84cdfc32f51a1cf70a80a17506a422b5ccc42e714a56a6a2334e6a4bc5 Do you need some other kind of tests? Please give an example

viewsharp avatar Oct 07 '24 18:10 viewsharp

@kyleconroy What do you think about the Close method? I still haven't decided if it's necessary.

viewsharp avatar Oct 07 '24 18:10 viewsharp

I found some example tests here https://github.com/sqlc-dev/sqlc/tree/main/examples and added iterator tests there

viewsharp avatar Oct 07 '24 19:10 viewsharp

@kyleconroy, hi I updated my pull request, check it please

viewsharp avatar Oct 19 '24 19:10 viewsharp

I see that the stdlib sometimes uses the suffix Seq() to indicate methods that returns iterators. I suggest to think a bit more if Iterate() is the appropriate name for this functionality. I'm not saying it should be changed to Seq(), but I just want to make sure that some more thought gets put into the naming selection.

PatrLind avatar Nov 05 '24 12:11 PatrLind

I see that the stdlib sometimes uses the suffix Seq() to indicate methods that returns iterators. I suggest to think a bit more if Iterate() is the appropriate name for this functionality. I'm not saying it should be changed to Seq(), but I just want to make sure that some more thought gets put into the naming selection.

iter.Seq and iter.Seq2 are the names of the generic types for iterators, but methods or functions in the standard library that return iterators have many different names. So I think Iterate would be a good method name.

gbarr avatar Nov 08 '24 20:11 gbarr

@kyleconroy What do you think about the Close method? I still haven't decided if it's necessary.

I do not think the Close method is needed, but the func returned by Iterate needs to have defer r.rows.Close() added. Without it a recovered panic would leave the connection in the middle of a fetch

gbarr avatar Nov 08 '24 20:11 gbarr

I do not think the Close method is needed, but the func returned by Iterate needs to have defer r.rows.Close() added. Without it a recovered panic would leave the connection in the middle of a fetch

@gbarr, thanks for your comment

  1. I left Close() because it might be needed if you call a method like IterCities and don't call Iterate(). I can't imagine why it might be needed, but I don't see the point in disallowing this behavior
  2. I added defer r.rows.Close() as per your recommendation

viewsharp avatar Nov 09 '24 18:11 viewsharp

Instead of having an Err() method, should IterCities just be something like this?

IterCities(ctx context.Context) (seq iter.Seq2[City, error], stop func())

Usage would be:

func example(ctx context.Context, q *Queries) error {
	rows, stop := q.IterCities(ctx)
	defer stop()
	
	for city, err := range rows {
		if err != nil {
			return err
		}

		// do something with city
	}
	
	return nil
}

I think this is a bit cleaner than the above generated code, saving a whole new type for the iterator.

diamondburned avatar Nov 19 '24 20:11 diamondburned

Instead of having an Err() method, should IterCities just be something like this?

IterCities(ctx context.Context) (seq iter.Seq2[City, error], stop func())

I thought about such an interface, but I'm afraid that if the user uses range with one variable, the behavior will not match the expected one

viewsharp avatar Nov 19 '24 21:11 viewsharp

I found these in the release notes for Go 1.24. The bytes package adds several functions that work with iterators:

  • Lines returns an iterator over the newline-terminated lines in the byte slice s.
  • SplitSeq returns an iterator over all substrings of s separated by sep.
  • SplitAfterSeq returns an iterator over substrings of s split after each instance of sep.
  • FieldsSeq returns an iterator over substrings of s split around runs of whitespace characters, as defined by unicode.IsSpace.
  • FieldsFuncSeq returns an iterator over substrings of s split around runs of Unicode code points satisfying f(c).

I don't like the Seq suffixes, but I do understand why they picked them.

kyleconroy avatar Nov 26 '24 06:11 kyleconroy

I found these in the release notes for Go 1.24. The bytes package adds several functions that work with iterators

Following this example, it seems more appropriate to rename Iterate() to Rows() / Items() / Values(). Because the Seq suffix is ​​used when there was already a Foo() or FooBar() function, and we want a similar FooSec() function that returns a sequence

viewsharp avatar Nov 26 '24 08:11 viewsharp

I found these in the release notes for Go 1.24. The bytes package adds several functions that work with iterators

Following this example, it seems more appropriate to rename Iterate() to Rows() / Items() / Values(). Because the Seq suffix is ​​used when there was already a Foo() or FooBar() function, and we want a similar FooSec() function that returns a sequence

I agree that the Seq suffix is only being used when an iterator implementation of an existing function is being added.

My preference would be for Iterate() to be renamed as Rows(). Second preference of Items(). Values() to me implies each iteration returns a single value and not a collection of values.

gbarr avatar Nov 26 '24 11:11 gbarr

I would prefer Items(). Rows() is associated with sql.Rows and can be treated as raw data.

viewsharp avatar Nov 26 '24 13:11 viewsharp

It's great to see iterators coming to sqlc. Adding memory pooling as an option on top of this would make this truly a game-changer for large result sets. I would imagine saving the pool in a public variable, and let the callside have the responsibility to put the object back. Thoughts? Maybe out of scope for this one?

benjaco avatar Dec 01 '24 15:12 benjaco

I'm up for this one. The only thing that blocks us from migrating from pure SQLx is the absence of an iterator.

MrBanja avatar Dec 20 '24 09:12 MrBanja

Not to blow up our existing design, but I just read this post (https://blog.thibaut-rousseau.com/blog/writing-testing-a-paginated-api-iterator/) which makes me think we may want to consider the following:

func example(ctx context.Context, q *Queries) {
	for city, err := range rows.IterCities(ctx) {
		// do something with city
	}
}

The upside to this design is that it's difficult to use incorrectly. We don't have to decide on a name for the Iter function, since it's removed. You also can't forget to call Close() on the underlying rows resource.

kyleconroy avatar Dec 20 '24 20:12 kyleconroy

The problem with using Seq2 that returns err is that if the code breaks out of the loop, any error returned from the call r.Close() is lost as there is no way to get it

gbarr avatar Dec 20 '24 20:12 gbarr

Just to add my 2cts to the discussion: it seems weird to me to have the error, and have to check it, in every loop iteration. That’s an advantage of the initial proposal.

Since we can start the query in the first iteration, and break inside the loop, we don’t need Close. We can use Err() to determine if the query or any of the row fetches failed.

I personally prefer:

func example(ctx context.Context, q *Queries) error {
	cities := q.IterCities(ctx)
	for city := range cities.Rows() {	
		// do something with city
	}
	return cities.Err()
}

Over:

func example(ctx context.Context, q *Queries) error {
	for city, err := range q.IterCities(ctx) {
		if err != nil {
			return err
		}
		
		// do something with city
	}
}

Simply because of the lack of err clutter inside the loop.

This does assume that:

  • when you call IterCities(), we don’t perform the query yet, so just in case you never call Iterate() we don’t have a dangling query. (Also the added benefit that IterCities doesn’t return an err itself, which is something pgx regrets in their API.)
  • Iterate() performs the query and starts streaming the results. If anything fails, it sets the Err() result and returns. After the last row we close the query, set the Err() result to nil and return.
  • If the loop body breaks (the yield func returns false) we close the query and set Err() to the result of that close operation, then return.

sgielen avatar Dec 22 '24 09:12 sgielen

After an error happens in the iterator, if it is never possible to continue it would make more sense to use iter.Seq[T]. However if there is a possibility that the user can safely continue on some errors then it would make more sense to use iter.Seq2[T, error]. Is there ever a case when the user could potentially continue streaming rows from the server after an error? Maybe some kind of DB driver data parse error could be ignored?

Also, would it make sense to stop iterating and then continue iterating later? I'm thinking something like a CURSOR, where I have some process that reads in 100 rows, then does something else, then reads another 100, and so on.

PatrLind avatar Feb 13 '25 11:02 PatrLind

I think we should instead have a function that returns:

  • a simple iterator
  • a function that returns an error

Code:

func example(ctx context.Context, q *Queries) error {
	cities, getCitiesErr := q.IterCities(ctx)
	for city := range cities {	
		// do something with city
	}
	return getCitiesErr()
}

q.IterCities() should not initiate any query to the DB. When the iterator starts to iterate, it should do the query to the DB, and start to yield the objects. If the yield function returns false (the loop is interrupted), it should release all DB resources. If an error occurs in the iterator, it should stop iterating, and the "error getter" function should return the error.

I prefer this design, because it solves several problems:

  • If the iterator is never iterated, no query is done
  • If the iterator is interrupted, it release automatically all resources
  • The "error getter" function must be assigned to a variable, so it forces the user to handle it. (Nothing forces the user to call cities.Err() in the previous code)

pierrre avatar Apr 03 '25 15:04 pierrre

@viewsharp did you close this PR intentionally ?

pierrre avatar Aug 08 '25 12:08 pierrre

I apologize. I wanted to sync my repository and didn't know that it could lead to closing the PR. I will restore it soon

viewsharp avatar Aug 08 '25 14:08 viewsharp