soci icon indicating copy to clipboard operation
soci copied to clipboard

Fetching all rows - why so difficult?

Open ghost opened this issue 11 years ago • 12 comments

Does fetching all rows really has to be so difficult? The docs state what the idiom for fetching all rows is. Is it possible to encapsulate it in a SOCI function?

Ideally, this would look like this:

const int BATCH_SIZE = 30;
std::vector<int> valsOut(BATCH_SIZE); // 0 would be even more convenient
statement st = (sql.prepare << "select value from numbers",
    into(valsOut));
st.execute();
st.fetch_all();

Same would be nice for the upcoming ORM bulk selects.

For zero-sized vector, this could be even easier:

std::vector<int> valsOut;
sql << "select value from numbers", into(valsOut);

If the performance is a problem, there might be an opt-in switch for manual batch sizing:

std::vector<int> valsOut;
sql << "select value from numbers", into(valsOut), batch(100);

This would be really so much better to use, especially when there are multiple intos into multiple vectors.

ghost avatar Sep 27 '13 10:09 ghost

+1, wish the soci can resize the vector automatically

rny avatar Sep 05 '14 05:09 rny

If soci does not resize vector this is mainly for performance. When you do a select there is no way to know how many rows are returned during the select statement, and the number of rows may be huge some times and you have to handle them by batch.

And if soci append a lot rows to a vector this can be very slow because a vector is not done for this kind of operation. It will often reallocate memory and move a huge memory area, often using copy constructor for each underlying object)

If soci use a list there is a lot of memory lost due to list overhead (next, previous pointers and dynamic memory allocation overhead)

Maybe the best is to allow usage of deque objects that combine vector and list logic.

ArnaudD-FR avatar Sep 05 '14 08:09 ArnaudD-FR

vector resize is a straw man argument. It is optimised for dealing with trivial-move types and the standard mandates exponential grows. So yes, I am fully in favour of just passing in a vector.

jsonn avatar Sep 05 '14 08:09 jsonn

@jsonn I know how vector is working and I know that vector has exponential grows, thanks. But it also means that each time vector has to grow, it may have to move more and more memory. And move is a big lost of time.

Maybe you think this is straw man argument, but you are not the center of the world and others may be looking for more than good enough performances ;-)

Now, back to the initial case, the best I see is:

  • if vector size is 0, it means that no batch size is specfied so put everything the query return in this vector and let the container manage the size.
  • if vector size is not 0, keep previous behavior.

I don't think the "batch" hint in the suggested example is very usefull if we keep backward compatibility using vector size.

std::vector<int> valsOut;
sql << "select value from numbers", into(valsOut), batch(100);

But if we set the container size 0 to allow unlimited fetch, maybe SOCI could also provide a way for user to specify the container he want to use. In some case where soci statement has to retrieve a lot of complex objects (a struct with many fields for example) in one shot, vector might not be the best container, maybe user would like to use a list, deque, ... container.

ArnaudD-FR avatar Sep 05 '14 09:09 ArnaudD-FR

It's a straw man because it's quite likely that the user already has to do exactly that, just with a fetch loop afterwards. One option to support it without changing the behavior of the interface for existing code would be to allow providing an output_iterator. That has the advantage of allowing the user whatever container can provide a matching iterator.

jsonn avatar Sep 05 '14 11:09 jsonn

Philosophically, iterator is indeed the way to go: you don't need a container, just something to write the results to. In practice, I'd be for having a low level function taking an output_iterator and a convenient wrapper on top allowing to simply write into(vector) because this is how 99% of people would expect it to work.

vadz avatar Sep 05 '14 11:09 vadz

I never like the principe of output iterator because you have to know the size of your final object in advance and you have to give a begin and final iterator to the function. I know no container iterator that allow me to add an unknown count of object to it. Maybe there is one in std that do automatically a push_back and return a reference to the new item?

Using output iterator you still need the concept of "batch" and I don't believe this is what @stefanchrobot was thinking about (please tell me if I'm wrong). This issue is more to avoid the concept of "batch" and allow soci to put all return statement rows inside a container without taking care of return row count.

ArnaudD-FR avatar Sep 05 '14 11:09 ArnaudD-FR

Are you aware of std::back_inserter?

vadz avatar Sep 05 '14 11:09 vadz

@vadz Thanks for the link. I was sure there was one to do it but as I never required it for the moment I never look for it....

As I understand this one does not implement operator== nor operator!=

So if we continue on output iterators, there is 2 use cases to handle.

  1. batch mode with a first and last output iterators
  2. non batch mode with an output iterator

To keep backward compatibility with vector SOCI have to check if it is empty and switch between use case 1 or 2

ArnaudD-FR avatar Sep 05 '14 12:09 ArnaudD-FR

the reason i wish a auto resize is sometimes I need to fetch unknown number of rows.

For example, a table of student name => lessons. Most students usually take 2 - 4 lessons, but freaks can take at most 10. I can limit the max rows returned by SQL "LIMIT 10", but I have to allocate a vector of 10 lines and it always got much fewer lines.

rny avatar Sep 09 '14 08:09 rny

soci::rowsetsoci::row is a better solution.

rny avatar Nov 28 '14 05:11 rny

This issue started about ten years ago, but still open now. I think an ORM style way could be more compact and simple, see the discussion here:

ORM: the query result rowset becomes invalid after a for loop · Issue #1064 · SOCI/soci

asmwarrior avatar Aug 22 '23 02:08 asmwarrior