libpqxx
libpqxx copied to clipboard
Iterator performance
In C++ iterators are presumed to be lightweight objects and are therefore commonly passed by value.
However, the internal implementation of pqxx iterators makes them not so lightweight:
pqxx::const_row_iterator
inherits from pqxx::field
.
pqxx::field
holds a pqxx::result
by value
pqxx::result
holds 2 std::shared_ptr
s.
The problem is that an std::shared_ptr
has to increment/decrement its counter atomically, which has a performance hit.
A quick test showed that passing an object that included two std::shared_ptr
s by value was about 15 times slower than passing it by reference in a debug build, and 50 times slower in a release build.
Can something be done to make copying iterators faster?
I have not compared performance, but perhaps the stream_query_input_iterator
(https://github.com/jtv/libpqxx/blob/7.8.1/include/pqxx/internal/stream_query_impl.hxx) used by the Stream API is more lightweight.
See the test for how to use the Stream API, such as tx.stream("SELECT ...")
.
https://github.com/jtv/libpqxx/blob/7.8.1/test/unit/test_stream_query.cxx
Food for thought.
Originally the result
& row
iterators did not have the shared_ptr
. That was a long time ago.
IIRC I added it so that a function like exec1()
could return a row
(and other similar use-cases).
Perhaps we could break the inheritance relationships between row
, field
, and these iterators. That way row
and field
can hold shared_ptr
but the iterators would work with regular pointers.
It could be a bit of an undertaking though. And technically it would be a breaking change. We're in the run-up to 8.0, so it's the perfect time for that really.
I don't think you need to make it "safer" than normal STL iterators. An iterator into std::vector
is basically a pointer under the hood, and will become dangling if the underlying container goes away. That seems like a reasonable approach to take.
Indeed. The catch though is that row
and field
do need the shared_ptr
s. And the iterator classes are derived from those classes.
That wasn't a great design choice, but it avoided an overwhelming amount of extra work and duplicated code. I'll have to see whether I can break those inheritance relationships. It looks like a serious amount of work, since the C++ iterator API is pretty broad. It may also break some client code out there.
I've been looking into this, and I'm digging up other complications:
- A result iterator is also a container. This creates conflicts in how it should define some of its nested types!
- In
row
andfield
I still need a fullresult
(which contains ashared_ptr
) so that a query function can return arow
orfield
without it becoming an instant dangling reference. - Someone may want conversions from a result iterator to a
row
, and from a row iterator to afield
.
So I'm still struggling with this. We may end up with a different division of labour between classes.
Yet another complication is that it's not immediately clear when to return a lightweight iterator and when to return a full row
or field
. It quickly becomes non-obvious, which means it's error-prone, so for now I'm giving up on the idea of removing the shared_ptr
from the iterator classes.
I'm not actually sure that's a big problem for iterators, because you don't actually need to copy them very often, so long as you iterate sensibly and pass references where appropriate. Really it's random access with operator[]
that's costly. It gets better in C++23 with multi-dimensional operator[]
, but it's still pretty costly. We can optimise the loop in pqxx::result::for_each()
, but if that's what you want you're probably better off streaming the query.
I'm forming some answers to the questions. For example, I'm now leaning towards making all iteration and indexing return iterators, not pqxx::row
or pqxx::field
. If you wanted to keep the result set in memory, you'd have to construct a row
or field
yourself. Everything would then be fast by default, with the option to get the heavier, "safer" objects.
Unfortunately all the work I did on this is on the laptop that broke today, so I'm kind of stuck for now. :-( One fun little side quest was to replace most code in the reverse_iterator
types for result
and row
with std::reverse_iterator
.
Will it break backward compatibility?
Well yes, of course - the change you asked for breaks compatibility.
I've got my laptop back, but unfortunately there was no chance to make a backup. There was a lot of incomplete work on these iterators on there. :-(
There has been no activity on this ticket. Consider closing it.
There has been no activity on this ticket. Consider closing it.
I think this might be a good thing to put on the list for libpqxx 8.0, and combine it with at least the preparation for #846.
It's been ages. But I've had some more time to think about it, and I think in libpqxx 8.0 we can go for a much stronger separation between row
/field
on the one hand, and result::iterator
/row::iterator
on the other.
The row
and field
classes will still have a shared_ptr
to the result
, and so keep it alive in memory, but the iterators will not.
When I experimented with this, I found that it didn't affect a lot of test code. Which I hope is a good sign. And... C++ compilers have advanced "analyzers" now which help find bugs that this might cause.