libpqxx icon indicating copy to clipboard operation
libpqxx copied to clipboard

Iterator performance

Open alexolog opened this issue 1 year ago • 15 comments

In C++ iterators are presumed to be lightweight objects and are therefore commonly passed by value.

However, the internal implementation of pqxx iterators makes them not so lightweight:

pqxx::const_row_iterator inherits from pqxx::field. pqxx::field holds a pqxx::result by value pqxx::result holds 2 std::shared_ptrs.

The problem is that an std::shared_ptr has to increment/decrement its counter atomically, which has a performance hit.

A quick test showed that passing an object that included two std::shared_ptrs by value was about 15 times slower than passing it by reference in a debug build, and 50 times slower in a release build.

Can something be done to make copying iterators faster?

alexolog avatar Dec 16 '23 05:12 alexolog

I have not compared performance, but perhaps the stream_query_input_iterator (https://github.com/jtv/libpqxx/blob/7.8.1/include/pqxx/internal/stream_query_impl.hxx) used by the Stream API is more lightweight.

See the test for how to use the Stream API, such as tx.stream("SELECT ..."). https://github.com/jtv/libpqxx/blob/7.8.1/test/unit/test_stream_query.cxx

tt4g avatar Dec 16 '23 06:12 tt4g

Food for thought.

Originally the result & row iterators did not have the shared_ptr. That was a long time ago.

IIRC I added it so that a function like exec1() could return a row (and other similar use-cases).

Perhaps we could break the inheritance relationships between row, field, and these iterators. That way row and field can hold shared_ptr but the iterators would work with regular pointers.

It could be a bit of an undertaking though. And technically it would be a breaking change. We're in the run-up to 8.0, so it's the perfect time for that really.

jtv avatar Dec 16 '23 10:12 jtv

I don't think you need to make it "safer" than normal STL iterators. An iterator into std::vector is basically a pointer under the hood, and will become dangling if the underlying container goes away. That seems like a reasonable approach to take.

alexolog avatar Dec 16 '23 17:12 alexolog

Indeed. The catch though is that row and field do need the shared_ptrs. And the iterator classes are derived from those classes.

That wasn't a great design choice, but it avoided an overwhelming amount of extra work and duplicated code. I'll have to see whether I can break those inheritance relationships. It looks like a serious amount of work, since the C++ iterator API is pretty broad. It may also break some client code out there.

jtv avatar Dec 17 '23 15:12 jtv

I've been looking into this, and I'm digging up other complications:

  • A result iterator is also a container. This creates conflicts in how it should define some of its nested types!
  • In row and field I still need a full result (which contains a shared_ptr) so that a query function can return a row or field without it becoming an instant dangling reference.
  • Someone may want conversions from a result iterator to a row, and from a row iterator to a field.

So I'm still struggling with this. We may end up with a different division of labour between classes.

jtv avatar Dec 20 '23 22:12 jtv

Yet another complication is that it's not immediately clear when to return a lightweight iterator and when to return a full row or field. It quickly becomes non-obvious, which means it's error-prone, so for now I'm giving up on the idea of removing the shared_ptr from the iterator classes.

I'm not actually sure that's a big problem for iterators, because you don't actually need to copy them very often, so long as you iterate sensibly and pass references where appropriate. Really it's random access with operator[] that's costly. It gets better in C++23 with multi-dimensional operator[], but it's still pretty costly. We can optimise the loop in pqxx::result::for_each(), but if that's what you want you're probably better off streaming the query.

jtv avatar Dec 23 '23 14:12 jtv

I'm forming some answers to the questions. For example, I'm now leaning towards making all iteration and indexing return iterators, not pqxx::row or pqxx::field. If you wanted to keep the result set in memory, you'd have to construct a row or field yourself. Everything would then be fast by default, with the option to get the heavier, "safer" objects.

Unfortunately all the work I did on this is on the laptop that broke today, so I'm kind of stuck for now. :-( One fun little side quest was to replace most code in the reverse_iterator types for result and row with std::reverse_iterator.

jtv avatar Dec 29 '23 22:12 jtv

Will it break backward compatibility?

alexolog avatar Dec 29 '23 22:12 alexolog

Well yes, of course - the change you asked for breaks compatibility.

jtv avatar Dec 29 '23 23:12 jtv

I've got my laptop back, but unfortunately there was no chance to make a backup. There was a lot of incomplete work on these iterators on there. :-(

jtv avatar Jan 19 '24 23:01 jtv

There has been no activity on this ticket. Consider closing it.

github-actions[bot] avatar Mar 20 '24 01:03 github-actions[bot]

There has been no activity on this ticket. Consider closing it.

github-actions[bot] avatar May 25 '24 02:05 github-actions[bot]

I think this might be a good thing to put on the list for libpqxx 8.0, and combine it with at least the preparation for #846.

jtv avatar Jun 16 '24 17:06 jtv

It's been ages. But I've had some more time to think about it, and I think in libpqxx 8.0 we can go for a much stronger separation between row/field on the one hand, and result::iterator/row::iterator on the other.

The row and field classes will still have a shared_ptr to the result, and so keep it alive in memory, but the iterators will not.

When I experimented with this, I found that it didn't affect a lot of test code. Which I hope is a good sign. And... C++ compilers have advanced "analyzers" now which help find bugs that this might cause.

jtv avatar Aug 15 '24 13:08 jtv