Remove `TrustedLen`
It may be possible for us to completely remove the TrustedLen trait, as the only place we really need it is in vortex-buffer.
By changing our from_iter and extend implementations on BufferMut to do this (instead of only relying on the lower bound):
// Since we do not know the length of the iterator, we can only guess how much memory we
// need to reserve. Note that these hints may be inaccurate.
let (lower_bound, upper_bound_opt) = iter.size_hint();
self.reserve(upper_bound_opt.unwrap_or(lower_bound));
we can deduplicate a lot of the code for this while incurring only 1 extra branch outside of the hot loop. More specifically, if you look at the difference between extend_iter and extend_trusted in https://github.com/vortex-data/vortex/pull/5055, there would only be a single extra branch after the hot loop that checks if there is any more items that need to be yielded.
In the worst case, that extra branch messes up all of the optimizations. In the best case, that single extra branch doesn't make much of a performance difference in practice.
CC @gatesn @robert3005 @a10y
The logic is duplicated right now because our untrusted logic is buggy, in practice it will leak memory in case of a panic from the Iterator::next call. If we fix that bug then there would be a significant difference in both of these.
Why would it leak memory? It may lose elements from the iterator because len isn't updated, but like... 🤷
It's not a leak you're right, I misread the comments in the stadard library. The reason why I don't think we should remove it is that standard library doesn't remove it and have both of them exist at the same time which means there's potential performance to be gained in either case. In Vec's case the Drop impl is dependent on length of the vector so you have to update length on every iteartion (which we do not do) then if you do that you run into https://github.com/rust-lang/rust/issues/32155 which can be avoided in TrustedLen case because we don't have to reserve multiple times, though maybe standard library impl could be better? So this all depends on how the underlying Drop implementation works
Right, but since our drop is handled by tokio Bytes, it's not dependent on the buffer's own length, then we don't care.
I think we can remove TrustedLen.
The reason we added it was didn't have the trick of consuming the iterator up to the reserved capacity using unchecked appends, then switch to using checked.
Wait no, I still don't buy your interpretation of the docs.
Deallocation doesn't depend on length, that doesn't make sense. If anything, capacity.
I’m just reading source code for Vec which this logic is based on
I think I understand now. We do have a bug in both of our implementations. However, fixing that bug is not what differentiates trusted len vs regular. The actual reason for having a trusted len iterator trait is so you can trust your upper bound size hint. The size_hint api only says that it has to be a lower and upper bound pair and if you don’t have a good upper bound leave it empty. But it’s also valid to return something like u64::MAX for upper bound for any iterator. The difference in standard library is that they don’t even look at upper bound in regular case.
I think if standard library doesn’t look at upper bound we probably shouldn’t as well as it’s likely exposing us to bad implementations that we’d otherwise avoid.
We also likely need to do the SetLenOnDrop thing to avoid aliasing issues that the issue I linked talks about
Does that mean we should also be ignoring the upper bound for normal iterators? This PR uses it and now I'm thinking I should remove it
Right, but since our drop is handled by tokio Bytes, it's not dependent on the buffer's own length, then we don't care.
This is true so long as for Buffer<T> the T does not have a drop impl. I think the reason why Vec needs to update the len incrementally is that it needs to be able to drop the elements that it's written.
If we assume that T is always some trivial type then we can skip it. If we can't assume that, then Rob is right and we need to do what Vec does.
I think if we want to be proper, we can use this const function and a static_assertion to make sure T is !Drop
https://doc.rust-lang.org/std/mem/fn.needs_drop.html