STL icon indicating copy to clipboard operation
STL copied to clipboard

`<xstring>`: `basic_string` could handle insert aliasing without allocating in some common cases

Open CaseyCarter opened this issue 3 years ago • 4 comments

Notably:

  1. Given a range of contiguous iterators we can directly determine if it aliases the string contents.
  2. Given a sized or forward range that will fit in available capacity, we could append and then rotate (which can't throw) into place to effect inserts even in the presence of aliasing.
  3. For sizeable-but-won't-fit, we could allocate the new memory, insert the new range in place, and then move (which can't throw) the pre-existing characters.
  4. For non-sizeable, fill the tail, either rotating if the capacity suffices or shifting into the correct places when we do allocate a new block.

Note that these are roughly in order of ease of implementation and expected improvement. The more exotic the workaround, the less improvement we can expect over the dumb "make a temporary string and then merge" strategy.

(Inspired by discussion in https://github.com/microsoft/STL/pull/2806#discussion_r918399814.)

CaseyCarter avatar Jul 25 '22 23:07 CaseyCarter

For exotic performance techniques, we should use the new benchmarking infrastructure added by @strega-nil-ms to guide our decision as to whether the code complexity is worth the benefit. Not necessary for simple slam-dunk optimizations like Casey's (1).

Note that some or all of these might not work at constexpr time.

StephanTLavavej avatar Jul 26 '22 22:07 StephanTLavavej

This is also discussed in llvm/llvm-project#56031, except for cases of sized but not forward ranges, which may be treated like forward ranges. Note that in some cases it is possible to know that the source range can't be aliased with basic_string due to its iterator type.

I strongly believe that generally we should not consider "relocation before insertion" unless it clearly has some benefits (e.g. in the case of replace(_with_range) or where rotation can be easily avoided). And thus alias detection is not so worthwhile to me.

Note that some or all of these might not work at constexpr time.

I believe that only (1) is incompatible with constant evaluation. "Type-based alias analysis" (??!) works with constexpr.

frederick-vs-ja avatar Jul 27 '22 07:07 frederick-vs-ja

@frederick-vs-ja unfortunately, string is based on char, which likes to provide storage for other types :<

strega-nil-ms avatar Jul 27 '22 18:07 strega-nil-ms

@frederick-vs-ja unfortunately, string is based on char, which likes to provide storage for other types :<

Oh... this makes sense. I can imagine the case where elements of two std::strings overlap, and the case where std::vector<T, CustomAlloc<T>> and a std::string use the same storage.

But IIUC we can still make stronger assumptions when none of the element types is char, unsigned char, or byte...


Edit: CWG decided that char arrays are not able to provide storage for other objects in CWG-2489 (submitted by me). Although my original intent was that char arrays should be able to do so.

frederick-vs-ja avatar Jul 27 '22 23:07 frederick-vs-ja