corrade icon indicating copy to clipboard operation
corrade copied to clipboard

constexpr StringView without literals

Open jdumas opened this issue 4 years ago • 9 comments

Hi. I noticed that a constexpr Containers::StringView can only be created from const char * via a literal _s. Reading the doc this seems to be a deliberate design decision, but I'm not entirely sure why? E.g., the tag Global is documented as:

The referenced string is global, i.e., with an unlimited lifetime. Enabling this flag can avoid needless allocations and copies in cases where the consumer needs to extend lifetime of passed string view.

But why would you want to extend the lifetime of a passed string view? This is a view, not a smart pointer no?

jdumas avatar Oct 07 '21 01:10 jdumas

There's also constexpr StringView(const char*, size_t) you could use if you know the literal length (e.g. const char x[] = "x" -> sizeof(x) - 1). I assume the issue with normal const char* literals is that there's no guaranteed constexpr equivalent of strlen before C++17.

The Global tag is a way to reason about the lifetime of the view's data pointer. If you need to, for example, keep the content of a view passed to a function around for later, you can avoid an owned copy if that tag is set. See for example String::nullTerminatedGlobalView which does exactly that.

pezcode avatar Oct 07 '21 09:10 pezcode

The main reason is std::strlen() not being constexpr, yes. However, assuming a possibility to have a constexpr std::strlen(), I'm afraid we'd either have to pick between being fast at runtime (by having a possibility to use 128/256-bit SSE4.2 / AVX instructions) or being executable at compile time (using a dumb slow loop in C++14, or worse, using a nasty recursion in C++11). Or we'd have to wait until C++20 and consteval, which I suppose could allow us to pick a different implementation for compile time and runtime. Not great, and something I'm disappointed about as well.

OTOH, as @pezcode says, it's possible to get away with using the char array length for size, although it may give different results than std::strlen() if the literal contains '\0' bytes. There's a conversion constructor from an ArrayView which could be abused to do this in a single expression, although it's currently not constexpr because I didn't want to add an implicit dependency between the StringView and ArrayView headers. But this use case is compelling enough that I'll try to make that work.

It'd then be something like the following, it's very explicit but since an arbitrary char[] can also be neither null-terminated nor a global constant, I'm afraid that adding a more convenient API for this could lead to error-prone code -- and since eventually a lot of Corrade APIs will depend on the NullTerminated flag (such as file opening, where a null-terminated string view is passed to std::fopen() as-is and a null-terminated copy is allocated otherwise), I don't want to introduce any API that could implicitly mark things as null terminated or global while they may be not.

// will be constexpr once I make that possible
constexpr Containers::StringView hello{Containers::arrayView("hello world!!").except(1), // strips the \0
    // you're responsible for ensuring these are correct
    Containers::StringViewFlag::Global|Containers::StringViewFlag::NullTerminated};

About the Global tag, it's unrelated to the constexpr, and it's set in the literal because the literal is the only place where it's clear at compile time that the string lives in a global constant memory and is null-terminated. The documentation for it is misleading, sorry about that. It was phrased in a way that suggested the flag somehow affects how the memory gets managed, wherereas it merely describes if the memory is a global constant or not. I'll change it to the following, which I hope explains the intent better:

The referenced string is global, i.e., with an unlimited lifetime. A string view with this flag set doesn't need to have a copy allocated in order to ensure it stays in scope.

mosra avatar Oct 07 '21 10:10 mosra

How does std::string_view achieves this then? Does it perform a slow loop at compile time to find the length of the char *? Or it uses a constexpr version of strlen()? The following code:

    constexpr std::string_view a = "foo\0bar";
    std::string_view b = "foo\0bar";
    std::cout << a.size() << " " << b.size() << std::endl;

outputs 3 3. Maybe we could make that possible when compiling with C++17.

Thanks for the clarification on the Global tag.

jdumas avatar Oct 07 '21 15:10 jdumas

How does std::string_view achieves this then?

Hmm. Using std::char_traits<T>::length(), which needs a good amount of compiler magic, apparently. This is the implementation in libstdc++ 11:

      static _GLIBCXX17_CONSTEXPR size_t
      length(const char_type* __s)
      {
#if __cplusplus >= 201703L
	if (__constant_string_p(__s))
	  return __gnu_cxx::char_traits<char_type>::length(__s); // <-- and this is the dump loop, yes
#endif
	return __builtin_strlen(__s);
      }

(Sigh.) For obvious reasons I don't want to depend on the <string> header for a string replacement to make use of std::char_traits, so I'm not really sure what to do here except for relying on compile-time-array length, or waiting for consteval that could make this possible even without compiler magic.

mosra avatar Oct 07 '21 15:10 mosra

Why do you need consteval? We could reimplement a constrexpr strlen using a simple loop no? E.g.

constexpr std::size_t length(const char* start) {
   const char* end = start;
   for(; *end != '\0'; ++end)
      ;
   return end - start;
}

constexpr size_t s = length("foo\0bar");

jdumas avatar Oct 07 '21 16:10 jdumas

Because, as I said above, I don't want to give up on a possibility to use 128/256-bit SSE4.2 / AVX instructions at runtime instead of always iterating on a single byte at a time and then hoping the compiler optimizes that somehow. That's what std::strlen() / __builtin_strlen() does.

mosra avatar Oct 07 '21 16:10 mosra

Got it. Would be nice if we could dispatch code based on whether the arg is constexpr or not...

jdumas avatar Oct 07 '21 16:10 jdumas

Yeah that's what consteval from C++20 does: https://en.cppreference.com/w/cpp/types/is_constant_evaluated

mosra avatar Oct 07 '21 16:10 mosra

A constexpr creation of a StringView from an ArrayView mentioned above got implemented in 81b708905069121b056f41fed37a6ec08fc12b12, so the following is now possible, at least, although I'm not sure if it's any better than writing a helper that consumes char(&)[n] and passes that to the already-constexpr StringView constructor taking a pointer + size:

constexpr Containers::StringView a = Containers::arrayView("hello").exceptSuffix(1);

Everything else, related to constexpr-aware construction from a char*, needs something like #152, and even then I'm not sure if it's a good idea given that it needs something like a compile-time while loop that iterates over the array until the first '\0' is found to be really compatible with what strlen() does. Worth investigating is also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86590 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80265 where the GCC devs deal with poor codegen related to doing this very operation.

mosra avatar Dec 29 '22 14:12 mosra