static_string
static_string copied to clipboard
Allow building an array_string from a const char*
This adds a public converting constructor that accepts a const char*, and modifies the existing private converting constructor to simply accept any parameter type (as long as it supports constexpr indexing, it should work).
By adding this, you can do more complicated string processing in constexpr functions, and then store the result in a new char array using this new constructor. This allows, for example, to implement a compile-time basename function (see my blog for an example: https://www.stderr.nl/Blog/Software/Cpp/CompiletimeBasename.html).
I am not comfortable providing a constructor from a pointer. Instead, I have added function offset_literal, which should allow you to acheive the same goal. Here is an illustration: https://wandbox.org/permlink/ztFuvdg4LbADSgdU
Would that be satisfactory for your use case?
I am not comfortable providing a constructor from a pointer.
Why not? In a constexpr context, AFAICS the compiler will perform static bounds checks so this should be safe?
What I don't like about your suggested solution is that:
- I need to repeat the string literal twice (though I guess that could be rewritten to store and pass a
string_literalinstead) - The return value of
compute_basename_startis no longer self-contained, but is only valid in a specific context tied to the string it operates on, and the operation that still needs to be performed (offseting).
What I liked about the const char* approach is that the basename and strrchr implementations are quite clean and return a self-contained result, and only at the end do some slightly less elegant template strlen stuff.
Ideally, you could do more kinds of processing (also replacing characters in a string, or more generic substring operations, etc.) in the same way, but I now see that the const char* doesn't quite allow this either (since you can't change characters inside a const char*, you would need to convert to an array_string, but you can only do that conversion when you actually know how long the string will be, which requires knowledge of the full starting string, which you can only do at the final end of the operation, not halfway relying on a string passed in as a parameter).
Carrying this thought further, it seems you would get more flexibility if you interpret the N not as the actual string size, but as the maximum size (in other words, if you allow NUL bytes inside the array_string buffer). Then you can write your string processing functions to pass around and return array_string objects, possibly stuffed with zeroes at the end. You can implement a substring and a "replace character" that simply return a new array_string and all kinds of other operations on top of these. At the top level, you can shrink the resulting string by calling strlen() on the result and passing that as the new N parameter, similar to the example in my code.
One thing you cannot really implement in this way is string operations that increase the string length in a way that is not bounded by type parameters (e.g. when the resulting length depends on string contents and a upper bound cannot be selected statically), but in these cases you can probably explicitly specify an upper bound, and raise a compile-time error when the bound is exceeded, which should probably be sufficient.
How does that sound?
I am not comfortable providing a constructor from a pointer.
Why not? In a constexpr context, AFAICS the compiler will perform static bounds checks so this should be safe?
When you pass me a pointer, it is just a pointer. I have no guarantee if it points to a null-terminated char sequence. I cannot apply some validations. Also, I cannot statically figure out the size.
What I don't like about your suggested solution is that:
- I need to repeat the string literal twice (though I guess that could be rewritten to store and pass a string_literal instead)
- The return value of compute_basename_start is no longer self-contained, but is only valid in a specific context tied to the string it operates on, and the operation that still needs to be performed (offseting).
I appreciate why this is a problem. Let me think about a satisfactory solution.
How does that sound? If you mean that I can add more ops, then yes, but I would rather add them selectively, as needs arise and are reported to me.
If you are referring to the limitations of this implementation, then yes, it is not a full static string processing: only the sizes of strings.
Maybe what you need is a C++17 string_view:
#include <experimental/string_view>
#include <iostream>
using std::experimental::string_view;
constexpr string_view s {"/user/me/project/file.cpp"};
constexpr auto i = s.find_last_of("/");
constexpr auto j = i == s.npos ? 0 : i + 1;
constexpr string_view fname = s.substr(j);
int main()
{
std::cout << fname << std::endl;
}
Here's a live example: https://wandbox.org/permlink/0kNenlD54cvtZVOf
string_view is certainly interesting. For my particular usecase (basename) it suffices, except that I would still need your static_string class to allocate a new array containing just file.cpp. The code you wrote above will allocate the full /user/me/project/file.cpp in the .elf file, and then refer to just a part of it (so you would save the runtime cost, but not the memory cost).
Furthermore, string_view only handles contiguous subsets of strings. For complete string processing, you would also need concatenation (I guess you can pretty much build anything else from those two), but concatenation does require allocating a new buffer as your array_string does now (or keep two pointers to the concatenated strings, but then each concatenating changes the type of the resulting string, which might be problematic when the concatenation is data-dependent).
I translated my ideas into some code, since that is often easier to talk about. See the last commit I pushed to this PR.
I essentially changed the meaning of the N template parameter, so it now means "The size of the storage", instead of "The length of the string". The length of the string can be smaller. In the case of a literal_ref, the offset and actual string length is stored explictly, in the case of the char_array, it is deduced using a static version of strlen. The actual string length is returned by len() in both cases.
I also added a substr() implementation for both string versions, which allows generating these "shorter-than-N" strings. Using this substr will generate strings that are shortened, but the storage they point to will still take up the full N bytes. To shrink this storage, the actual string length must be passed as a template parameter, so that can only happen at the outer level of processing, where the result is available in a constexpr variable (and specifically can not happen with strings that are available as arguments to constexpr functions, since those are not guaranteed to be constexpr). e.g. to shrink an array_string in the file variable:
constexpr auto small_file = sstr::array_string<file.len()>(file);
I also added the static_basename code to the test file, which can now be written in a fairly natural way (it would be even better in C++14).
Some remarks regarding the code:
- When concatenating or converting strings, a int sequence was used. Since the actual string length is now no longer a template parameter, we cannot generate an int sequence based on the actual string length. Instead, I'm still generating int sequences based on the full storage size (
N), but introduced aconcat_elemhelper that figures out how many elements to use from the first string, when to switch to the second string, and fills up the rest of the storage with zeroes. - For the converting constructor, I would need something similar (to use up to
len()bytes from the converted string and fill the rest with zeroes). I solved this quick and dirty by reusing the concatenating private constructor and concatenating a zero-length string. This probably needs to be solved more elegantly. - I also added a converting constructor for
array_stringthat can expand or shrink the storage sizeN. Expanding is needed when processing a string where you conditionally concatenate two strings: the resulting types will be different depending on wether the concatenation happened or not. By allowing the storage to expand, you can always return a value of the bigger type. Shrinking is needed to shrink the storage by passing an explicit length template parameter as shown above, and has an assert in place to make sure you won't shrink to smaller than the actual string length. - I did not add a converting constructor that changes
Nforliteral_ref, since that would further change the meaning ofNfrom "the size of the storage pointed to", to "An upper bound on the actual string length". Changing this might be ok, and is needed for some string processing (e.g. when choosing one of two strings with different lengths to return, you need to unify both into a single type that will hold both references), but I haven't done this yet. - When using
substr()on aliteral_ref, the length of the string is separately stored, but the NUL-termination is not changed. This means that when callingc_str()on such a string you will see the original NUL-terminator (i.e. any suffix removed usingsubstr()will be present again). I don't think this can be fixed, other than forcing thesubstr()return value into anarray_string(either directly, or letting the caller do that before callingc_str()). - I added a new
len()method, to return the actual string length, while keeping thesize()method to return the value of the storage (e.g. ofN). I'm not sure if having both is really meaningful, though. - I used concatenation with an empty string to quickly convert a
literal_refinto anarray_string, just for convenience. - Having to support both
literal_refandchar_arrayis a bit cumbersome. I now implemented two versions ofsubstr(), and mystatic_basenamealso uses templates to support both kinds of string, but it also needs adecltypefor the return type since the result ofsubstr()depends on the type it is called upon. For this kind of string processing, it is probably a lot more convenient to only use thearray_stringversion, which should work in all cases where theliteral_refalso works (and with a proper constructor, you can convert a string literal into anarray_stringdirectly). The downside is probably performance: Allocating a new array all the time can be more work than just passing around references, but since this is all at compiletime, that's probably not really a big issue. A related issue is that string literals can share storage, while thearray_stringstorage is probably not shared (unless the linker is really smart). However, once you do any string concatenation, you would lose sharing anyway, and when you're only doing substring-processing, you likely want to shrink the storage from the original anyway.
I guess the last point would be the biggest point to discuss. I think that to conveniently support generic string processing, the literal_ref string should be removed, but I can imagine you want to keep it in, especially in reference to your blogpost about it. Or do you have other compelling reasons to keep it?
Thank you for the submission. When I have some more spare time, I will analyze this in detail. I understand that the main idea is to track the "capacity" statically, and "length" dynamically.
@akrzemi1, did you perhaps find some time to look at this yet? No pressure, just a gentle reminder :-)
@matthijskooijman: I am sorry but I had to put it in queue due to other commitments and personal life. I did look at it. I think I know how to handle it (provide three specializations of the template: for references to static array, for copied array, and for copied array with fixed capacity but dynamic length). But in honesty, it will take me some time to do this. Sorry for the delay.
I have implemented part of your proposed change: you can take suffixes but not arbitrary substrings.
constexpr auto fpath = sstr::literal("path/fname");
constexpr auto fname = sstr::suffix(fpath, 5);
// decltype(fname) is sstr::string_literal_suffix<10>
// fname.size() == 5
constexpr auto full_fname = fname + ".txt";
// decltype(fname) is sstr::array_string_suffix<10 + 4>
// fname.size() == 9
std::cout << full_fname << std::endl;
// outputs: "fname.txt"
I think it should handle your use case?