imgui icon indicating copy to clipboard operation
imgui copied to clipboard

ImStrv: Use a string span for consuming strings in API

Open rokups opened this issue 4 years ago • 43 comments

EDIT Moved to features/string_view branch

WE EXPECT TO MERGE THIS IN 2021 BUT ARE WAITING FOR (1) NEW BINDING GENERATOR (2) INVESTIGATING SWITCHING TO STB_PRINTF IN ORDER TO SUPPORT NON-ZERO TERMINATED FORMATS

This PR implements a custom ImStr string span class and uses it in the API instead of const char* text and const char* text, const char* text_end. PR is a spiritual successor of #683.

Rationale: this change facilitates use of Dear ImGui from languages that do not zero-terminate their strings. So basically rust. Even though rust does have zero-terminated c strings - they are not default and therefore transparent interop between rust strings and this library would require string copying.

Things to note:

  • Variadic functions are not covered. Use of custom class breaks static format string analysis of GCC. Ideally these variadic functions should get non-variadic counterparts and user should do formatting on their end (like Text() vs TestUnformatted().
  • Most functions from imgui_internal.h that are not expected to be used by vast majority of users are not covered.
  • Some functions from imgui_internal.h that take const char* format are not covered for performance reasons.
  • Some functions taking arrays of const char* or callbacks are not covered.
  • Input*(), Sider*() and Scalar*() functions are covered, but that limits size of format parameter to 63 bytes. Format is copied to a buffer on stack because internal functions expect it to be zero-terminated.
  • Bunch of functions taking const char* text, const char* text_end were obsoleted, because ImStr version does exactly same.
  • ImStr is implicitly constructible from const char*.
  • ImStr is implicitly convertible to bool, yields false when ImStr has no string attached. Yields true if any string is attached, even if string length is 0. This is essentially same as checking if string pointer is null.

rokups avatar Feb 24 '20 09:02 rokups

This looks awesome, we also use IMGUI in Rust where we have string slices instead of null-terminated string pointers. And we also use IMGUI inside a WASM sandbox where we explicitly have to transfer the exact right memory between WASM and the host and where string slices with explicit length are a lot easier, faster, and safer than loose null-terminated string pointers where one have to dynamically determine the length of them for the marshalling.

So would love to see this integrated in to core IMGUI!

repi avatar Feb 26 '20 11:02 repi

Good to hear! Would you care to try this branch with your application? This PR may have issues or things missing. Real world experience would be invaluable for finishing this PR and merging it.

rokups avatar Feb 26 '20 11:02 rokups

Unfortunately it would be a bit tricky for us to try it I think because we use IMGUI through cimgui through imgui-rs which we have our own WASM fork of. So quite a few layers of indirection.

But @Jasper-Bekkers and @MaikKlein in our team here at Embark have more experience with how that works and may be able to give feedback about this PR

repi avatar Feb 26 '20 11:02 repi

No need to immediately act upon that in PR, but as C++20 is aiming at inconveniently making UTF-8 literal u8"some text" not be const char* anymore but const char8_t*, adding the extra constructor taking const char8_t* would be an added incentive for adopting this.

Another one is that it would facilitate a path where wchar_t could be supported as the default compile-time target, to support e.g. Unreal Engine without utf-8 > wchar_t conversions.

ocornut avatar Mar 01 '20 21:03 ocornut

@ocornut Is this PR something you'd consider merge in in the near future?

Jasper-Bekkers avatar Mar 25 '20 08:03 Jasper-Bekkers

Hello,

Not super “near” future as I don’t have resources to review/tweak it thoroughly the way I’d like (considering concerns such as cognitive and debug overhead for c++ users) but I imagine we should be able to tackle it in 2020, which is the reason I asked Rokas to revive the old PR.

Real world feedback would also be helpful.

If there’s traction, one possibility is to first merge the changed function signatures which constitute the bulk of the diff, so the remaining diff gets easier to update for as long as this is not fulled solved.

ocornut avatar Mar 25 '20 09:03 ocornut

Getting the updated function signatures in would already help us quite a bit. Having strings the way they are now has had quite an impact on imgui-s for us so investigating changes there would also take some time.

Jasper-Bekkers avatar Mar 29 '20 19:03 Jasper-Bekkers

Since the PR only mentions Rust, I just wanted to mention that this would also really benefit my usecase of using imgui from D.

simonvanbernem avatar Apr 04 '20 07:04 simonvanbernem

"Real world feedback would also be helpful." I did not go trough all the changes, but I think it can make imgui usable with C++17 and later which are now more and more used. I found this PR when I was considering if we can use imgui for our next project or not. So my question is if it will be merged any time soon, or we should seach for another library? Thanks in advance for the answer.

Kubikx avatar Apr 30 '20 12:04 Kubikx

This would also be helpful for the Java binding I maintained.

ice1000 avatar Nov 30 '20 10:11 ice1000

(moved message from #494, sane to resume talking in the newer issue)

I have a pushed an updated features/string_view branch https://github.com/ocornut/imgui/tree/features/string_view based over latest. Some changes over Rokups's recent is that I've enforced the end pointer to be valid if the begin pointer is valid. This allows us to remove late validation code, we now just expect that both pointers are set together by the constructor.

That gives the extra benefit that with the single-param constructor: literals which are frequent with raw C/C++ usage can now using compile-times strlen.

That's over the general benefit that pulling data from another language or string type which has constant time access to length can now avoid the strlen() alltogether - that's another benefit of this branch.

Some quick measurement done with our testing framework gives it a nice edge in Release build, and Debug build (with all VS security checks) gave it a minor 2-3% slow-down. This is before I made some other tweaks which perhaps made it better. Will do more thorough measurement later.

Things I would like to be able to move forward:

  • [ ] 1. general testing + thorough performance measurements (Rokups and I will work on that)
  • [ ] 2. looking at this from the pov public/internal api breaking changes. current version is designed to minimize those (to confirm?) but I believe some of the internal api would be saner to break. (Rokups and I can work on that, but any feedback welcome)
  • [ ] 3. reestablish a working proof of concept for Rust users and try to have 1-3 people toying with it even quickly.
  • [ ] 4. figure out how to integrate this with cimgui

Point (3) will need involved Rust users.

Point (4) is interesting to discuss: @bitshifter old cimgui fork replaced all cimgui api with ImStr: https://github.com/cimgui/cimgui/compare/master...bitshifter:imstr I do believe this would however be problematic for actual C user. We could have a compile-time option for cimgui to use const char* or ImStr but my gut feeling is that may be create hurdle with people pulling from package manager and not building themselves?

Otherwise cimgui could provide two api with a different prefix. One e.g. igButton() would take const char* + perform the strlen() itself, one e.g. igsButton() would take ImStr or simply two const char*. Since those functions are super thin wrapper I don't feel it would be problematic for main cimgui API. But this scheme would need to be adjusted for non ImGui:: functions a swell: ImDrawList_AddText, `

Either way this needs a solution. As a general feeling I believe that only direct C users would want to use const char* and every indirect users (LUA, C#, Rust, language bindings) would be better off with ImStr.


It's too late for 1.80 but tentatively if the right things are in place we could aim for it by 1.81 or 1.82. The branch being in place it would be ideal to start now working on nailing 3/4 so we have some real world feedback and then by the time we are done with 1.80 we can look at that.

ocornut avatar Nov 30 '20 10:11 ocornut

It has been a long time since I looked at this. It looks like cimgui is now autogenerated rather than hand written like it was when I forked it. So it will be interesting to see if that "just works" . On the Rust side of things, I think either a #[repr(C)] ImStr struct or a start and end const char* pointers could work, most likely it could be hidden in the imgui-rs FFI layer. I think the reason I went with ImStr was it was less changes than adding start and end pointer parameters everywhere and a bit easier to see where my changes were. At the time there were a few dear imgui methods that already took an end pointer which would have made it a bit harder to keep track of what I'd changed.

bitshifter avatar Dec 02 '20 02:12 bitshifter

As a general feeling I believe that only direct C users would want to use const char* and every indirect users (LUA, C#, Rust, language bindings) would be better off with ImStr.

In LuaJIT-ImGui I would prefer

ig.Text("some_text")

over

ig.Text(ImStr("some_text"))

sonoro1234 avatar Jan 28 '21 14:01 sonoro1234

In LuaJIT-ImGui I would prefer

Yes of course. From pretty much any language you would do Text("hello") and the bindings would convert that to the ImStr function implicitly/automatically or explicitly (preferable as it can leverage knowing string length without a strlen call).

ocornut avatar Jan 28 '21 14:01 ocornut

In cimgui ImStr would just be

typedef struct ImStr ImStr;
struct ImStr
{
    const char* b, *e;
};

In LuaJIT

local ffi = require"ffi"

-- C definition
ffi.cdef[[
typedef struct ImStr ImStr;
struct ImStr
{
    const char* b, *e;
};
]]

-- LuaJIT wrapping
local ImStr= {}
function ImStr.__new(ctype,a,b)
	b = b or ffi.new("const char*",a) + #a
	return ffi.new(ctype,a,b)
end
function ImStr.__tostring(is)
	return ffi.string(is.b,is.e~=nil and is.e-is.b or nil)
end
ImStr.__index = ImStr
ImStr = ffi.metatype("ImStr",ImStr)

-- use it
local is = ImStr("hello")
print(is,is.b,is.e)

would print hello cdata<const char *>: 0x002b7a1c cdata<const char *>: 0x002b7a21

And the burden would be in all cimgui functions taking one or more ImStr: When an ImStr typed argument is needed it will have to be wrapped inside ImStr()

now is

function M.TextUnformatted(text,text_end)
    text_end = text_end or nil
    return lib.igTextUnformatted(text,text_end)
end

after it will be

function M.TextUnformatted(text)
    return lib.igTextUnformatted(ImStr(text))
end

using cimgui from C could be even more difficult.

ImStr str;
str.b = "hello";
str.e = str.b + strlen(str.b);
igTextUnformatted(str);

sonoro1234 avatar Jan 28 '21 17:01 sonoro1234

ImStr str;
str.b = "hello";
str.e = str.b + strlen(str.b);
igTextUnformatted(str);

This strikes me as the wrong way to surface ImStr in cimgui.

I would expect cimgui to generate ImStr and non-ImStr versions. The average C user would use the non-ImStr versions and bindings from languages which naturally use ImStr-like string representation would use those.

For example, I'd expect something like this from cimgui for SmallButton:

CIMGUI_API bool igSmallButton(const char* label)
{
    return ImGui::SmallButton(ImStr(label));
}

CIMGUI_API bool igSmallButton_ImStr(ImStr label)
{
    return ImGui::SmallButton(label);
}

A Lua binding might not even surface the _ImStr variants, whereas a binding for .NET (where strings are length-prefixed) would likely use them under the hood exclusively.

PathogenDavid avatar Jan 29 '21 10:01 PathogenDavid

@PathogenDavid I agree 100% I opened a thread in cimgui repo to discuss this https://github.com/cimgui/cimgui/issues/175

ocornut avatar Jan 29 '21 10:01 ocornut

@ocornut @PathogenDavid I mostly agree. Only a few naming issues.

Take for example in imgui.h

    IMGUI_API void          PushID(ImStr str_id);                                          
    IMGUI_API void          PushID(const char* str_id_begin, const char* str_id_end = NULL);
    IMGUI_API void          PushID(const void* ptr_id);                                     
    IMGUI_API void          PushID(int int_id);                                             

That would be in cimgui.h

void igPushIDStr(ImStr str_id);
void igPushIDChptChpt(const char* str_id_begin,const char* str_id_end);
void igPushIDPtr(const void* ptr_id);
void igPushIDInt(int int_id);

and the overloaded function taking ImStr would trig generation of another function

void igPushIDChpt(const char *str_id)
{
    ImGui::PushID(str_id); //will use implicit conversion
}

That would behave as the old igPushIDStr

@rokups I think that this should be tested with the changes in this PR. I am not sure how this should be done: your brach https://github.com/rokups/imgui/tree/rk/ImStr is 4 commits ahead, 300 commits behind ocornut:master. Is this a problem?

sonoro1234 avatar Jan 29 '21 16:01 sonoro1234

The PR has been moved to this repository, features/string_view It is up to date

ocornut avatar Jan 29 '21 16:01 ocornut

Done pull imgui/features/string_view and generation after modification avoiding name clash in overloadings (char* was Str now is Chpt, ImStr is now Str) in https://github.com/cimgui/cimgui/tree/string_view

soon will work on making const char * versions

sonoro1234 avatar Jan 29 '21 18:01 sonoro1234

@ocornut Just finished and examples working but I have a question Is there any reason for using const in Button but not in SmallButton

    IMGUI_API bool          Button(const ImStr label, const ImVec2& size = ImVec2(0, 0));   // button
    IMGUI_API bool          SmallButton(ImStr label);                                       // button with

sonoro1234 avatar Jan 30 '21 18:01 sonoro1234

While working on LuaJIT wrapping I realized that ImStr to const char* susbtitution would clash with already present substitutions on imgui. So when the signature resulting from the substitution is already present in imgui the const char* version is not added now.

The place where this substitution is made has changed: now the new signature is added before computation of overloading names so that naming conventions have changed (updated in the message above).

sonoro1234 avatar Feb 01 '21 10:02 sonoro1234

Is there any reason for using const in Button but not in SmallButton

It's a bug! Will patch/repush the branch now.

Let's try to discuss the rest in https://github.com/cimgui/cimgui/issues/175

ocornut avatar Feb 01 '21 11:02 ocornut

Rebased this branch on master and renamed ImStr to ImStrv in history, so we have a basic type that convey "string view" (lack of ownership) better.

ocornut avatar Mar 04 '21 09:03 ocornut

FYI this is maintained as features/string_view branch and we are starting to push users to give it a try (@emoon is trying to write a new backend).

One thing that annoys me in our ImStrv branch is we are forced to copy and zero-terminate format-string because we are relying on printf functions. So currently functions like SliderScalar(), DragScalar() etc need to do that:

    const char* format;
    char format_buf[64];
    if (!format_p)
    {
        // Default format string when passing NULL
        format = DataTypeGetInfo(data_type)->PrintFmt;
    }
    else
    {
        // Copy format string (format may not be zero-terminated)
        format = format_buf;
        IM_ASSERT(format_p.End - format_p.Begin < IM_ARRAYSIZE(format_buf));
        ImStrncpy(format_buf, format_p, IM_ARRAYSIZE(format_buf));
    }

And Text(), TreeNode() etc don't yet support format string as non-zero terminated for the same reason. It's particularly annoying because many format strings would ALREADY be zero-terminated but we cannot assume it (and I don't think it would be legal to read at the *format.end address, probably wouldn't crash but would trigger asan, etc.)

So I think we'd need a printf implementation which support format as a non-zero terminated string-view.... May look into stb_printf..... It doesn't support it but we could envision making that change.

Any other idea?

ocornut avatar Jul 27 '21 11:07 ocornut

(Embedding our own printf implementation would also standardize localisation behaviors such as #2278)

ocornut avatar Jul 27 '21 11:07 ocornut

printf("%.*s", length, string); works (though will end the string at nulls inside the non-null-terminated string).

hrydgard avatar Jul 27 '21 11:07 hrydgard

@hrydgard I am talking about the format string itself not being null-terminated.

ocornut avatar Jul 27 '21 11:07 ocornut

printf("%.*s", length, string); works (though will end the string at nulls inside the non-null-terminated string).

This doesn't help: printing a string without termination is not the problem, having the Format-String itself not being terminated is the problem.

simonvanbernem avatar Jul 27 '21 11:07 simonvanbernem

Doh, right :/ Yeah, will have to go for a custom implementation indeed.

hrydgard avatar Jul 27 '21 11:07 hrydgard