flatbuffers icon indicating copy to clipboard operation
flatbuffers copied to clipboard

[C#] Add allocation-free GetByKey implementation

Open CTVjweiss opened this issue 2 years ago • 12 comments

This is a fix for https://github.com/google/flatbuffers/issues/7288

The current C# implementation of GetByKey() had no option for avoiding a byte[] allocation from UTF8.GetBytes(string).

This adds an optional byte[] parameter that will be used as the buffer for UTF8 encoding the string if passed. The optionality of the parameter preserves backwards compatibility with existing code, but gives users the option to pass in their own scratch buffer and avoid the allocation. There is no safety checking to make sure the byte[] is long enough. If an array that is too small is passed, you'll get an exception thrown:

ArgumentException: The output byte buffer is too small to contain the encoded data, encoding 'Unicode (UTF-8)' fallback 'System.Text.EncoderReplacementFallback'.

This only affects tables where the key is a String.

CTVjweiss avatar May 04 '22 20:05 CTVjweiss

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

For more information, open the CLA check for this pull request.

google-cla[bot] avatar May 04 '22 20:05 google-cla[bot]

@CTVjweiss Still interested in getting this in? I left a few comments previously.

dbaileychess avatar Aug 06 '22 05:08 dbaileychess

Sorry I missed the notification for your initial review -- I am still interested! I'll get those changes done

CTVjweiss avatar Aug 11 '22 13:08 CTVjweiss

Okay made the changes, rebased, should be ready to go now. Sorry for the delay!

CTVjweiss avatar Aug 11 '22 14:08 CTVjweiss

@CTVjweiss Good to go, I think you just have to rebase.

dbaileychess avatar Aug 13 '22 05:08 dbaileychess

@CTVjweiss ping, can you do the merge? I cannot do if from the web UI.

dbaileychess avatar Aug 16 '22 17:08 dbaileychess

I can't merge these changes anymore, given the way lookup by key was reworked in another PR (https://github.com/google/flatbuffers/pull/7441).

The rework in the other PR actually makes the problem I'm trying to fix worse for string keys. Before each call to GetByKey was causing 1 string allocation for the passed in key. With the rework, each call is doing log(n) string allocations (a string allocation for each comparison).

I think my best path here if I want to avoid allocations without rolling back the rework from the other PR, would be to re-add converting the string key to bytes, and then have comparisons done on the byte accessor for the string key field. e.g. for Monster comparison it would be using GetNameBytes() instead of just Name, which allocates a string. I'll try that unless you think it's a bad idea.

CTVjweiss avatar Aug 18 '22 14:08 CTVjweiss

Thanks, I get what you mean about reinterpreting as a string. Is there a stringview thing in C# we can use instead that doesn't do the copy? Is that what you were suggesting with GetNameBytes()?

I just want to try to use the public interface of the generated type to get the value, instead of doing buffer snooping, as that has its issues with defaults-arent-written-by-default.

dbaileychess avatar Aug 18 '22 16:08 dbaileychess

Yeah that's what I'm suggesting with GetXXXXBytes() -- that returns a Span<T> which is the closest thing C# has to a stringview I think. But Span<T> doesn't have a CompareTo() function, checking it is essentially just checking the underlying byte[]. Going to have to spend a little more time thinking about this and understanding the defaults not written by default issue. In my use case the string key is the primary identifier of the data item and is always unique and required, so I'm fine on the old version I have forked for now. I would eventually like to contribute to flatbuffers though, I'm loving it.

CTVjweiss avatar Aug 23 '22 00:08 CTVjweiss

Yeah, since we don't have defaulted strings yet (@CasperN), you will always have the string key in the buffer.

How about we add generated method. Get<XXX>AsBytes() for whenever we have a string key field? And then we can provide both a CompareTo method and updated the lookup code to use this new generated method? That way string and other types have near identical code.

We do the _As in other languages so i think it would be useful to generate as a public API, instead of hiding it in the details of the lookup.

dbaileychess avatar Aug 26 '22 04:08 dbaileychess

Sorry for letting this sit so long, I think your suggested plan does make sense. I'm going to try to get back around to this this month if possible, as I don't want my project to be forever stuck on an old fork of flatbuffers.

CTVjweiss avatar Nov 03 '22 13:11 CTVjweiss

Converting to a Draft for now, until you get back to it.

dbaileychess avatar Nov 11 '22 02:11 dbaileychess