[C#] `Column(string)` method in `RecordBatch` is linear to the number of columns

Open vthemelis opened this issue 1 year ago • 1 comments

Describe the enhancement requested

It looks like a column lookup by name is linear in the number of columns. This is not intuitive and can easily lead to performance regressions. Would it be possible to add a lookup structure that turns this into an O(1) operation?

Component(s)

C#

vthemelis avatar Oct 22 '24 15:10 vthemelis

Column names are not actually required to be unique, which is why Schema.Fields is marked deprecated. If you know that the column names in your data are unique, you could still use it. Otherwise, we'd need something like a mapping of a string onto what's possibly a list of field positions.
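A minimal sketch of such a mapping, under stated assumptions: `ColumnNameIndex` is a hypothetical helper, not part of Apache.Arrow, and it is built from a plain list of column names rather than from a real `Schema`. It keeps every position for each name, so duplicates are preserved and a lookup costs one dictionary probe on average. The ordinal comparer default is also an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, NOT part of Apache.Arrow: maps each column name to
// every position at which it occurs, so duplicate names are preserved.
public sealed class ColumnNameIndex
{
    private readonly Dictionary<string, List<int>> _positions;

    // The comparer defaults to ordinal comparison; this default is an
    // assumption of the sketch, not the library's behavior.
    public ColumnNameIndex(IReadOnlyList<string> columnNames,
                           IEqualityComparer<string> comparer = null)
    {
        _positions = new Dictionary<string, List<int>>(comparer ?? StringComparer.Ordinal);
        for (int i = 0; i < columnNames.Count; i++)
        {
            if (!_positions.TryGetValue(columnNames[i], out List<int> list))
            {
                list = new List<int>();
                _positions[columnNames[i]] = list;
            }
            list.Add(i); // positions are appended in schema order
        }
    }

    // All positions for a name, in schema order; empty if the name is absent.
    public IReadOnlyList<int> GetPositions(string name) =>
        _positions.TryGetValue(name, out List<int> list)
            ? (IReadOnlyList<int>)list
            : Array.Empty<int>();

    // First position for a name, or -1 if the name is absent.
    public int GetFirstPosition(string name) =>
        _positions.TryGetValue(name, out List<int> list) ? list[0] : -1;
}
```

Building the index is a single O(n) pass over the column names. Returning the first position matches what a front-to-back linear scan would find, so callers with unique names would see no difference.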

CurtHagenlocher avatar Oct 22 '24 15:10 CurtHagenlocher

Hi @CurtHagenlocher and thanks for your reply! Very interesting that the column names don't need to be unique. I don't mind so much about that. I personally just want the existing retrieval functions to be faster than they are for their most common use cases.

I added #44633 to do exactly that. Note that I would like to also replace the existing Lookups with signature string -> Field but unfortunately those use StringComparer.Default instead of StringComparer.CurrentCulture. Not sure if this is intentional.

vthemelis avatar Nov 04 '24 14:11 vthemelis

This issue has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this issue will be closed in 14 days. If this improvement is still desired but has no current owner, please add the 'Status: needs champion' label.

github-actions[bot] avatar Nov 18 '25 11:11 github-actions[bot]