crystal icon indicating copy to clipboard operation
crystal copied to clipboard

Regex#each_name(&) and Regex#names

Open Sija opened this issue 5 years ago • 16 comments

fwiw Regex#names exists in Ruby.

Regex#each_name was added as a helper and a way to access group names without additional allocations.

Sija avatar Sep 18 '20 14:09 Sija

Could this get a 2nd approval?

Sija avatar Dec 03 '20 14:12 Sija

These were never discussed before, sorry. I don't understand their usefulness so I won't approve them.

asterite avatar Dec 03 '20 18:12 asterite

2nd approval anyone? /cc @bcardiff @jhass

Sija avatar Jan 07 '21 01:01 Sija

This needs a rebase

straight-shoota avatar Jan 07 '21 01:01 straight-shoota

@straight-shoota done.

Sija avatar Jan 07 '21 01:01 Sija

Can you provide a use case?

Also, UInt16 should be replace with Int32.

Then, I don't understand why names is useful for, and why there needs to be an unsorted and sorted variants. This makes the API bigger but I can't see how it can be used for anything useful.

asterite avatar Jan 07 '21 09:01 asterite

Can you provide a use case?

@asterite These new methods are documented and I consider that enough for explaining the use cases.

Also, UInt16 should be replace with Int32.

That would be probably better, yes.

Then, I don't understand why names is useful for, and why there needs to be an unsorted and sorted variants. This makes the API bigger but I can't see how it can be used for anything useful.

Agreed, although unsorted version is there for efficiency reasons, yet if deemed unnecessary I can leave only sorted version.

Sija avatar Mar 15 '21 21:03 Sija

@asterite ping 👋

Sija avatar May 07 '21 20:05 Sija

Sorry, I don't have much more time to review or decide things.

asterite avatar May 07 '21 20:05 asterite

We can't just change the return type of #name_table from Hash(UInt16, String) to Hash(Int32, String). That would break the API. For now, we need to keep that method. There could be a different method returning Hash(Int32, String), though.

I've also some doubts if #names is a good addition. I see some risk that the behaviour cause confusion: The absence of unnamed capture groups might be missed. So I'm leaning towards not having this method. Is there a good use case for that, besides "Ruby has it"? I don't think it's much relevant and can easily be implemented when needed.

straight-shoota avatar May 09 '21 19:05 straight-shoota

We can't just change the return type of #name_table from Hash(UInt16, String) to Hash(Int32, String). That would break the API. For now, we need to keep that method. There could be a different method returning Hash(Int32, String), though.

https://github.com/search?q=name_table+language%3ACrystal&type=code

There's literally no single case there which would necessitate adding a new method.

I've also some doubts if #names is a good addition. I see some risk that the behaviour cause confusion: The absence of unnamed capture groups might be missed. So I'm leaning towards not having this method.

I don't get the point. If you call names you'd expect to see named captures, what else could you expect?

Is there a good use case for that, besides "Ruby has it"? I don't think it's much relevant and can easily be implemented when needed.

There's no other way to get the list of named captures, so that's enough for the use case for me, and this kind of - already accessible - feature, shouldn't force you to monkey-patch stdlib.

Sija avatar May 09 '21 20:05 Sija

There's literally no single case there which would necessitate adding a new method.

That just means no single use published at GitHub. And it doesn't matter. It's a breaking change.

I don't get the point. If you call names you'd expect to see named captures, what else could you expect?

You could expect the named captures to be indexed by their capture group index.

There's no other way to get the list of named captures, so that's enough for the use case for me,

That's not a use case at all. There's no point in adding a method unless there is a real use case. And sure, you can get an array of names from name_table.values (unsorted) or name_table.to_a.sort!.map(&.[1]) (sorted).

straight-shoota avatar May 10 '21 09:05 straight-shoota

Hi @Sija , thanks for taking the time to write the PR. But we need to ask you a little bit extra:

Can you provide a use case?

@asterite These new methods are documented and I consider that enough for explaining the use cases.

The documentation explains the use of the method, but does not provide enough evidence that the change is necessary. What we're asking is what makes you want to do this change. What couldn't be done before, or how inefficient or ugly was to do it, and what is better with your proposal? Ideally, you should first open an issue stating the problem, and perhaps drafting a solution, and then move to coding when we all agree on your proposal. Since we're discussing it here now, can you provide an answer to this question in this PR? Thanks!

beta-ziliani avatar May 10 '21 12:05 beta-ziliani

@beta-ziliani My use case was the replacement/escaping named group values without knowing the group names in advance. Without #names I had to resort to #name_table.to_a.sort!.map(&.[1]) call. And sorry for the delay 🙏

Sija avatar Jul 02 '21 20:07 Sija

#each_name sounds reasonable, but I agree the method name should be changed. We now have Regex::PCRE2#each_named_capture_group which is pretty much the same idea. Note that you could only skip the Hash allocation, not the Strings themselves; the standard library avoids exposing strings as Bytes publicly in this way, even if it means fewer allocations.

#names is entirely redundant. With #each_name you are free to implement your own #name or #name_array (https://github.com/crystal-lang/crystal/pull/9760#discussion_r628544426) based on it.

HertzDevil avatar Jan 04 '24 12:01 HertzDevil

#each_name sounds reasonable, but I agree the method name should be changed. We now have Regex::PCRE2#each_named_capture_group which is pretty much the same idea. Note that you could only skip the Hash allocation, not the Strings themselves; the standard library avoids exposing strings as Bytes publicly in this way, even if it means fewer allocations.

Yeah, name is up for the debate.

#names is entirely redundant. With #each_name you are free to implement your own #name or #name_array (#9760 (comment)) based on it.

This I'm not sure I follow. Why is it redundant? You could pose the same argument for String#chars vs String#each_char. Also, Ruby has it, so they didn't see it as redundant either.

Sija avatar Apr 05 '24 13:04 Sija