`String#byte_index(Regex)`
On Discord, someone asked how to obtain the starting byte index of a Regex match in a String. Today this requires a detour through a character index or a `MatchData`:
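```crystal
str = "😂x"
pattern = /x/
# convert the character index of the match into a byte index
str.char_index_to_byte_index(str.index!(pattern)) # => 4
# or read the byte offset straight from the MatchData
str.match!(pattern).byte_begin # => 4
```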
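To see why a conversion step is needed at all, here is a small sketch comparing the character index that `String#index` returns with the byte index for the same string. It assumes UTF-8 encoding (Crystal's default), where the emoji occupies 4 bytes but counts as one character:

```crystal
str = "😂x"
pattern = /x/

# "😂" (U+1F602) is a single character encoded as 4 UTF-8 bytes,
# so character positions and byte positions diverge after it.
p "😂".bytesize                      # => 4
p str.index(pattern)                 # => 1 (character index)
p str.char_index_to_byte_index(1)    # => 4 (byte index)
```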
But `String` already provides `#byte_index` overloads for other kinds of search:
```crystal
# Existing overloads in the standard library (bodies elided)
class String
  def byte_index(byte : Int, offset = 0) : Int32?
  end

  def byte_index(char : Char, offset = 0) : Int32?
  end

  def byte_index(search : String, offset = 0) : Int32?
  end
end
```
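For comparison, a sketch of how the existing overloads behave on the same string. It assumes UTF-8 encoding, where `'x'` is the single byte `0x78`, and passes `offset` positionally as the signatures above allow:

```crystal
str = "😂x"

p str.byte_index('x')     # => 4 (Char overload)
p str.byte_index("x")     # => 4 (String overload)
p str.byte_index(0x78)    # => 4 (byte overload; 'x' is 0x78)
p str.byte_index('x', 5)  # => nil (offset 5 is past the last byte)
```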
So I'd suggest adding a `Regex` overload to `String#byte_index` as well, to make the API more ergonomic:
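```crystal
class String
  # Returns the byte index of the first match of *pattern* at or after byte
  # *offset*, or `nil` if there is none. A negative *offset* counts from the end.
  def byte_index(pattern : Regex, offset = 0, *, options : Regex::MatchOptions = Regex::MatchOptions::None) : Int32?
    offset += bytesize if offset < 0
    return if offset < 0
    if match = pattern.match_at_byte_index(self, offset, options: options)
      match.byte_begin
    end
  end
end
```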
```crystal
str.byte_index(pattern) # => 4
str.byte_index(pattern, offset: 4) # => 4
str.byte_index(pattern, offset: 5) # => nil
str.byte_index(pattern, offset: -1) # => 4
str.byte_index(/y/) # => nil
```
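The proposed overload is essentially a thin wrapper around `Regex#match_at_byte_index`. A minimal sketch of the same lookup without reopening `String`, for readers on a Crystal version that lacks the overload. Unlike the proposal, this sketch does not normalize negative offsets, so it should only be given offsets in `0..bytesize`:

```crystal
str = "😂x"
pattern = /x/

# match_at_byte_index returns MatchData? ; try maps a match to its starting byte.
p pattern.match_at_byte_index(str, 0).try(&.byte_begin) # => 4
p pattern.match_at_byte_index(str, 5).try(&.byte_begin) # => nil
```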
Note that `offset` is a byte index and that the return type is nilable.
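Because the result is a byte index, it pairs naturally with byte-based methods such as `String#byte_slice`, and `String#byte_index_to_char_index` converts it back when a character index is needed. The sketch below copies the proposed overload so it runs on its own. It assumes Crystal 1.8 or later (for `Regex::MatchOptions`) and treats the overload as hypothetical rather than an existing stdlib method:

```crystal
# Copy of the proposed overload (hypothetical, not assumed to be in the stdlib).
class String
  def byte_index(pattern : Regex, offset = 0, *, options : Regex::MatchOptions = Regex::MatchOptions::None) : Int32?
    offset += bytesize if offset < 0
    return if offset < 0
    if match = pattern.match_at_byte_index(self, offset, options: options)
      match.byte_begin
    end
  end
end

str = "😂x"

# The result is a byte index, so use it with byte-oriented APIs...
if i = str.byte_index(/x/)
  p str.byte_slice(i, str.bytesize - i) # => "x"
  p str.byte_index_to_char_index(i)     # => 1
end

# ...and handle nil for a missing match or an out-of-range offset.
p str.byte_index(/y/) || -1        # => -1
p str.byte_index(/x/, offset: -10) # => nil
```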