`String#byte_index(Regex)`

Open HertzDevil opened this issue 3 months ago • 0 comments

On Discord it was asked how to obtain the starting byte index of a Regex match of a String:

str = "😂x"
pattern = /x/
str.char_index_to_byte_index(str.index!(pattern)) # => 4
str.match!(pattern).byte_begin                    # => 4

But String already has byte_index overloads for the other kinds of search argument, just not for Regex:

class String
  def byte_index(byte : Int, offset = 0) : Int32?
  end

  def byte_index(char : Char, offset = 0) : Int32?
  end

  def byte_index(search : String, offset = 0) : Int32?
  end
end

So I'd suggest adding a Regex overload to String#byte_index as well, to make the API more ergonomic:

class String
  def byte_index(pattern : Regex, offset = 0, *, options : Regex::MatchOptions = Regex::MatchOptions::None) : Int32?
    offset += bytesize if offset < 0
    return if offset < 0

    if match = pattern.match_at_byte_index(self, offset, options: options)
      match.byte_begin
    end
  end
end

str.byte_index(pattern)             # => 4
str.byte_index(pattern, offset: 4)  # => 4
str.byte_index(pattern, offset: 5)  # => nil
str.byte_index(pattern, offset: -1) # => 4
str.byte_index(/y/)                 # => nil

Note that offset is a byte index, not a character index, and that the return type is nilable: the method returns nil when nothing matches at or after offset.
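For anyone who wants to try the behaviour without reopening String, here is a minimal sketch of the same logic as a free method. The name regex_byte_index is hypothetical and exists only for this illustration; the body relies on Regex#match_at_byte_index and Regex::MatchData#byte_begin, the same calls used in the proposal above.

```crystal
# Hypothetical stand-in for the proposed String#byte_index(Regex) overload.
# Written as a free method so it does not clash with anything in String.
def regex_byte_index(str : String, pattern : Regex, offset : Int32 = 0) : Int32?
  # A negative offset counts back from the end of the string, in bytes.
  offset += str.bytesize if offset < 0
  return nil if offset < 0

  # byte_begin is the byte position where the whole match starts.
  pattern.match_at_byte_index(str, offset).try &.byte_begin
end

str = "😂x" # the emoji occupies 4 bytes, so "x" sits at byte index 4

str.index(/x/)                # => 1 (character index)
regex_byte_index(str, /x/)    # => 4 (byte index)
regex_byte_index(str, /y/)    # => nil

# The result is nilable, so callers have to handle the no-match case.
if idx = regex_byte_index(str, /x/)
  puts str.byte_slice(idx) # prints "x"
else
  puts "no match"
end
```

Since the offset is passed to the regex engine as a byte position, it presumably should fall on a character boundary; the sketch does not check for that.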

HertzDevil, Apr 04 '24 11:04