regexp2

Is there any workaround for `split`?

Open i-am-the-slime opened this issue 1 year ago • 2 comments

Thanks for this nice library!

I'm using this library from another language that can compile to Golang. I've now finally hit a case where a library I use needs to split on a regex. You mention in the README that you're still working on this. Do you happen to have a draft or other unfinished code that can do some splitting (even if it's slow or wrong in edge cases)?

i-am-the-slime, Aug 12 '24 17:08

I had written a split function (based on C#) for the code-gen version of the library. I suspect it'll work with the main version as well, but there are probably edge cases:

// Split splits the given input string using the pattern and returns
// a slice of the parts. count limits the number of matches to process.
// If count is -1, it processes the input fully.
// If count is 0, it returns nil. If count is 1, it returns the original input.
// The only expected error is a Timeout, if one is set.
//
// If capturing parentheses are used in the pattern, any captured
// text is included in the resulting slice.
// For example, with the pattern "-", Split("a-b") returns ["a", "b"],
// but with the pattern "(-)", Split("a-b") returns ["a", "-", "b"].
func (re *Regexp) Split(input string, count int) ([]string, error) {
	if count < -1 {
		return nil, errors.New("count too small")
	}
	if count == 0 {
		return nil, nil
	}
	if count == 1 {
		return []string{input}, nil
	}
	if count == -1 {
		// no limit
		count = math.MaxInt64
	}

	// iterate through the matches
	priorIndex := 0
	var retVal []string
	var txt []rune

	m, err := re.FindStringMatch(input)

	for ; m != nil && count > 0; m, err = re.FindNextMatch(m) {
		// if we have an m, we don't have an err
		txt = m.text
		// append the text between the previous match and this one
		retVal = append(retVal, string(txt[priorIndex:m.Index]))
		// append any capture groups, skipping group 0
		gs := m.Groups()
		for i := 1; i < len(gs); i++ {
			retVal = append(retVal, gs[i].String())
		}
		priorIndex = m.Index + m.Length
		count--
	}

	if err != nil {
		return nil, err
	}

	if txt == nil {
		// we never matched, return the original string
		return []string{input}, nil
	}

	// append our remainder
	retVal = append(retVal, string(txt[priorIndex:]))

	return retVal, nil
}

It uses the private m.text field, but I'm sure it could be written without it for your purposes. Let me know if you run into any issues. I could look at adding this to the main library version.

dlclark, Aug 12 '24 22:08

@i-am-the-slime did this help?

dlclark, Aug 15 '24 15:08

@dlclark Yes, very much so, thanks!

i-am-the-slime, Nov 02 '24 16:11