mojo icon indicating copy to clipboard operation
mojo copied to clipboard

[Feature Request] Add larger than and larger than or equal operator support to String

Open VMois opened this issue 10 months ago • 9 comments

Review Mojo's priorities

What is your request?

Add larger than (lt) and larger than or equal (le) operator support to String.

What is your motivation for this change?

Currently, it is not possible to compare Strings easily. It is very useful when sorting Strings.

Any other details?

No response

VMois avatar Apr 20 '24 00:04 VMois

maybe I can work on this one if upstream think this work is worth?

zhoujingya avatar Apr 22 '24 01:04 zhoujingya

@zhoujingya, I have already made a snippet (needed it for a project to compare strings). Do you want me to share it with you, or would you prefer to do it yourself? My snippet is very simple. You might want to see if it is possible to use SIMDs when comparing.

VMois avatar Apr 22 '24 01:04 VMois

maybe I can wok on this one if upstream think this work is worth?

I think @JoeLoser already marked it as a good first issue which means you are free to take it.

VMois avatar Apr 22 '24 01:04 VMois

@VMois wow, very appreciate for your snippets.thanks

zhoujingya avatar Apr 22 '24 01:04 zhoujingya

Here is what I used as a quick fix. I would double-check that :) And definitely look at SIMDs.

@always_inline
fn __lt__(self, other: String) -> Bool:
    var p1 = self._buffer
    var p2 = other._buffer
    var min_len = len(self)
    if len(other) < min_len:
        min_len = len(other)
    for i in range(min_len):
        var ai = p1[i].cast[DType.uint8]()
        var bi = p2[i].cast[DType.uint8]()
        if ai > bi:
            return False
        if ai < bi:
            return True
    return len(self) <= len(other)

@always_inline
fn __le__(self, other: String) -> Bool:
    return self < other or self == other

VMois avatar Apr 22 '24 01:04 VMois

@VMois ,so maybe SIMDs is another issue?

zhoujingya avatar Apr 22 '24 01:04 zhoujingya

@VMois ,so maybe SIMDs is another issue?

Up to you. You can do a simple version or look at SIMD. I am just a random guy :)

VMois avatar Apr 22 '24 03:04 VMois

Why not just use memcmp? String's __eq__ already does and I believe that's what Python does too.

I was thinking something like:

struct String(Sized, Stringable, IntableRaising, KeyElement, Boolable):
    """Represents a mutable string."""

    alias _buffer_type = List[Int8]
    var _buffer: Self._buffer_type
    """The underlying storage for the string."""

    ...

    @always_inline
    fn __lt__(self, rhs: String) -> Bool:
        """Compare this String to the RHS using LT comparison.

        Args:
            rhs: The other String to compare against.

        Returns:
            True if this String is strictly less than the RHS String and False otherwise.
        """
        var len1 = len(self)
        var len2 = len(rhs)
        var cmp = memcmp(self._as_ptr(), rhs._as_ptr(), min(len1, len2))

        return cmp < 0 or cmp == 0 and len1 < len2
    
    @always_inline
    fn __le__(self, rhs: String) -> Bool:
        """Compare this String to the RHS using LE comparison.

        Args:
            rhs: The other String to compare against.

        Returns:
            True if this String is less than or equal to the RHS String and False otherwise.
        """
        return not (rhs < self)
    
    @always_inline
    fn __gt__(self, rhs: String) -> Bool:
        """Compare this String to the RHS using GT comparison.

        Args:
            rhs: The other String to compare against.

        Returns:
            True if this String is strictly greater than the RHS String and False otherwise.
        """
        return rhs < self
    
    @always_inline
    fn __ge__(self, rhs: String) -> Bool:
        """Compare this String to the RHS using GE comparison.

        Args:
            rhs: The other String to compare against.

        Returns:
            True if this String is greater than or equal to the RHS String and False otherwise.
        """
        return not (self < rhs)
    
    ...

I haven't testing this myself, it's just an idea :) Also, this might need to change when Mojo properly handles unicode. If we want to add this to String then something similar can be done with StringLiteral too

siitron avatar Apr 22 '24 14:04 siitron