Equality and truthiness of `string#elems`

Open fmeum opened this issue 1 year ago • 2 comments

The spec isn't clear on how equality and truthiness should behave for the value returned by string#elems (and similar Sequences).

The available implementations differ:

  • Java
    >> bool("".elems())
    False
    >> "foo".elems() == "foo".elems()
    True
  • Go
    >>> bool("".elems())
    True
    >>> "foo".elems() == "foo".elems()
    True
  • Rust
    $> bool("".elems())
    True
    $> "foo".elems() == "foo".elems()
    False

In Python, custom iterables (such as generator expressions) are often truthy regardless of whether they are non-empty, and non-identical generator objects compare unequal even if they yield the same elements (ignoring e.g. range, which is specced to determine equality as a sequence). This matches the behavior of the Rust implementation.
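For reference, the Python behavior described above can be checked with a few assertions. This is only a small sketch in plain Python (not Starlark); the variable names are illustrative.

```python
# A generator expression is truthy even when it yields nothing.
empty_gen = (c for c in "")
assert bool(empty_gen) is True

# Two distinct generator objects compare unequal even though they
# would yield the same elements.
a = (c for c in "foo")
b = (c for c in "foo")
assert a != b
assert a == a  # a generator is still equal to itself (identity)

# range is the exception: it compares as a sequence.
assert range(3) == range(0, 3)
```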

@adonovan @stepancheg @brandjon What are your thoughts on this?

Edit: Noticed that Python generator expressions are always truthy, not just when non-empty.

fmeum avatar Sep 25 '24 17:09 fmeum

The spec should probably take a position one way or another. I think making iterators non-equal even if their sequences are equal is probably the prudent course; even if in the string case it's easy either way, for other iterators, determining that they generate equal elements may not be a cheap computation.

adonovan avatar Sep 25 '24 17:09 adonovan

See also #292. We could define equality/truthiness of sequences to always be based on their content, and say for non-sequence iterables this may or may not hold.

You'd think an iterable should be reflexively equal to itself, but that doesn't mean that calling .elems() always gives you the same iterable.
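A rough Python sketch of that distinction, assuming identity-based equality and assuming that every call allocates a fresh object. `Elems` and `elems` are hypothetical stand-ins here, not any implementation's actual types.

```python
class Elems:
    """Hypothetical stand-in for the value returned by string#elems."""

    def __init__(self, s):
        self._s = s

    def __iter__(self):
        return iter(self._s)

    # No __eq__ is defined, so Python falls back to identity comparison.


def elems(s):
    # Assumption: each call returns a new object; nothing is cached.
    return Elems(s)


e = elems("foo")
assert e == e                        # reflexive: equal to itself
assert elems("foo") != elems("foo")  # two calls, two distinct objects
assert list(elems("foo")) == list(elems("foo"))  # contents still match
```

If an implementation instead cached or reused the returned object, the second assertion would flip, which is why the reuse policy would have to be specified under identity equality.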

brandjon avatar Dec 18 '24 22:12 brandjon

See discussion in #318. I think elems(), as well as the return values of enumerate(), range(), and zip(), should be Sequences. But that doesn't necessarily mean sequences need to support value equality.

brandjon avatar Aug 13 '25 18:08 brandjon

My earlier comment confused iterables (sequences) with iterators. Iterators are first-class stateful values in Python (e.g. [1, 2, 3].__iter__()) but are hidden in Starlark.

Python 3's range(n) data type is a sequence of length n that compares equal to other values that denote the same sequence, and is truthy iff non-empty. This seems like a reasonable path to follow.

(By contrast, Python's iterators, e.g. "abc".__iter__(), are always true and unequal to all other values, which makes sense since they are stateful.)
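The contrast between the two can be seen directly in Python 3. The snippet below is just an illustration of the behavior described above; the names `it1` and `it2` are arbitrary.

```python
# range: equal when it denotes the same sequence, truthy iff non-empty.
assert range(0, 4, 2) == range(0, 3, 2)  # both denote 0, 2
assert range(5, 5) == range(0)           # both are empty
assert bool(range(0)) is False
assert bool(range(1)) is True

# Iterators: always truthy, equal only to themselves, and stateful.
it1 = "abc".__iter__()
it2 = "abc".__iter__()
assert it1 != it2
assert bool("".__iter__()) is True
assert next(it1) == "a"              # advancing it1 ...
assert list(it1) == ["b", "c"]       # ... changes what it yields next
assert list(it2) == ["a", "b", "c"]  # it2 is unaffected
```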

adonovan avatar Aug 18 '25 11:08 adonovan

> I think elems(), as well as the return values of enumerate(), range(), and zip(), should be Sequences.

My thinking evolved a bit since I wrote that. See #29 for a more up-to-date discussion. range() is definitely a sequence regardless. I don't have much intuition either way on str.elems().

I like the idea of not doing a deep equality check on views or similar values. But again, see #29 for whether things like reversed() return views at all.

The only catch is that if we do identity equality, it suddenly becomes important to specify when such an object is freshly created versus reused.

brandjon avatar Aug 18 '25 18:08 brandjon

Given the decision in #29 to not use views for most builtin methods, the impact of this issue is not as big.

brandjon avatar Aug 28 '25 15:08 brandjon