oshun icon indicating copy to clipboard operation
oshun copied to clipboard

BOOL should accept undef

Open happy-barney opened this issue 1 year ago • 35 comments

Unlike other primitive BOOL should accept undef as well.

There are tons of existing code returning undef as false.

IMHO it's not good idea forcing users to change their existing codebase in order to use newer syntax to add some features.

happy-barney avatar Jun 02 '23 09:06 happy-barney

I think this sounds like a very reasonable thing. (Edit: Obviously, I needed to think about this more)

Ovid avatar Jun 02 '23 09:06 Ovid

I believe in the principle that booleans should be treated the same as numbers/text/the-rest as far as undef goes, either they all include undef or none of them do, and for that matter there should be a version of every one of them that does NOT include undef, whether or not there is a version that does include it.

duncand avatar Jul 18 '23 00:07 duncand

@duncand, there's a good reason to consider booleans as separate from numbers and strings with regards to accepting undef.

use warnings;

my $thing = undef;

my $text  = "Hello $thing";  # warns
my $sum   = 42 + $thing;     # warns
do_something() if $thing;    # no warning

Perl itself is happy to treat undef as false in boolean contexts with no warning. So from that perspective, undef is "valid" as a boolean, while it's more questionable as a string or number.

tobyink avatar Jul 21 '23 14:07 tobyink

The problem with booleans as numbers/text/undef, is that there's no semantic information with them. Internally, that kinda works: was that undef supposed to be false or was it an uninitialized variable? However, externally it's a mess. I tire of all the hacks I see to get proper true and false values pass in serialized data such as JSON.

Ovid avatar Jul 21 '23 15:07 Ovid

On 2023-07-21 16:34, Toby Inkster wrote:

@duncand https://github.com/duncand, there's a good reason to consider booleans as separate from numbers and strings with regards to accepting undef.

use warnings;

my $thing =undef;

my $text ="Hello $thing";# warns my $sum = 42 +$thing;# warns do_something()if $thing;# no warning

Perl itself is happy to treat undef as false in boolean contexts with no warning. So from that perspective, undef is "valid" as a boolean, while it's more questionable as a string or number.

There is value, and there is state. Those are IMO best seen as different dimensions.

could be both a state (represented by an undef-flag) and a value (similar to the float value NaN).

Perl has many falsy values: undef, 0, "", "0". A boolean data type adds another falsy value, that is distinguishable from an integer zero.

tests the state (or boolean casting) of a value.

Any non-zero integer will test as true, so would "from that perspective" be "valid" as a boolean? Does that mean that "BOOL should accept" 42?

IMO the BOOL data check should not accept undef.

-- Ruud

Just to be funny:   perl -E'local $| = 42; say $|'   1

druud avatar Jul 21 '23 20:07 druud

@tobyink Just because Perl doesn't warn when given an undef to a boolean context doesn't mean that undef should be considered a valid boolean when it isn't a valid number or string. An undef logically means unknown or unspecified information in the general sense, so treating it the same as known-to-be-false can produce wrong answers, just as wrong as treating undef as zero or the empty string. While it is reasonable for some programs to treat undef as being equivalent to those default/empty/etc values for a type, users should have the choice to have a bool that explicitly rejects undef, same as they have numbers or strings that reject undef.

duncand avatar Jul 21 '23 21:07 duncand

Perhaps something like a LaxBool or Maybe[BOOL] could be done so that the default is stricter? I see there is a parameterised Maybe[ ] in the spec, but only as an example.

zmughal avatar Jul 21 '23 22:07 zmughal

it's common practice to do empty return and use that function's value in boolean context. I think that excluding undef from bool will force unnecessary refactoring on potential users, and that we should avoid.

happy-barney avatar Jul 22 '23 05:07 happy-barney

it's common practice to do empty return and use that function's value in boolean context. I think that excluding undef from bool will force unnecessary refactoring on potential users, and that we should avoid.

This is a non-issue if the course I recommend is followed that 2 versions of the BOOL are provided, one including undef and one without. Given that Oshun has never been in production, there is no existing code using any existing definition of BOOL, so no existing Perl code would have to be refactored one way or the other.

Also when it comes to multiple variants of anything including or not including undef, I believe the version that excludes undef should always have the shorter plainer name, eg plain BOOL/INT/etc should always be the version that excludes undef, as the name fits, what you see is what you get, and longer things like Maybe[BOOL] or BOOL? etc would be the versions that include undef.

duncand avatar Jul 22 '23 05:07 duncand

Currently I'm more concerned about the fact that Perl now has native true and false (thank goodness)!. This is currently not handled by BOOL, but I see that Data::Checks::Parser already uses $] to check the Perl version, so this shouldn't be too hard to add.

it's common practice to do empty return and use that function's value in boolean context.

Either you're producing or consuming that value. If you're consuming:

my BOOL | UNDEF  $foo = get_value();

If you're producing:

sub get_value () returns BOOL | UNDEF {...}

You could also argue that the code should explicitly return undef instead of using a bare return, but if it's existing code,that's a behavioral change.

I think that excluding undef from bool will force unnecessary refactoring on potential users, and that we should avoid.

I am very sympathetic towards this idea, but since Perl cannot distinguish between undefined and uninitialized, we're kind of stuck because we don't know which is which for Perl. So let the consumer specify BOOL | UNDEF if they want to take that (admittedly small) risk.

So the concerns seem trivial to resolve with the existing spec. Have I missed something?

Principle of Parsimony: from there, if we decide to allow undef in BOOL, the above still works. If we realize undef in BOOL was a mistake, we can't walk that back without breaking code.

Ovid avatar Jul 22 '23 07:07 Ovid

FWIW, I am with @happy-barney on the question of whether undef is a valid bool. We only introduce "true" booleans to perl recently, and for most of the existence of Perl Larry defined "false" to be "undef", "0", 0 and the empty string, and everthing else to be true (modulo overload). So I think it would be quite counter productive to not have a way to respect that design intent.

It seems to me that the proposal we should have two keywords for booleans, perhaps distinguished by spelling. For instance I could imagine "Bool" and "BOOL". Or I could imagine "Bool" and "Truthy" or whatever, or maybe "PerlBool" and "JsonBool". (I cant remember the spelling convention we chose) I don't think it is helpful to say that only one model of boolean is supported. I could pass in an obect with boolean overloads, and it should pass a bool test.

I understand the invocation of the principle of parsimony here, but i think it is misguided. Perl supports multiple notions of bool, and our checks should as well. I would be very surprised and consider it counter productive to not be able to use "undef" as a "false" value in a place expecting a bool.

I also note that people seem to be saying that bools should be a number or a string, but i consider every unoverloaded ref to be a valid true value, and any overloaded reference could act as a boolean as well, so I think there is a conceptual clash there as well. Related I think the arguments that undef has a specific meaning closer to an SQL "NULL" tend to fall down under the semantics that Larry chose to provide. undef is == to undef and 0, and undef is eq to undef and "" (albeit with warnings), the mutator operators consider "undef" to be a valid initial value silently mapping to 0 or the empty string, and it is considered a valid boolean value in all internal contexts. So the weight of history does not support that "undef" means "no value" in the same sense that SQL NULL means that. We need to accomodate that history, especially as it is particularly useful in many contexts.

A great deal of code depends on

if ($hash{bool_predicate}) { ... }

working the same regardless if $hash{bool_predicate} is 0 or the empty space, /or missing/ and thus undef. Making BOOL not respect this would be problematic IMO and would inevitably need to be changed in the future.

demerphq avatar Jul 22 '23 08:07 demerphq

@demerphq oh, thanks mentioning overload, raising questions:

  • is in fact valid bool value value with overloaded bool conversion ? (in fact, any value with provided overload)
  • as far as this are not types (though word check is close to it), maybe implicit coercion can be applied on assignment as well ?
    • if it is blessed and overloaded, accept it
    • if it is not, coerce it

happy-barney avatar Jul 22 '23 08:07 happy-barney

I agree that undef cannot behave like NULL. I wish it did, but that ship has sailed.

Otherwise, to answer @happy-barney's question, in Data::Checks::Parser, we have this:

    BOOL   => q(( «NONREF» || overload::Method(§,'bool') )),

So a BOOL is anything that is not a reference, or has boolean overloading. NONREF is defined as this:

    NONREF => q(( «DEF» && !Data::Checks::Parser::reftype(§) )),

So yeah, currently it requires it being defined.

You can read the tests here.

(I'm not saying this is how things have to be, I'm saying "that's what the code currently does")

Also, we don't allow implicit coercion anywhere (that I recall) because of this:

my INT $max_size = \@foo;

Currently, it's trivial to do things like this: say $this + $that and if either $this or $that is a reference, it will be quietly (no warnings!) coerced to an integer. Oshun doesn't stop that directly because it applies to assignment, not operators, but we can at least prevent some simple coercion errors.

Any thoughts about applying checks outside of assignment should be post-MVP.

Ovid avatar Jul 22 '23 09:07 Ovid

On 2023-07-22 10:23, Yves Orton wrote:

FWIW, I am with @happy-barney https://github.com/happy-barney on the question of whether undef is a valid bool. We only introduce "true" booleans to perl recently, and for most of the existence of Perl Larry defined "false" to be "undef", "0", 0 and the empty string, and everthing else to be true (modulo overload). So I think it would be quite counter productive to not have a way to respect that design intent.

For me this (again) conflates states ("test results") and BOOL values. I see no need for a BOOL data element to support undef.

A laxer BOOL-variant can auto-cast undef to false, either always or only at initialization. But I prefer BOOL itself to be strict.

What tests, is truthiness, which for a number means "not zero" etc. so that does not depend on BOOL. Similar for ternary, etc.

P.S. I also like to see (optional) default values, like in C++, where "int x;" sets x to 0.

druud avatar Jul 22 '23 10:07 druud

@druud We vetoed default values early on because they tend to be arbitrary. Just because something defaults to false doesn't mean it's false, for example. If we default my UINT $count; to zero, what if there's actually a count, but due to poor code, it's not assigned? What would be defaults on GLOB or HANDLE?

Ovid avatar Jul 22 '23 10:07 Ovid

better example is with where

my  UINT where { $_ > 1 && $_ % 2 && $_ % 3 && $_ % 4 } $x ...

what should be default ?

happy-barney avatar Jul 22 '23 10:07 happy-barney

oh, test case for parser implementation (should where syntax by like this):

declare check sub;
my sub where { ... } $x;
my sub where { ... };

happy-barney avatar Jul 22 '23 10:07 happy-barney

On 2023-07-22 12:18, Ovid wrote:

@druud https://github.com/druud We vetoed default values early on because they tend to be arbitrary. Just because something defaults to false doesn't mean it's false, for example. If we default |my UINT $count;| to zero, what if there's actually a count, but due to poor code, it's not assigned? What would be defaults on |GLOB| or |HANDLE|?

I think it is fine to not have a way to define a default value right now, but that at some point it will come up again as really useful.

Whether it will then be opt-in or opt-out, is harder to predict. For example: Java has no native 64-bit unsigned integer. (in 64-bit context)

For defaults, I stick to the 'empty' approach: zero for numeric values, empty string for textual values, false for boolean values, empty for arrays and for hashes, etc. Not all data types need to support a default. And maybe not call it :default, but call it :empty.

-- Ruud

druud avatar Jul 22 '23 14:07 druud

On 2023-07-22 12:42, Branislav Zahradník wrote:

better example is with |where|

|my UINT where { $_ > 1 && $_ % 2 && $_ % 3 && $_ % 4 } $x ... |

what should be default ?

If the default of UINT (or rather its "empty value") is defined as 0, and there is no initialization value, then this declaration would error.

If UINT supports undef, then its "empty value" is (always) undef, and it can't have a (different) "empty value" configured.

If UINT doesn't support undef, then it can have an "empty value". Without a configured "empty value", it then must be initialized at declaration.

So it depends on what is in the '...'. And on how UINT is configured.

-- Ruud

druud avatar Jul 22 '23 14:07 druud

@druud I'd say this discussion is irrelevant for MVP. In later stages (that's why I created Roadmap discussion ...) there should be warning / error when reading uninitialized variable which had specified contract.

happy-barney avatar Jul 22 '23 14:07 happy-barney

On 2023-07-22 16:56, Branislav Zahradník wrote:

@druud https://github.com/druud I'd say this discussion is irrelevant for MVP. In later stages (that's why I created Roadmap discussion ...) there should be warning / error when reading uninitialized variable which had specified contract.

Yes, all good.

-- Ruud

druud avatar Jul 22 '23 17:07 druud

Not all data types need to support a default. And maybe not call it :default, but call it :empty.

Maybe call it :unassigned or some synonym instead which is more accurate on what it actually is talking about.

Some types don't have a concept of the "empty" value either. What would be the "empty" value for a day-of-week type, or an odd-integer type?

No matter what we call it, given that in Perl like in many languages an object isn't always just representing a plain old value that can be cleanly serialized and deserialized, like a tree whose endpoints are all regular scalars, instead sometimes objects represent other resources or things, such as an open file handle or something, which are practically impossible to serialize/deserialize and it only makes sense to create them with explicit constructor arguments, and so you can't have a default/empty/whatever value.

I feel the simplest solution that takes all of the options into account is, if the type includes undef then that is its default value when not explicitly initialized, and otherwise an explicit initializing assignment is required for any variable having a CHECK. I seem to recall something like this, or explicit-assignment-always, was proposed and favoured before.

duncand avatar Jul 22 '23 20:07 duncand

Another odd consideration: If we allow undef for BOOL, we introduce the potential for a slight inconsistency.

my BOOL $success; # succeeds because `undef` satisfies BOOL
my INT  $count;   # fails because `undef` doesn't satisfy INT

Not saying it's a blocker, but it's a tiny thing we might want to consider.

Ovid avatar Jul 23 '23 13:07 Ovid

Another odd consideration: If we allow undef for BOOL, we introduce the potential for a slight inconsistency.

my BOOL $success; # succeeds because `undef` satisfies BOOL
my INT  $count;   # fails because `undef` doesn't satisfy INT

Not saying it's a blocker, but it's a tiny thing we might want to consider.

I wouldn't call that minor, I would call that a MAJOR inconsistency, that shouldn't happen. All of the plain-named types/checks like BOOL/INT/etc should be mutually consistent, either they all include undef, or they all exclude it, not some of one and some of the other. Having them different in this respect violates the principle of least surprise, that things which look similar should behave similar etc. So I strongly disagree with the original proposal that BOOL is treated differently than the others like INT/etc.

duncand avatar Jul 23 '23 23:07 duncand

That is my position too. And don't conflate a BOOL data value with 'boolean context'.

druud avatar Jul 24 '23 08:07 druud

Coming back to 'empty': your examples of weekday and odd-int, just show why I propose to call it 'empty' and not 'undefined'.

Some types are good to get set when "left empty", and others aren't. I don't think we need this 'empty' feature from the start, though I think it will help making things easier to accept, as it would lead to less boilerplate: no need to initialise (almost) every INT to 0, TEXT to "", etc.

Such an 'empty' mechanism is best strictly limited to only use values like 0, "", \0, false and such. Such an 'empty' mechanism is much smaller than an 'undefined' mechanism would be, and leads to much less 'action at a distance'. So IMO it shouldn't support a <BOOL $active> with a default of true, but a <BOOL $obsoleted> with a default of false would be OK.

For example have an UINT32Z, that is mostly like UINT32, but also auto-initialises to 0.

druud avatar Jul 24 '23 08:07 druud

@druud FYI: https://www.nntp.perl.org/group/perl.perl5.porters/2022/09/msg264841.html

happy-barney avatar Jul 24 '23 08:07 happy-barney

Important is that such a mechanism can not be used to auto-initialise values to any truthy, non-empty value. So the \0 that I mentioned earlier, must be blessed and have proper overload logic.

druud avatar Jul 24 '23 10:07 druud

@druud milestone XYZ:

declare check FOO as UINT where { ... }  default 1_204;

happy-barney avatar Jul 24 '23 10:07 happy-barney

So what I'm seeing here:

  1. Current behavior allows us to do BOOL | UNDEF to trivially satisfy this constraint.
  2. Defaults are post-MVP
  3. builtin::true and builtin::false complicate things a bit
  4. Implicit coercions must not be allowed because we can't disable checks safely if they rely on coercive behavior
  5. We're not sure what the rest of the world wants.

Given the first point, I think trying to figure out what people are going to do with this is maybe a bad thing. Let's look at his from a "lean project management" standpoint: we build what we need and nothing more. When we get feedback on real-world use, we can incorporate that.

Does that sound good, or have I missed something?

Ovid avatar Jul 24 '23 10:07 Ovid