ibis icon indicating copy to clipboard operation
ibis copied to clipboard

feat/bug: make `struct()` respect `type` kwarg when ibis `Value`s are passed

Open NickCrews opened this issue 8 months ago • 0 comments

Is your feature request related to a problem?

import ibis

# literal() respects the type kwarg as I would expect,
# and errors as I expect when a bogus type is passed:
ibis.literal({"x": 1}, type="struct<x: int64>").type()
# 'Struct([('x', Int64(nullable=True))], nullable=True)
ibis.literal({"x": 1}, type="!struct<x: int64>").type()
#'Struct([('x', Int64(nullable=True))], nullable=False)
ibis.literal({"x": 1}, type="!struct<bogus: string>").type()
# TypeError: Unable to normalize {'x': 1} to !struct<bogus: string> because of missing keys {'bogus'}

# struct() respects the type kwarg as long
# as every one of the fields are not ibis.Value's.
# This is because in this code path, the implementation just delegates to
# ibis.literal(), so we get the exact same behavior as above:
ibis.struct({"x": 1}, type="struct<x: int64>").type()
# 'Struct([('x', Int64(nullable=True))], nullable=True)
ibis.struct({"x": 1}, type="!struct<x: int64>").type()
# 'Struct([('x', Int64(nullable=True))], nullable=False)
ibis.struct({"x": 1}, type="!struct<bogus: string>").type()
# TypeError: Unable to normalize OrderedDict([('x', 1)]) to !struct<bogus: string> because of missing keys {'bogus'}

# struct(), if any of the fields are ibis.Value's,
# always makes the value nullable, regardless of the passed type,
# and does no checking for absolutely bogus types:
ibis.struct({"x": ibis.literal(1)}, type="struct<x: int64>").type()
# 'Struct([('x', Int8(nullable=True))], nullable=True)
ibis.struct({"x": ibis.literal(1)}, type="!struct<x: int64>").type()
# 'Struct([('x', Int8(nullable=True))], nullable=True)
# I expect this to be nullable=False
ibis.struct({"x": ibis.literal(1)}, type="!struct<bogus: string>").type()
# 'Struct([('x', Int8(nullable=True))], nullable=True)
# I would expect this to error.

What is the motivation behind your request?

I was getting bugs because I expected the type kwarg to be respected. I spent 20 minutes looking for a bug in my code, before I found this behavior in ibis. The docstring for ibis.struct does admit that the type kwarg "is only used if all of the input values are Python literals.", but I still view this as footgun behavior.

Describe the solution you'd like

The current implementation:

def struct(
    value: Iterable[tuple[str, V]] | Mapping[str, V],
    *,
    type: str | dt.DataType | None = None,
) -> StructValue:
    import ibis.expr.operations as ops

    fields = dict(value)
    if any(isinstance(value, Value) for value in fields.values()):
        names = tuple(fields.keys())
        values = tuple(fields.values())
        return ops.StructColumn(names=names, values=values).to_expr()
    else:
        return literal(collections.OrderedDict(fields), type=type)

two options I see (may be more):

  1. Tack on a .cast(type) onto the ops.StructColumn just before returning. This will take care of both the nullability issue. Easier, and will be a no-op if the naturally infered dtype is the same as the cast one, because .cast() already has this optimization. Will be minimal extra SQL if the cast is indeed necessary. This does NOT take care of the bogus type issue: currently, if you do ibis.struct({"x": ibis.literal(1)}).cast("!struct<bogus: string>").type() you get no error at expression build time, you only will get it at execution time. So in order to get this to error show up as I want, we would also need to add this check-the-cast-is-valid logic to either a) the implementation of .cast() or b) here in struct().

This is currently the solution I do in a wrapper function that I have:

@deferrable
def struct(
    value: Iterable[tuple[str, ir.Value]] | Mapping[str, ir.Value],
    *,
    type: str | dt.DataType | None = None,
) -> ir.StructValue:
    """ibis.struct(), but better handling of the type argument. See https://github.com/ibis-project/ibis/issues/11069"""
    raw = ibis.struct(value, type=type)
    if type is None:
        return raw
    type = ibis.dtype(type)
    if type.is_struct():
        type: dt.Struct
        needed_fields = set(type.names)
        missing_fields = needed_fields - set(raw.names)
        if missing_fields:
            raise TypeError(
                f"Struct is missing fields {missing_fields} that are in the type {type}"
            )
    return raw.cast(type)
  1. Parse the type to a true ibis.DataType. Go through each of the inputs in values, turn them into ibis.Values using ibis.literal() if needed, and cast them to the right type. Then, when we pass these Values to ops.StructColumn, and they are the correct type. This is also where we would check for the bogus issue. Then, we still need to solve the nullability problem. For that, I think we would add a new nullable: bool attribute to ops.StructColumn.### What version of ibis are you running?

main

What backend(s) are you using, if any?

NA

Code of Conduct

  • [x] I agree to follow this project's Code of Conduct

NickCrews avatar Apr 02 '25 19:04 NickCrews