datafusion-comet icon indicating copy to clipboard operation
datafusion-comet copied to clipboard

Implement tests for casting complex types

Open andygrove opened this issue 1 month ago • 1 comments

What is the problem the feature request solves?

There are currently gaps in testing for casts involving complex types. This issue is for adding tests to cover all casts thet Comet claims to support.

Describe the potential solution

[This section is AI generated and not verified to be accurate]

Summary of Current Cast Test Coverage

ARRAY Type Casts

Currently Tested (in CometCastSuite.scala:1050-1073):

  • ✅ Array → String (tested with various element types: Boolean, String, Byte, Integer, Long, Short, Decimal(10,2), Decimal(38,18))

Implemented but NOT tested:

  • ❌ Array → Array (with element type casting) - Implemented in cast.rs:1034-1036
  • ❌ Null → Array - Marked as compatible in cast.rs:1141

STRUCT Type Casts

Currently Tested (in CometCastSuite.scala:959-1017):

  • ✅ Struct → String (with primitive types, decimals, dates, timestamps, nested structs)
  • ✅ Struct → Struct (with field type changes)
  • ✅ Struct → Struct (with different field names)

Well covered - The struct casting tests are comprehensive.

MAP Type Casts

Currently Tested:

  • ❌ NO TESTS AT ALL - MapType is completely missing from cast tests

Implementation Status:

  • Maps are NOT implemented in cast.rs (no Map-related cast logic found)
  • CometCast.scala does NOT support map casting (not in the isSupported method)

Identified Testing Gaps

Critical Gaps (High Priority)

  1. Map Type Casting - Completely Missing - No tests for any map casting operations - Map → String (if Spark supports it) - Map → Map (with key/value type changes) - This appears to be completely unsupported, which should be documented
  2. Array → Array Casting - Gap: Element type transformations are implemented but not tested - Examples needed:
    • CAST(array(1, 2, 3) AS array) - Int array to String array
    • CAST(array(1.5, 2.5) AS array) - Float array to Int array (with rounding)
    • CAST(array('1', '2', '3') AS array) - String array to Int array (parsing)
    • Nested arrays: array<array> to array<array>
  3. Array of Structs → String - Not currently tested - Example: array(struct(1, 'a'), struct(2, 'b')) cast to string
  4. Struct with Array Fields → Struct - Not explicitly tested with array field type changes - Example: struct<a: array> → struct<a: array>
  5. Null Handling in Complex Types - Arrays with null elements being cast - Structs with null fields being cast - Null arrays/structs themselves

Medium Priority Gaps

  1. Array with Binary Elements → String - Explicitly marked as Incompatible in CometCast.scala:119-120 - Should have a test verifying this incompatibility
  2. Arrays of Timestamps/Dates → Arrays of Other Types - Test temporal type array transformations - Example: array → array
  3. Edge Cases for Array Casts - Empty arrays: CAST(array() AS array) - Single element arrays - Large arrays (performance/correctness) - Arrays with all nulls
  4. Struct with Map Fields - If/when map casting is supported - struct<a: int, b: map<string, int>> casting
  5. ANSI Mode Testing for Complex Types - Array/Struct casts with invalid conversions in ANSI mode - Error propagation from element casts

Low Priority Gaps

  1. Dictionary-encoded Complex Types - Dictionary-encoded arrays/structs - The test suite has some dictionary support for primitives
  2. Deep Nesting - array<array<array>> → array<array<array>> - struct<a: struct<b: struct<c: int>>> with type changes

Recommendations

Immediate Actions

  1. Add Array → Array tests covering: test("cast ArrayType to ArrayType with different element types") { // Int → String // Float → Int
    // String → Int // Decimal → Int // Boolean → String }
  2. Document Map casting limitations: - Add ignored/commented tests showing map casting is unsupported - Or implement map casting if Spark supports it
  3. Add null handling tests for complex types: test("cast ArrayType with nulls") { // [null, 1, 2] as array // null array value }

Implementation Gaps

From the code analysis, these casts are implemented in Rust but lack Scala tests:

  • Array[T1] → Array[T2] (recursive element casting)
  • Null → Array[T]

These casts are completely missing implementation:

  • Anything involving MapType

Test Coverage Metrics

Current complex type cast coverage:

  • Struct casts: ~80% (good coverage)
  • Array casts: ~20% (only Array→String tested)
  • Map casts: 0% (no implementation or tests)

Suggested Test File Structure

Consider organizing tests as: // In CometCastSuite.scala

// ARRAY CASTS test("cast ArrayType to StringType") { ... } // EXISTS test("cast ArrayType to ArrayType - element type changes") { ... } // MISSING test("cast ArrayType to ArrayType - nested arrays") { ... } // MISSING
test("cast ArrayType with null elements") { ... } // MISSING

// STRUCT CASTS
test("cast StructType to StringType") { ... } // EXISTS test("cast StructType to StructType") { ... } // EXISTS test("cast StructType with ArrayType fields") { ... } // MISSING test("cast StructType with null fields") { ... } // MISSING

// MAP CASTS ignore("cast MapType to StringType") { ... } // MISSING - not implemented ignore("cast MapType to MapType") { ... } // MISSING - not implemented

This analysis should help you prioritize which tests to add to improve coverage of complex type casting operations.

Additional context

No response

andygrove avatar Nov 12 '25 16:11 andygrove

I'd like to contribute to this enhancement, starting with Array to Array casting.

manuzhang avatar Nov 17 '25 13:11 manuzhang