
Spark 2 backend and codegen issues

Open fehrenbach opened this issue 8 years ago • 0 comments

  • [x] The main issue is that the BLOB format is not sufficiently self-describing and cannot be deserialized without knowing the output type. Some of the things below directly or indirectly depend on this.
  • [ ] Equality is not correct for records and, in general, relies on sorting, which is not thoroughly tested.
  • [ ] Some operators use Scala equality; at least bag min, max, and minus are affected.
  • [ ] AToString is wrong, especially for open records
  • There is no codegen for
    • [ ] foreign types and operators
    • [ ] ASingleton
    • [ ] ARecRemove
    • [ ] AUArith ArithLog2
    • [ ] AUArith ArithSqrt
  • [ ] Codegen does not handle general casts. We emit literal data at the correct type, but this relies on only literal data ever needing type conversion. We have not proved that casts can always be pushed down to data-literal leaves, and even if we had, we do not perform that pushdown. If we could prove that this discrepancy only occurs at data-literal leaves, that would be great: we could get rid of the required type entirely.
  • [ ] A left unit is indistinguishable from a right x at type Either Unit t. Easy fix: add a column $isLeft : Boolean.
  • [ ] Not an issue per se, but we might want to change the record encoding to have a special field $dotdot at the same level as the known fields, removing the $blob/$known nesting. (Pick a prefix character that sorts before all legal field names.)
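To make the main (BLOB) issue concrete, here is a minimal sketch in Python of the difference between a schema-dependent and a self-describing encoding. This is not qcert's actual BLOB format; the type-tag names are purely illustrative.

```python
import json

# Schema-dependent: the same bytes mean different things depending on the
# expected output type -- e.g. a bag of two ints vs. a [tag, value] pair
# encoding an Either. The reader cannot decode without knowing the type.
schema_dependent = json.dumps([0, 2])

# Self-describing: carry a type tag alongside the value, so deserialization
# needs no external type information.
self_describing = json.dumps({"type": "bag<int>", "value": [0, 2]})

decoded = json.loads(self_describing)
assert decoded["type"] == "bag<int>"
assert decoded["value"] == [0, 2]
```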
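The equality problems (sorting-based record equality, and bag operators falling back to host-language equality) can be sketched as follows. This is an illustrative Python model of the intended semantics, not the Scala codegen: records are compared via a canonical sorted-field form, and bag minus is a multiset difference keyed on that canonical form rather than on plain `==`.

```python
from collections import Counter

def canon(v):
    """Canonical form: record fields in sorted order, bag elements sorted,
    so equality does not depend on field or element order."""
    if isinstance(v, dict):
        return tuple(sorted((k, canon(x)) for k, x in v.items()))
    if isinstance(v, list):
        return tuple(sorted((canon(x) for x in v), key=repr))
    return v

def bag_minus(xs, ys):
    """Multiset difference keyed on canonical forms, so records that are
    equal modulo field order are matched (plain host equality may not)."""
    remaining = Counter(canon(y) for y in ys)
    out = []
    for x in xs:
        k = canon(x)
        if remaining[k] > 0:
            remaining[k] -= 1
        else:
            out.append(x)
    return out

# Records equal up to field order compare equal under the canonical form.
assert canon({"a": 1, "b": 2}) == canon({"b": 2, "a": 1})
# Bag minus removes one matching occurrence per element of the right bag.
assert bag_minus([{"a": 1, "b": 2}], [{"b": 2, "a": 1}]) == []
```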
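The Either Unit t ambiguity and the proposed $isLeft fix can be sketched with hypothetical row layouts (column names illustrative, not the actual encoding):

```python
# Without a discriminator, a left unit and a right value that happens to be
# null produce identical rows -- they cannot be told apart.
ambiguous_left = {"$left": None, "$right": None}
ambiguous_right = {"$left": None, "$right": None}
assert ambiguous_left == ambiguous_right

# An explicit $isLeft : Boolean column disambiguates the two cases.
left_unit = {"$isLeft": True, "$left": None, "$right": None}
right_null = {"$isLeft": False, "$left": None, "$right": None}
assert left_unit != right_null
```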

fehrenbach avatar Aug 02 '16 21:08 fehrenbach