Auto-magically infer types?
A few prior steps:
- Summary reporting on type conversions. First, add functionality to analyze CSV data and report which types each column could be converted to. This would help users understand their data.
- Code generation. Generate code that handles the appropriate type conversions.
- Generate case class code. Automatically create Scala case classes that match the inferred structure, so that users can work with properly typed data objects instead of tuples.
- Finally, a macro. Implement this functionality as a Scala macro for compile-time processing.
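The first and third steps could be prototyped without any macro machinery. The sketch below is only an illustration under stated assumptions: `ColumnType`, `inferColumnTypes`, and `caseClassSource` are hypothetical names that do not exist in the library, the candidate types are limited to Int, Double, Boolean, and String, and the widening rules are a guess at sensible behaviour rather than a settled design.

```scala
// Hypothetical names throughout; none of this is existing library API.
enum ColumnType(val scalaName: String):
  case IntCol extends ColumnType("Int")
  case DoubleCol extends ColumnType("Double")
  case BooleanCol extends ColumnType("Boolean")
  case StringCol extends ColumnType("String")

import ColumnType.*

/** Narrowest type that a single raw cell can be converted to. */
def cellType(cell: String): ColumnType =
  val s = cell.trim
  if s.toIntOption.isDefined then IntCol
  else if s.toDoubleOption.isDefined then DoubleCol
  else if s.toBooleanOption.isDefined then BooleanCol
  else StringCol

/** Smallest type able to represent values of both `a` and `b`. */
def widen(a: ColumnType, b: ColumnType): ColumnType = (a, b) match
  case (x, y) if x == y                            => x
  case (IntCol, DoubleCol) | (DoubleCol, IntCol)   => DoubleCol
  case _                                           => StringCol

/** Step 1: report one inferred type per column (String if there are no rows). */
def inferColumnTypes(header: Seq[String], rows: Seq[Seq[String]]): Seq[(String, ColumnType)] =
  header.zipWithIndex.map: (name, i) =>
    val tpe = rows.map(row => cellType(row(i))).reduceOption(widen).getOrElse(StringCol)
    name -> tpe

/** Step 3: render case class source code matching the inferred structure. */
def caseClassSource(className: String, columns: Seq[(String, ColumnType)]): String =
  val fields = columns.map((name, tpe) => s"$name: ${tpe.scalaName}").mkString(", ")
  s"case class $className($fields)"
```

Calling `inferColumnTypes` on a header and a sample of rows gives the summary report; feeding that result to `caseClassSource` gives a string that a user could paste into their project. Scanning only a sample of rows is a trade-off: it is fast, but a later row may not fit the inferred type.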
Current approach: every column is a String.
    val row = csvData.next() // (col1: String, col2: String, col3: String)
    val sum = row.col1.toInt + row.col2.toInt
Desired: a pandas-like experience with inferred column types.
    val row = csvData.next() // (col1: Int, col2: Int, col3: String)
    val sum = row.col1 + row.col2 // no conversion needed
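To make the gap between the two snippets concrete, here is a hand-written version of the conversion that the proposed feature would have to derive automatically. It is a sketch only: `RawRow`, `TypedRow`, and `typed` are hypothetical names, and it assumes Scala 3.7 or later, where named tuples are a stable feature.

```scala
// Assumes Scala 3.7+ (named tuples). All names here are illustrative only.
type RawRow   = (col1: String, col2: String, col3: String)
type TypedRow = (col1: Int, col2: Int, col3: String)

// The per-column conversions that type inference would need to generate.
def typed(row: RawRow): TypedRow =
  (col1 = row.col1.toInt, col2 = row.col2.toInt, col3 = row.col3)

val raw: RawRow = (col1 = "1", col2 = "2", col3 = "x")
val row: TypedRow = typed(raw)
val sum: Int = row.col1 + row.col2 // no conversion needed at the use site
```

The difficulty in the issue is that `TypedRow` cannot be written by the user in advance; it has to be computed from the data or its description at compile time.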
I would like to work on this project. Can you assign it to me?
Feel free to submit a PR; no one is really looking at it. I should warn you that I don't think this is particularly easy, and I doubt it's vibecode-able.
You can ping me on discord if you have questions, I'm usually found hanging around the scala forums, same handle as GitHub.
Some inspiration: I believe this post on Discord shows a potential first cut of how this could be implemented.
https://discord.com/channels/632150470000902164/875868146949554207/1379794983795818518
More ideas: https://discord.com/channels/632150470000902164/875868146949554207/1385296719255830760
import scala.quoted.*

case class FieldDescription(name: String, typeName: String)

case class NamedTupleElement(labelName: String, valueType: Type[?])

object NamedTupleElement:
  def fromFieldDescription(fieldDescription: Expr[FieldDescription])(using Quotes) =
    import quotes.reflect.*

    fieldDescription match
      case '{ FieldDescription(${ Expr(name) }, ${ Expr(typeName) }) } =>
        NamedTupleElement(name, mapFieldType(typeName))
      case _ => report.errorAndAbort("Fields must be known at compile time.")

  private def mapFieldType(typeName: String)(using Quotes) =
    import quotes.reflect.*

    typeName match
      case "String" => Type.of[String]
      case "Int"    => Type.of[Int]
      case _        => report.errorAndAbort(s"Unsupported type for field type: $typeName")

private def typeReprFromType(tpe: Type[?])(using Quotes) =
  import quotes.reflect.*

  tpe match
    case '[t] => TypeRepr.of[t]
transparent inline def makeNamedTupleBuilderVarArg(inline fields: FieldDescription*) =
  ${ makeNamedTupleBuilderVarArgImpl('fields) }

private def makeNamedTupleBuilderVarArgImpl(fieldsExpr: Expr[Seq[FieldDescription]])(using Quotes) =
  import quotes.reflect.*

  // Extract compile-time known values from each FieldDescription
  // to get a sequence of NamedTupleElements, each holding a label name (String) and a value type (Type[?]).
  val namedTupleElements = fieldsExpr match
    case Varargs(fieldDescriptionExprSeq) =>
      fieldDescriptionExprSeq.map:
        NamedTupleElement.fromFieldDescription
    // Without this case, a non-literal argument list would fail with a MatchError inside the macro.
    case _ =>
      report.errorAndAbort("Fields must be passed as explicit varargs.")

  makeNamedTupleBuilderImpl(namedTupleElements)
private def makeNamedTupleBuilderImpl(namedTupleElements: Seq[NamedTupleElement])(using Quotes) =
  import quotes.reflect.*

  // Get the TypeRepr of `scala.*:`, which is the cons type for Tuples / HLists in Scala 3.
  val tupleConsTypeRepr = TypeRepr.of[scala.*:]

  // Construct a TypeRepr representing the named-tuple labels.
  // Fold the list of label names of type `String`, `List(l0, l1, ...)`,
  // to a tuple of the corresponding `ConstantType` elements `Tuple[lt0, lt1, ...]`
  // using the `*:` type constructor:
  //   lt0 *: lt1 *: ... *: EmptyTuple
  val labelsTypeRepr = namedTupleElements.view
    .map(_.labelName)
    .foldRight(TypeRepr.of[EmptyTuple]): (labelName, acc) =>
      val labelType = ConstantType(StringConstant(labelName))
      tupleConsTypeRepr.appliedTo(List(labelType, acc))

  // Construct a List[TypeRepr] representing the types of the values in the named tuple.
  val fieldTypeReprs = namedTupleElements.view
    .map(namedTupleElement => typeReprFromType(namedTupleElement.valueType))
    .toList

  // Fold the list of TypeReprs List(t0, t1, ..., tn) to the Tuple[t0, t1, ..., tn] type
  // using the `*:` type constructor:
  //   t0 *: t1 *: ... *: EmptyTuple
  val valuesTupleTypeRepr = fieldTypeReprs.foldRight(TypeRepr.of[EmptyTuple]): (tpe, acc) =>
    tupleConsTypeRepr.appliedTo(List(tpe, acc))

  // Create parameter names v0, v1, ... for the lambda, exactly one per field.
  // (`0 to length` would produce one name more than there are parameter types.)
  val paramNames = fieldTypeReprs.indices.map(i => s"v$i").toList

  // Construct the type of the named tuple we want to return, including the types of the values.
  //   (label0: t0, label1: t1, ..., labeln: tn)
  val namedTupleTypeRepr = TypeRepr.of[NamedTuple.NamedTuple].appliedTo(List(labelsTypeRepr, valuesTupleTypeRepr))

  // Construct the type of the lambda:
  //   (v0: t0, v1: t1, ..., vn: tn) => (label0: t0, label1: t1, ..., labeln: tn)
  val funcType = MethodType(paramNames)(
    _ => fieldTypeReprs,
    _ => namedTupleTypeRepr
  )

  (labelsTypeRepr.asType, namedTupleTypeRepr.asType) match
    case ('[labelsType], '[namedTupleType]) =>
      val lambda = Lambda(
        Symbol.spliceOwner,
        funcType,
        (owner, params) => {
          // Convert the argument list into a tuple.
          val valuesAsTuple = Expr.ofTupleFromSeq(params.map(_.asExpr))

          // Build the named tuple.
          // Cast the result to the correct type or we get (label0: Any, ...).
          val namedTupleExpr = '{
            NamedTuple.build[labelsType & Tuple]()(${ valuesAsTuple })
              .asInstanceOf[namedTupleType]
          }

          // Return the body of the lambda.
          namedTupleExpr.asTerm
        }
      )

      // Emit the lambda as the result of our macro.
      lambda.asExpr

    case _ =>
      report.errorAndAbort("Unexpected error matching on types.")
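For readers less familiar with how named tuples are encoded, the types that the macro folds together with `*:` can also be written out by hand. The sketch below is an illustration rather than part of the macro: the aliases `Labels`, `Values`, and `Person` are made up for the example, and it assumes Scala 3.7 or later, where `scala.NamedTuple` is stable.

```scala
import scala.NamedTuple.NamedTuple

// Labels are singleton string types, folded with `*:` exactly as in the macro's
// foldRight: "name" *: "age" *: EmptyTuple
type Labels = "name" *: "age" *: EmptyTuple

// Value types, folded the same way: String *: Int *: EmptyTuple
type Values = String *: Int *: EmptyTuple

// Applying NamedTuple to both is the hand-written counterpart of `namedTupleTypeRepr`.
type Person = NamedTuple[Labels, Values]

// The compiler is expected to treat Person as the same type as the sugared form.
val p: Person = (name = "Ada", age = 36)
val q: (name: String, age: Int) = p
```

This is why the macro only needs the two folded tuple types: once `NamedTuple.NamedTuple` is applied to them, field access such as `p.name` should work as for any named tuple literal.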
#69