Explore the possibility to define schema and construct objects in Spring (DI framework) style

Open zaleslaw opened this issue 5 months ago • 2 comments

Motivation:

  • It would be nice to hide schema reading and object construction from the user in DI style, where all data sources and their life cycles are managed by the DI framework
  • Unified approach for Spring developers

We want to have something like this:

@DataSource(csvFile = "data.csv")
val df: DataFrame<MyRowType>

It should be, for example, a RUNTIME annotation:

@Target(AnnotationTarget.FIELD, AnnotationTarget.PROPERTY)
@Retention(AnnotationRetention.RUNTIME)
annotation class DataSource(val csvFile: String)

We need a DataFrame post-processor along these lines:

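import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readCSV
import org.springframework.beans.factory.config.BeanPostProcessor
import org.springframework.stereotype.Component
import java.io.File
import kotlin.reflect.full.findAnnotation
import kotlin.reflect.full.memberProperties
import kotlin.reflect.jvm.javaField

@Component
class DataFramePostProcessor : BeanPostProcessor {

    override fun postProcessBeforeInitialization(bean: Any, beanName: String): Any? {
        // Look at every Kotlin property of the bean and fill the annotated ones
        bean::class.memberProperties.forEach { prop ->
            val annotation = prop.findAnnotation<DataSource>()
            if (annotation != null) {
                val csvPath = annotation.csvFile
                val dataFrame = DataFrame.readCSV(File(csvPath))

                // javaField is null for properties without a backing field; skip those
                val field = prop.javaField ?: return@forEach
                field.isAccessible = true
                field.set(bean, dataFrame)
            }
        }
        return bean
    }
}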

The bean could then be used like this:

@Component
class MyDataService {

    @DataSource(csvFile = "data.csv")
    lateinit var df: DataFrame<MyRow>

    fun process() {
        println(df.rowsCount())
    }
}
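
The injection mechanism itself can be tried without a Spring context. The following is only a minimal sketch under stated assumptions: `CsvFile`, `Rows`, `loadRows` and `injectDataSources` are hypothetical names introduced for illustration, `Rows` is a plain stand-in for `DataFrame`, and the CSV parsing is a naive comma split rather than the library's reader.

```kotlin
import java.io.File

// Hypothetical stand-in for the proposed annotation (not part of any library).
// FIELD target so that plain Java reflection can see it on the backing field.
@Target(AnnotationTarget.FIELD)
@Retention(AnnotationRetention.RUNTIME)
annotation class CsvFile(val path: String)

// Stand-in for DataFrame: each row is a header-to-value map.
typealias Rows = List<Map<String, String>>

// Naive reader: first line is the header, no quoting or escaping support.
fun loadRows(file: File): Rows {
    val lines = file.readLines().filter { it.isNotBlank() }
    if (lines.isEmpty()) return emptyList()
    val header = lines.first().split(",")
    return lines.drop(1).map { line -> header.zip(line.split(",")).toMap() }
}

// What a post-processor would do per bean: find annotated fields and fill them.
fun <T : Any> injectDataSources(bean: T): T {
    for (field in bean.javaClass.declaredFields) {
        val source = field.getAnnotation(CsvFile::class.java) ?: continue
        field.isAccessible = true
        field.set(bean, loadRows(File(source.path)))
    }
    return bean
}

// Example bean; the file name is an assumption for this sketch.
class MyService {
    @field:CsvFile("people.csv")
    lateinit var people: Rows
}
```

Calling `injectDataSources(MyService())` with a `people.csv` file in the working directory fills `people` before first use, which is the same contract the `lateinit` property relies on in the Spring version.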

zaleslaw, Jul 11 '25 12:07

I like this approach too. It's much easier to support from a compiler plugin than @Import DataFrame.read(""), since annotations rely solely on constants anyway. We would need to avoid some of the pitfalls we ran into with annotations in the past, though, such as with @ImportDataSchema.

@ImportDataSchema tried to support all types of data sources in one annotation, which resulted in options like "delimiters" being available even for reading JSON.

We could still have a generic source annotation, but we'd also need specific ones:

  • @DataSource("") -> DataFrame.read("")
  • @CsvDataSource("", delimiter=';') -> DataFrame.readCsv("", delimiter=';')
  • ...
  • @JsonDataSource("", unifyNumbers=false) -> DataFrame.readJson("", unifyNumbers=false)
  • etc.

Something like that
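
A rough sketch of how format-specific annotations keep each option where it belongs. Everything here is an assumption for illustration: the annotation parameters, the `ReadRequest` types and `readRequestsOf` are hypothetical, and `ReadRequest` only describes the read call that would be made instead of calling the library.

```kotlin
// Hypothetical per-format annotations: each exposes only the options that apply.
@Target(AnnotationTarget.FIELD)
@Retention(AnnotationRetention.RUNTIME)
annotation class CsvDataSource(val path: String, val delimiter: Char = ',')

@Target(AnnotationTarget.FIELD)
@Retention(AnnotationRetention.RUNTIME)
annotation class JsonDataSource(val path: String, val unifyNumbers: Boolean = true)

// Stand-in describing which read call a processor would make.
sealed interface ReadRequest
data class CsvRead(val path: String, val delimiter: Char) : ReadRequest
data class JsonRead(val path: String, val unifyNumbers: Boolean) : ReadRequest

// Maps each annotated field name to the read it asks for.
fun readRequestsOf(bean: Any): Map<String, ReadRequest> =
    bean.javaClass.declaredFields.mapNotNull { field ->
        val request: ReadRequest? =
            field.getAnnotation(CsvDataSource::class.java)
                ?.let { CsvRead(it.path, it.delimiter) }
                ?: field.getAnnotation(JsonDataSource::class.java)
                    ?.let { JsonRead(it.path, it.unifyNumbers) }
        request?.let { field.name to it }
    }.toMap()

// Example bean; String is just a placeholder for the data frame type.
class Reports {
    @field:CsvDataSource("sales.csv", delimiter = ';')
    lateinit var sales: String

    @field:JsonDataSource("users.json", unifyNumbers = false)
    lateinit var users: String
}
```

With this split, writing `delimiter` on a JSON source is a compile error rather than a silently ignored option, which is the pitfall described for @ImportDataSchema.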

Jolanrensen, Jul 21 '25 10:07

Yeah, Spring favors annotations that are specific to each situation. Great idea.

zaleslaw, Jul 21 '25 11:07