
Speed up unit tests with a custom test DataObject

Open zzeekk opened this issue 2 years ago • 0 comments

Is your feature request related to a problem? Please describe.
Executing all tests already takes about 30 minutes. We should try to reduce that.

Describe the solution you'd like
Much of the time is spent preparing input data by writing test data to DataObjects (Csv or Hive). This could be reduced significantly with a custom DataObject that accepts a DataFrame as input data and keeps it in memory, returning it from the getSparkDataFrame method. The DataFrame could be set by implementing the writeSparkDataFrame method, which would reduce the number of lines of code that need to change. The same DataObject should support setting the DataFrame multiple times within a test.
Bonus: also infer partition values from the DataFrame, so that the DataObject can be used in tests with partitioned input data.
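To make the proposal concrete, here is a minimal sketch of what such a DataObject could look like. It is only an illustration under stated assumptions: the class name `InMemoryTestDataObject` and the `listPartitions` helper are hypothetical, the class deliberately does not extend smart-data-lake's actual DataObject traits, and the real `getSparkDataFrame` / `writeSparkDataFrame` signatures in the project may take additional parameters that are omitted here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical in-memory test DataObject (illustration only).
// It does NOT extend smart-data-lake's real DataObject traits.
class InMemoryTestDataObject(val id: String, val partitions: Seq[String] = Seq()) {

  // The most recently written DataFrame; None until the first write.
  private var current: Option[DataFrame] = None

  // Stores the DataFrame reference instead of persisting it to Csv or Hive.
  // Calling this again simply replaces the previous DataFrame.
  def writeSparkDataFrame(df: DataFrame): Unit = {
    val missing = partitions.filterNot(p => df.columns.contains(p))
    require(missing.isEmpty, s"($id) partition columns missing: ${missing.mkString(", ")}")
    current = Some(df)
  }

  // Returns the stored DataFrame, or fails if nothing has been written yet.
  def getSparkDataFrame(): DataFrame =
    current.getOrElse(throw new IllegalStateException(s"($id) no DataFrame has been set"))

  // Bonus idea from the issue: infer partition values as the distinct
  // combinations of partition column values found in the stored DataFrame.
  def listPartitions: Seq[Map[String, Any]] =
    if (partitions.isEmpty) Seq()
    else getSparkDataFrame()
      .select(partitions.map(col): _*)
      .distinct()
      .collect()
      .toSeq
      .map(row => partitions.map(p => p -> row.getAs[Any](p)).toMap)
}

object InMemoryTestDataObjectExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("in-memory-test").getOrCreate()
    import spark.implicits._

    val input = new InMemoryTestDataObject("input", partitions = Seq("dt"))

    // First test input, partitioned by "dt".
    input.writeSparkDataFrame(Seq(("a", "20220411"), ("b", "20220412")).toDF("value", "dt"))
    println(input.listPartitions)

    // Setting the DataFrame a second time within the same test replaces the first one.
    input.writeSparkDataFrame(Seq(("c", "20220413")).toDF("value", "dt"))
    assert(input.getSparkDataFrame().count() == 1)

    spark.stop()
  }
}
```

Holding only a reference keeps the sketch simple: a DataFrame built from a local `Seq` already lives in memory, so nothing is written to disk. If a test derives its input from files or expensive transformations, calling `df.cache()` before `writeSparkDataFrame` would be one way to avoid recomputing it on every read.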

zzeekk · Apr 11 '22 15:04