smart-data-lake
Speed up unit tests with a custom test DataObject
Is your feature request related to a problem? Please describe.
Executing all tests already takes about 30 minutes. We should try to reduce that.
Describe the solution you'd like
Much of the time is spent preparing input data by writing test data to DataObjects (Csv or Hive). This could be reduced significantly by creating a custom DataObject on which a DataFrame can be set as input data. The DataFrame would be kept in memory and returned by the getSparkDataFrame method. Setting the DataFrame could be done by implementing the writeSparkDataFrame method, which would reduce the number of lines of test code that need to change. The same DataObject should support setting the DataFrame multiple times within a test.
Bonus: also implement inferring partition values from the DataFrame, so that the DataObject can be used in tests with partitioned input data.
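To make the proposal concrete, here is a rough sketch of what such a DataObject might look like. It is only an illustration under assumptions: the class name InMemoryTestDataObject, its constructor parameters and the listPartitions method are hypothetical, the method signatures are simplified, and the class deliberately does not extend the project's real DataObject traits, whose exact signatures may differ.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical test-only DataObject (name and signatures are assumptions).
// It holds a DataFrame in memory instead of writing it to Csv or Hive.
class InMemoryTestDataObject(val id: String, val partitions: Seq[String] = Seq()) {

  // Most recently written DataFrame; None until the first write.
  private var currentDf: Option[DataFrame] = None

  // Replaces the held DataFrame, so a test can set new input data repeatedly.
  def writeSparkDataFrame(df: DataFrame): Unit = {
    currentDf = Some(df)
  }

  // Returns the held DataFrame without touching the file system or Hive.
  def getSparkDataFrame(): DataFrame =
    currentDf.getOrElse(
      throw new IllegalStateException(s"($id) no DataFrame has been set"))

  // Bonus: infers partition values as the distinct combinations of the
  // partition columns found in the held DataFrame.
  def listPartitions: Seq[Map[String, Any]] =
    if (partitions.isEmpty) Seq()
    else getSparkDataFrame()
      .select(partitions.map(col): _*)
      .distinct()
      .collect()
      .map(row => partitions.map(p => p -> row.getAs[Any](p)).toMap)
      .toSeq
}

// Usage sketch: set input data twice within the same test.
object InMemoryTestDataObjectExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]").appName("in-memory-test").getOrCreate()
    import spark.implicits._

    val input = new InMemoryTestDataObject("src", partitions = Seq("dt"))
    input.writeSparkDataFrame(
      Seq(("a", "20200101"), ("b", "20200102")).toDF("id", "dt"))
    println(input.listPartitions)

    // Overwrite the input for the next test step.
    input.writeSparkDataFrame(Seq(("c", "20200103")).toDF("id", "dt"))
    println(input.getSparkDataFrame().count())

    spark.stop()
  }
}
```

Because writeSparkDataFrame keeps the same name as the existing write call, tests that currently prepare data through Csv or Hive DataObjects would mostly only need to swap the DataObject they instantiate. Note that inferring partitions this way triggers a Spark job (distinct and collect), which should be cheap for small test DataFrames.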