[Feature] Have Spark transform testing validate the contents of the output parquet files

Open: daw3rd opened this issue 1 year ago • 0 comments

Search before asking

  • [X] I searched the issues and found no similar issues.

Component

Library/core

Feature

Spark currently writes output files whose names do not correspond to the input file names, so the test cannot pair each expected output file with the matching test output file and compare them. At present, AbstractSparkTransformLauncherTest._validate_directory_contents_match() only checks that the total number of rows matches. We should instead verify that the output contents match the expected contents, probably without regard to row ordering.
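A minimal sketch of one possible order-insensitive comparison, assuming pyarrow is available for reading Parquet: it reads every *.parquet file under each directory (so Spark's generated file names stop mattering), converts each row into a hashable form, and compares the two sides as multisets with collections.Counter. The helper names (read_parquet_dir_rows, diff_rows, rows_match_ignoring_order, assert_parquet_dirs_match) are hypothetical and not part of data-prep-kit; one option would be to call something like assert_parquet_dirs_match() from _validate_directory_contents_match().

```python
from collections import Counter
from collections.abc import Hashable, Iterable
from pathlib import Path
from typing import Any, Dict, List, Tuple


def _freeze(value: Any) -> Hashable:
    """Recursively turn dicts/lists into tuples so a row can be counted."""
    if isinstance(value, dict):
        # Sort by column name so column order does not affect the comparison.
        return tuple(sorted((k, _freeze(v)) for k, v in value.items()))
    if isinstance(value, (list, tuple)):
        return tuple(_freeze(v) for v in value)
    return value


def diff_rows(
    expected_rows: Iterable[Dict[str, Any]], actual_rows: Iterable[Dict[str, Any]]
) -> Tuple[Counter, Counter]:
    """Return (rows missing from actual, unexpected rows in actual), ignoring order."""
    expected = Counter(_freeze(r) for r in expected_rows)
    actual = Counter(_freeze(r) for r in actual_rows)
    # Counter subtraction keeps only positive counts, so duplicates are respected.
    return expected - actual, actual - expected


def rows_match_ignoring_order(
    expected_rows: Iterable[Dict[str, Any]], actual_rows: Iterable[Dict[str, Any]]
) -> bool:
    missing, unexpected = diff_rows(expected_rows, actual_rows)
    return not missing and not unexpected


def read_parquet_dir_rows(directory: str) -> List[Dict[str, Any]]:
    """Read all rows from every *.parquet file under directory (hypothetical helper)."""
    import pyarrow.parquet as pq  # imported lazily; assumes pyarrow is installed

    rows: List[Dict[str, Any]] = []
    for path in sorted(Path(directory).rglob("*.parquet")):
        rows.extend(pq.read_table(str(path)).to_pylist())
    return rows


def assert_parquet_dirs_match(expected_dir: str, actual_dir: str) -> None:
    """Assert both directories hold the same rows, regardless of file names or order."""
    missing, unexpected = diff_rows(
        read_parquet_dir_rows(expected_dir), read_parquet_dir_rows(actual_dir)
    )
    assert not missing and not unexpected, (
        f"{sum(missing.values())} expected rows missing, "
        f"{sum(unexpected.values())} unexpected rows"
    )
```

Comparing as multisets keeps duplicate rows significant, which a plain set comparison would not. Values are compared exactly, so floating-point columns (NaN values in particular) may need a tolerance or normalization step before freezing.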

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

daw3rd, Jun 18 '24 15:06