data-prep-kit
[Feature] Have spark transform testing validate the contents of the output parquet files
Search before asking
- [X] I searched the issues and found no similar issues.
Component
Library/core
Feature
Spark currently writes output files whose names do not correspond to the input file names. As a result, we can't compare expected output with test output file by file. Currently, AbstractSparkTransformLauncherTest._validate_directory_contents_match() only validates that the total number of rows matches. We should also verify that the output contents match the expected contents, probably without regard to row ordering.
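One possible approach, as a rough sketch only: load all rows from the expected and actual output directories and compare them as multisets, so that neither file names nor row order matter. The helper names below (`rows_match_ignoring_order`, `load_rows`) are hypothetical and not part of the existing test support code, and `load_rows` assumes pyarrow is available in the test environment.

```python
import json
from collections import Counter


def _row_key(row: dict) -> str:
    # Canonical, hashable form of a row: key order is ignored, nested
    # lists/dicts are serialized, and non-JSON values fall back to str().
    return json.dumps(row, sort_keys=True, default=str)


def rows_match_ignoring_order(expected_rows, actual_rows) -> bool:
    # Multiset comparison: duplicate rows are counted, row order is ignored.
    expected = Counter(_row_key(r) for r in expected_rows)
    actual = Counter(_row_key(r) for r in actual_rows)
    return expected == actual


def load_rows(parquet_dir: str) -> list:
    # Assumption: pyarrow is installed. Reading the directory as a whole
    # means the individual part-file names Spark chose are irrelevant.
    import pyarrow.parquet as pq

    return pq.read_table(parquet_dir).to_pylist()
```

A test could then assert `rows_match_ignoring_order(load_rows(expected_dir), load_rows(actual_dir))`. This pulls every row into memory, which should be acceptable for small test fixtures but would not scale to large outputs.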
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!