flytekit icon indicating copy to clipboard operation
flytekit copied to clipboard

[Core feature] Integrate with Soda.io

Open 10sharmashivam opened this issue 1 year ago • 6 comments

Tracking issue

Reference Issue - #2963

Why are the changes needed?

The Soda.io plugin integrates data quality monitoring into Flyte, enabling users to automatically perform data quality checks on their datasets. This integration ensures that data quality is maintained throughout workflows, providing users with timely insights into data issues and enhancing the reliability and trustworthiness of their data-driven processes.

What changes were proposed in this pull request?

1.	Basic Integration with Soda.io: Enables execution of data quality checks using custom configurations.
2.	Configurable Scan Definitions: Users can specify a YAML file for scan definitions, allowing for greater flexibility.
3.	API Key Support: Integrates with Soda Cloud using a Custom API Key per User, enabling secure and personalized data quality scans instead of a single shared API key for centralized usage. i hope I choose correctly(please let me know, if I should choose otherwise)

How was this patch tested?

I tested this basic prototype of the plugin through basic unit tests locally, covering both positive and negative scenarios. The tests ensure that the new functionality works as expected and that any edge cases are handled gracefully.

Additional Comments

I’m submitting this PR as an initial implementation of the Soda.io plugin integration. At this stage, I’ve structured the plugin, created the SodaCheckConfig and SodaCheckTask classes, and registered the plugin with Flyte.

I’d greatly appreciate any feedback or guidance to ensure that I’m on the right track with this implementation. I’ll continue to refine the code and address any changes based on the team’s suggestions and feedback.

Looking forward to your insights!

Check all the applicable boxes

  • [ ] I updated the documentation accordingly.
  • [ ] All new and existing tests passed.
  • [ ] All commits are signed off.

Future Enhancements

Looking ahead, I plan to implement the following enhancements to further improve the plugin:

•	Result Parsing and Reporting: Enhance the formatting of result outputs for better readability and usability.
•	Error Handling: Implement more robust error handling for configuration and API-related issues to improve user experience.
•	Scan Monitoring: Add support for monitoring the status of ongoing scans to provide real-time feedback to users.
•	Scheduling and Triggers: Enable automated scheduling or triggering of scans based on task dependencies within Flyte, enhancing the plugin’s automation capabilities.

Please let me know if I chose the correct approach and should continue with this and enhance this plugin!

10sharmashivam avatar Oct 31 '24 11:10 10sharmashivam

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 40.38%. Comparing base (6bf6f8e) to head (ca275b9). Report is 5 commits behind head on master.

:exclamation: There is a different number of reports uploaded between BASE (6bf6f8e) and HEAD (ca275b9). Click for more details.

HEAD has 3 uploads less than BASE
Flag BASE (6bf6f8e) HEAD (ca275b9)
4 1
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2884       +/-   ##
===========================================
- Coverage   76.79%   40.38%   -36.42%     
===========================================
  Files         196      199        +3     
  Lines       20546    20781      +235     
  Branches     2646     2671       +25     
===========================================
- Hits        15779     8393     -7386     
- Misses       4068    12287     +8219     
+ Partials      699      101      -598     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Oct 31 '24 11:10 codecov[bot]

Hi @kumare3 @wild-endeavor @thomasjpfan @Future-Outlier @eapolinario @pingsutw @samhita-alla @cosmicBboy, I wanted to follow up on this PR, about the basic prototype of this plugin and check if I’m heading in the right direction with the integration.

Also had a few questions in mind,

  1. API Key Approach: I’ve chosen to use a Custom API Key per User for the Soda.io integration, rather than a single shared API key. Please let me know if this approach seems correct or if I should consider an alternative method.
  2. Basic Functionality: I’ve structured the plugin and implemented the SodaCheckConfig and SodaCheckTask classes. Is this the right way to approach the integration, or would you suggest any improvements?
  3. Future Enhancements: I’ve listed potential future enhancements (e.g., result parsing, error handling, scan monitoring). Do you think these are aligned with the direction of Flyte, or should I consider different enhancements?

I look forward to your insights and any guidance on improving this PR. Please let me know if I’m on the right track, or if there are areas I should revisit.

Thanks again!

10sharmashivam avatar Nov 08 '24 21:11 10sharmashivam

@10sharmashivam thank you. Please notice that recently we merged this RFC. It means that community-contributed plugins should live under plugins/community. I'll ask plugin maintainers for a review again

davidmirror-ops avatar Nov 12 '24 16:11 davidmirror-ops

@10sharmashivam thank you. Please notice that recently we merged this RFC. It means that community-contributed plugins should live under plugins/community. I'll ask plugin maintainers for a review again

Hi @davidmirror-ops, thanks for the update! I reviewed the recent RFC regarding community-contributed plugins and the proposed plugins/community structure. Could you confirm if this folder structure is fully implemented and ready for use?

Thanks again for your guidance—I’ll proceed based on your input.

10sharmashivam avatar Nov 12 '24 17:11 10sharmashivam

@10sharmashivam the folder is not created yet, waiting for the first contribution after the RFC :) So feel free to create it. Also, please check the lint errors

davidmirror-ops avatar Nov 12 '24 17:11 davidmirror-ops

@10sharmashivam the folder is not created yet, waiting for the first contribution after the RFC :) So feel free to create it. Also, please check the lint errors

@davidmirror-ops Thank you for the clarification! I’ve set up the plugins/community structure and moved the plugin there, following the RFC as discussed. Also, I’ve addressed the linting issues—everything should be in order now.

Let me know if there’s anything else you’d like me to adjust. Thanks again!

10sharmashivam avatar Nov 12 '24 17:11 10sharmashivam