kedro-plugins
feat(datasets): Add CSVDataset to dask module
Description
Adds a `CSVDataset` to the Dask module so that CSV files can be used as Dask datasets in Kedro.
Development notes
Copied the Dask Parquet dataset implementation and reimplemented it for CSV files. It has been tested in Kedro pipelines, but no test classes have been written yet.
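For readers unfamiliar with the pattern, here is a minimal sketch of what a Dask-backed CSV dataset can look like. It is not the code in this PR: the class name `DaskCSVSketch` and its method names are hypothetical, and it deliberately does not subclass Kedro's dataset base class so that it stays self-contained. The only Dask calls it relies on are `dask.dataframe.read_csv` and the `to_csv` method of a Dask DataFrame. The `index=False` save default and the glob-based existence check are assumptions made for this illustration.

```python
from glob import glob
from typing import Any, Dict, Optional


class DaskCSVSketch:
    """Hypothetical stand-in for a Dask-backed CSV dataset (illustration only)."""

    def __init__(
        self,
        filepath: str,
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        self._filepath = filepath
        self._load_args = dict(load_args or {})
        # Assumption: do not write the index unless the caller asks for it.
        self._save_args = {"index": False, **(save_args or {})}

    def load(self) -> Any:
        # Imported here so the class can be defined without Dask installed.
        import dask.dataframe as dd

        # Returns a lazy Dask DataFrame; nothing is read until it is computed.
        return dd.read_csv(self._filepath, **self._load_args)

    def save(self, data: Any) -> None:
        # `data` is expected to be a Dask DataFrame. With a "*" in the path,
        # Dask writes one CSV file per partition.
        data.to_csv(self._filepath, **self._save_args)

    def exists(self) -> bool:
        # The path may contain a wildcard, so match it as a glob pattern
        # instead of testing for a single local file.
        return bool(glob(self._filepath))

    def describe(self) -> Dict[str, Any]:
        return {
            "filepath": self._filepath,
            "load_args": self._load_args,
            "save_args": self._save_args,
        }
```

With Dask installed, `DaskCSVSketch("data/part-*.csv").load()` would return a lazy Dask DataFrame covering every matching file. This sketch only handles local paths; remote filesystems would need additional handling that is out of scope here.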
Checklist
- [X] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
- [x] Updated the documentation to reflect the code changes
- [x] Added a description of this change in the relevant `RELEASE.md` file
- [x] Added tests to cover my changes
Thank you for this PR @michaelsexton ! I see the PR is in draft, do you need any input from us or not yet?
Hi @astrojuanlu, I believe I have completed all the tasks necessary in draft; however, the unit tests in the checks keep failing. I am getting an `AssertionError` in the `unit-tests (ubuntu-latest, 3.10) / unit-tests` check, but I can't replicate that locally.
@michaelsexton I've fixed several issues that weren't related to your changes, but it looks like the tests you added aren't passing currently. I can replicate this locally as well. Do you have some time to fix them?
@merelcht Thanks, I should have some time later this week. Is it OK to discuss with you why they might be failing if I continue running into them? Hopefully I'll get them all to pass anyway.
Absolutely! I'm happy to help get this all working.
Hi @michaelsexton just checking in to see if you're still interested in finishing this PR?
Fixed the unit tests + linting!
Made some small changes to the `_exists()` function. I'll let other people from the team review this!
We got 10 approvals already 💯