Add support for writing additional file types beyond parquet
Some more file types: JSON, Avro, CSV
Can you point to where I should start looking to explore this issue?
Definitely!
All the file io is in this file: https://github.com/launchflow/buildflow/blob/main/buildflow/runtime/ray_io/file_io.py
The enum here has the list of file types we support: https://github.com/launchflow/buildflow/blob/main/buildflow/runtime/ray_io/file_io.py#L18
And the _write method defines how each file format is written: https://github.com/launchflow/buildflow/blob/main/buildflow/runtime/ray_io/file_io.py#L50
You should be able to mimic the tests here with any additional file types also: https://github.com/launchflow/buildflow/blob/main/buildflow/runtime/ray_io/file_io_test.py
If you have any other questions let me know!
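To make the enum-plus-_write pattern described above concrete, here is a minimal standalone sketch. It is not the code from file_io.py: the names FileFormat and write_rows are hypothetical, and it uses only the standard library csv and json modules rather than whatever the real _write method relies on. It also assumes every row is a flat dict with the same keys.

```python
import csv
import enum
import json
from typing import Any, Dict, List


class FileFormat(enum.Enum):
    # Hypothetical enum for illustration; the real enum in file_io.py may differ.
    CSV = "csv"
    JSON = "json"


def write_rows(path: str, rows: List[Dict[str, Any]], file_format: FileFormat) -> None:
    """Write a list of dict rows to `path`, dispatching on the file format."""
    if file_format == FileFormat.CSV:
        # Assumption: all rows share the keys of the first row.
        fieldnames = list(rows[0].keys()) if rows else []
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    elif file_format == FileFormat.JSON:
        # One JSON object per line (JSON Lines), so rows can be appended later.
        with open(path, "w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
    else:
        raise ValueError(f"Unsupported file format: {file_format}")
```

Adding another format in this sketch means adding one enum member and one branch, which is the same shape of change the links above point to.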
Thanks, I am trying to test my changes and found that the existing tests use the unittest module. Is there any specific reason to use this instead of pytest?
Also, are there any specific commands to run all the existing tests?
That was mostly just a preference. Previously it was nice for setting up the Ray specifics in the setUpClass methods. We have since moved that to a pytest fixture, and the unittest-style tests should still play well with pytest, I believe.
My typical workflow for a fresh install is (running from the root directory):
# install all dev deps
pip install .[dev]
pytest
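On the unittest versus pytest point: pytest collects unittest.TestCase classes, so a test written in that style runs under the plain pytest command above. The sketch below is a hypothetical test, not one taken from file_io_test.py. It round-trips a small CSV file with the standard library only and assumes nothing about the buildflow API or the Ray fixture.

```python
import csv
import os
import shutil
import tempfile
import unittest


class CsvRoundTripTest(unittest.TestCase):
    # Hypothetical test case for illustration only.

    def setUp(self):
        # Fresh temporary directory for each test.
        self.output_dir = tempfile.mkdtemp()

    def tearDown(self):
        shutil.rmtree(self.output_dir)

    def test_write_and_read_csv(self):
        path = os.path.join(self.output_dir, "out.csv")
        rows = [{"field": "1"}, {"field": "2"}]
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["field"])
            writer.writeheader()
            writer.writerows(rows)
        # csv.DictReader returns every value as a string.
        with open(path, newline="") as f:
            self.assertEqual(list(csv.DictReader(f)), rows)
```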
Thanks, I got the tests working. I have sent a pull request, #137, but the checks are not running, and I was not able to add a reviewer there.
Hmm, interesting. It's probably an issue with our permissions setup. Let me try to update those.
Alright turns out it was an issue with when the workflow runs. I updated it in https://github.com/launchflow/buildflow/pull/138 to run on PRs. So I think if you pull down the latest changes it should work.
(also added a CODEOWNERS to auto assign reviews)
With #142, filesink should now support CSV and JSON.