buildflow icon indicating copy to clipboard operation
buildflow copied to clipboard

Add support for writing additional file types beyond parquet

Open boetro opened this issue 2 years ago • 8 comments

Some more file types: JSON, Avro, CSV

boetro avatar Mar 16 '23 15:03 boetro

Can you point to where I should start looking to explore this issue?

aeroaks avatar May 04 '23 11:05 aeroaks

Definitely!

All the file io is in this file: https://github.com/launchflow/buildflow/blob/main/buildflow/runtime/ray_io/file_io.py

The enum here has the list of file types we support: https://github.com/launchflow/buildflow/blob/main/buildflow/runtime/ray_io/file_io.py#L18

And in the _write method defines each file format is written: https://github.com/launchflow/buildflow/blob/main/buildflow/runtime/ray_io/file_io.py#L50

You should be able to mimic the tests here with any additional file types also: https://github.com/launchflow/buildflow/blob/main/buildflow/runtime/ray_io/file_io_test.py

If you have any other questions let me know!

boetro avatar May 04 '23 11:05 boetro

Thanks, I an trying to test my changes and found that the existing tests are using unittest module. Is their any specific reason to use this in place of pytest? And also, are there any specific commands to run all the existing tests ?

aeroaks avatar May 08 '23 08:05 aeroaks

We mostly just did that from preference. Previous it was nice to use to setup ray specifics in the setUpClass methods, but we moved that to a pytest fixture, but it should play well with pytest I believe.

My typical workflow for a fresh install is (running from the root directory):

# install all dev deps
pip install .[dev]
pytest

boetro avatar May 08 '23 11:05 boetro

Thanks, I got the tests working. I have send a pull request #137 . But the checks are not running. And I was not able to add reviewer in there.

aeroaks avatar May 10 '23 08:05 aeroaks

Hmm interesting probably an issue with our permission set up. Let me try and update those

boetro avatar May 10 '23 11:05 boetro

Alright turns out it was an issue with when the workflow runs. I updated it in https://github.com/launchflow/buildflow/pull/138 to run on PRs. So I think if you pull down the latest changes it should work.

(also added a CODEOWNERS to auto assign reviews)

boetro avatar May 10 '23 12:05 boetro

With #142 , filesink should now supports CSV and JSON

aeroaks avatar May 12 '23 09:05 aeroaks