
As a User, I can stream reports in a .csv format

Open gabrielleberanger opened this issue 4 years ago • 1 comments

WHY Today, the only output stream format available is .njson (i.e. a file with n lines, each line being a dictionary). This format has two downsides:

  • It does not allow us to easily conduct preliminary analysis on the output data: .njson files cannot be directly forwarded to non-tech users, and cannot be put into a pandas DataFrame without undergoing preliminary transformations.
  • Some APIs natively return data in a .csv format: in these cases, we have to convert each line to a dictionary, which can cause parsing errors.

HOW Create a .csv streamer.
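
As a rough illustration of what such a streamer could do, here is a minimal sketch using only the standard library. The function name `records_to_csv_lines` is hypothetical and not part of the project; it assumes every record shares the keys of the first one, and it takes the column order from that first record.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def records_to_csv_lines(records: Iterable[Dict[str, object]]) -> Iterator[str]:
    """Yield CSV text lines (header first) from an iterable of dict records.

    Hypothetical helper, for illustration only.
    """
    buffer = io.StringIO()
    writer = None
    for record in records:
        if writer is None:
            # Assumption: the first record defines the columns and their order.
            writer = csv.DictWriter(
                buffer, fieldnames=list(record.keys()), lineterminator="\n"
            )
            writer.writeheader()
            yield buffer.getvalue()
            # Reset the buffer so each yielded chunk holds exactly one line.
            buffer.seek(0)
            buffer.truncate(0)
        # The csv module quotes values containing commas or quotes.
        writer.writerow(record)
        yield buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)


if __name__ == "__main__":
    rows = [{"campaign": "a", "clicks": 1}, {"campaign": "b, c", "clicks": 2}]
    for line in records_to_csv_lines(rows):
        print(line, end="")
```

Because the lines are produced lazily, the whole report never has to be held in memory, which is the main property expected from a stream.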

gabrielleberanger avatar Dec 16 '20 14:12 gabrielleberanger

Hi there,

I've started working on this issue and I've noticed that we may encounter a problem with the current software architecture.

Currently, the format of the destination file is enforced. We will have a .njson file by default. Even though there is a Pickle option, it is never used in the code. If we want to introduce a new format like CSV, we must let users decide which format they prefer. It would be intuitive to have an option in the writer command, something like write_gcs --gcs-file-format csv.

BUT, to do so, we need to change the stream we use (CSVStream vs JSONStream), and this choice must be implemented in the read() function of the reader. That would force us to add the file format as a reader option, something like read_dv360 --dv360-file-format csv, which is less intuitive than a writer option because it mixes up reader and writer options.
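
To make the coupling concrete, here is a simplified sketch. The classes below are stand-ins written for this comment, not the project's actual JSONStream, CSVStream or reader code; `STREAM_CLASSES`, `Reader.fetch` and the sample records are all hypothetical. It only shows why the format has to be known at read time when the stream object is built inside read().

```python
import csv
import io
import json
from typing import Dict, List


class JSONStream:
    """Stand-in stream: serializes records as newline-delimited JSON."""

    def __init__(self, records: List[Dict[str, object]]):
        self.records = list(records)

    def serialize(self) -> str:
        return "".join(json.dumps(record) + "\n" for record in self.records)


class CSVStream:
    """Stand-in stream: serializes records as CSV with a header row."""

    def __init__(self, records: List[Dict[str, object]]):
        self.records = list(records)

    def serialize(self) -> str:
        buffer = io.StringIO()
        if self.records:
            # Assumption: all records share the keys of the first one.
            writer = csv.DictWriter(
                buffer, fieldnames=list(self.records[0].keys()), lineterminator="\n"
            )
            writer.writeheader()
            writer.writerows(self.records)
        return buffer.getvalue()


# Hypothetical mapping from a format option value to a stream class.
STREAM_CLASSES = {"njson": JSONStream, "csv": CSVStream}


class Reader:
    """Hypothetical reader: the stream is created in read(),
    so the file format ends up being a reader option."""

    def __init__(self, file_format: str = "njson"):
        self.file_format = file_format

    def fetch(self) -> List[Dict[str, object]]:
        # Placeholder data standing in for an API response.
        return [{"campaign": "a", "clicks": 1}, {"campaign": "b", "clicks": 2}]

    def read(self):
        stream_class = STREAM_CLASSES[self.file_format]
        return stream_class(self.fetch())


if __name__ == "__main__":
    print(Reader(file_format="csv").read().serialize(), end="")
```

In this sketch the writer never sees the format choice: it just receives whatever stream the reader built, which is exactly why the option drifts to the reader side.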

Is it acceptable though?

What is your opinion regarding this issue?

benoitgoujon avatar Jan 08 '21 19:01 benoitgoujon