deno_std icon indicating copy to clipboard operation
deno_std copied to clipboard

feat(encoding/csv): restructure proposal

Open timreichen opened this issue 3 years ago • 8 comments

Is your feature request related to a problem? Please describe. The encoding/csv API is scattered and messy over multiple files and a csv subdir, some files are browser compatible, some are not.

Describe the solution you'd like This proposal includes a restructure and changes as following:

  • [x] make parse browser compatible and sync and therefore remove BufReader as input (https://github.com/denoland/deno_std/pull/2491)
  • [x] combine csv.ts and csv_stringify.ts as well as csv_test.ts and csv_stringify_test.ts (https://github.com/denoland/deno_std/pull/2524)
  • [x] remove ColumnOptions (https://github.com/denoland/deno_std/pull/2536)
  • [x] make stringify sync (https://github.com/denoland/deno_std/pull/2611)
  • [ ] deprecate obsolete exports and remove them after 2 or so releases (https://github.com/denoland/deno_std/pull/2633)
  • [ ] cleanup

timreichen avatar May 31 '22 12:05 timreichen

I think BufReader input of csv.parse can be deprecated or removed now in favor of stream API.

@crowlKats What do you think?

kt3k avatar May 31 '22 12:05 kt3k

i'd say deprecating it would be ok, but not removing it; most people havent switched to using the web streams based APIs

crowlKats avatar May 31 '22 14:05 crowlKats

i'd say deprecating it would be ok, but not removing it; most people havent switched to using the web streams based APIs

Sounds reasonable to me. Maybe let's first deprecate BufReader for a while and then start the above sequence of changes?

kt3k avatar May 31 '22 15:05 kt3k

sounds good

crowlKats avatar May 31 '22 15:05 crowlKats

Working on the restructure:

Do we need ColumnOptions for the columns option next to string[]? I don't really see the point, in which cases would this be used? https://github.com/denoland/deno_std/blob/f415da4c62cdacce880c3c7e492ceee184c3be4a/encoding/csv.ts#L148-L153

timreichen avatar Jun 05 '22 19:06 timreichen

ColumnOptions used to have parse property, but that was removed in https://github.com/denoland/deno_std/pull/1724. Looks like they are now useless. I think it's ok to remove it from the input typing.

kt3k avatar Jun 06 '22 04:06 kt3k

@kt3k Do we really need this column.fn as part of the api? https://github.com/denoland/deno_std/blob/6bd0f514b63c68c57b1625aa38f015d6b2a02676/encoding/csv.ts#L63 One can always loop over the values and manipulate them before the stringification process. If we remove it, stringify can be sync as parse is now.

timreichen avatar Aug 19 '22 08:08 timreichen

I think that your suggestion makes a lot of sense @timreichen

The transformation process always can be done before the stringification process

luk3skyw4lker avatar Sep 02 '22 22:09 luk3skyw4lker

Below are some additional refactoring suggestions from me.

  • There are currently three optional type definitions for parsers (ReadOptions, ParseOptions, CsvStreamOptions), but I think they can be simplified to one.

    • trimLeadingSpace / lazyQuotes: As far as I read the implementation, it looks like it's already available for CsvStream. It should be added to the type definition of CsvStreamOption.

    • skipFirstRow / columns: Added to CsvStream in #3184.

    • fieldsPerRecord: It doesn't seem to be available for CsvStream yet. Want to add?

      // _io.ts
      interface ReadOptions {
        separator?: string;
        comment?: string;
        trimLeadingSpace?: boolean;
        lazyQuotes?: boolean;
        fieldsPerRecord?: number;
      }
      // csv.ts
      interface ParseOptions extends ReadOptions {
        skipFirstRow?: boolean;
        columns?: string[];
      }
      // stream.ts
      interface CsvStreamOptions {
        separator?: string;
        comment?: string;
      }
      
  • How about clarifying the file name?

    • _parser.ts seems to be mainly used in csv.ts. Would you like to change it to _sync_parser.ts?
    • _io.ts seems to be mainly used in stream.ts. Would you like to change it to _streaming_parser.ts?
  • StreamLineReader was probably added during the transition from Go-based streams, but is now just a thin wrapper around TextLineStream. Why not remove it and change it to use TextLineStream directly inside?

  • It seems that Parser.prorotype.#parseRecord and parseRecord function have a lot in common. Why not share these implementations?

  • readRecord in _io.ts doesn't seem to be used at all. Should it be removed?

  • csv_test.ts has two identical test cases. I think we can probably organize these better. https://github.com/denoland/deno_std/blob/3e9b8943fdd3a0729f1e34e8668eec56f991c81d/encoding/csv_test.ts#L693-L708

  • I think the csv_test.ts and stream_test.ts test cases can be combined into one. How about putting eg testcase1.csv and testcase1_output.json in the testdata directory?

  • How about changing Record<string, unknown>[] to Record<string, string | undefined>[] in return type of parse function?

ayame113 avatar Feb 10 '23 14:02 ayame113

I'd like to close this issue as most of the tasks have been completed. @timreichen, does that sound good to you?

I've created #3765 which we can use to guide further work on making std/csv better. @ayame113, would you like to review, refine and copy your suggestions to that issue for further discussion?

iuioiua avatar Dec 08 '23 06:12 iuioiua