deno_std
deno_std copied to clipboard
feat(encoding/csv): restructure proposal
Is your feature request related to a problem? Please describe.
The encoding/csv API is scattered and messy over multiple files and a csv subdir, some files are browser compatible, some are not.
Describe the solution you'd like This proposal includes a restructure and changes as following:
- [x] make
parsebrowser compatible and sync and therefore removeBufReaderas input (https://github.com/denoland/deno_std/pull/2491) - [x] combine csv.ts and csv_stringify.ts as well as csv_test.ts and csv_stringify_test.ts (https://github.com/denoland/deno_std/pull/2524)
- [x] remove
ColumnOptions(https://github.com/denoland/deno_std/pull/2536) - [x] make
stringifysync (https://github.com/denoland/deno_std/pull/2611) - [ ] deprecate obsolete exports and remove them after 2 or so releases (https://github.com/denoland/deno_std/pull/2633)
- [ ] cleanup
I think BufReader input of csv.parse can be deprecated or removed now in favor of stream API.
@crowlKats What do you think?
i'd say deprecating it would be ok, but not removing it; most people havent switched to using the web streams based APIs
i'd say deprecating it would be ok, but not removing it; most people havent switched to using the web streams based APIs
Sounds reasonable to me. Maybe let's first deprecate BufReader for a while and then start the above sequence of changes?
sounds good
Working on the restructure:
Do we need ColumnOptions for the columns option next to string[]? I don't really see the point, in which cases would this be used?
https://github.com/denoland/deno_std/blob/f415da4c62cdacce880c3c7e492ceee184c3be4a/encoding/csv.ts#L148-L153
ColumnOptions used to have parse property, but that was removed in https://github.com/denoland/deno_std/pull/1724. Looks like they are now useless. I think it's ok to remove it from the input typing.
@kt3k Do we really need this column.fn as part of the api? https://github.com/denoland/deno_std/blob/6bd0f514b63c68c57b1625aa38f015d6b2a02676/encoding/csv.ts#L63
One can always loop over the values and manipulate them before the stringification process.
If we remove it, stringify can be sync as parse is now.
I think that your suggestion makes a lot of sense @timreichen
The transformation process always can be done before the stringification process
Below are some additional refactoring suggestions from me.
-
There are currently three optional type definitions for parsers (
ReadOptions,ParseOptions,CsvStreamOptions), but I think they can be simplified to one.-
trimLeadingSpace/lazyQuotes: As far as I read the implementation, it looks like it's already available forCsvStream. It should be added to the type definition ofCsvStreamOption. -
skipFirstRow/columns: Added toCsvStreamin #3184. -
fieldsPerRecord: It doesn't seem to be available forCsvStreamyet. Want to add?// _io.ts interface ReadOptions { separator?: string; comment?: string; trimLeadingSpace?: boolean; lazyQuotes?: boolean; fieldsPerRecord?: number; } // csv.ts interface ParseOptions extends ReadOptions { skipFirstRow?: boolean; columns?: string[]; } // stream.ts interface CsvStreamOptions { separator?: string; comment?: string; }
-
-
How about clarifying the file name?
_parser.tsseems to be mainly used incsv.ts. Would you like to change it to_sync_parser.ts?_io.tsseems to be mainly used instream.ts. Would you like to change it to_streaming_parser.ts?
-
StreamLineReaderwas probably added during the transition from Go-based streams, but is now just a thin wrapper around TextLineStream. Why not remove it and change it to use TextLineStream directly inside? -
It seems that
Parser.prorotype.#parseRecordandparseRecordfunction have a lot in common. Why not share these implementations? -
readRecordin_io.tsdoesn't seem to be used at all. Should it be removed? -
csv_test.ts has two identical test cases. I think we can probably organize these better. https://github.com/denoland/deno_std/blob/3e9b8943fdd3a0729f1e34e8668eec56f991c81d/encoding/csv_test.ts#L693-L708
-
I think the
csv_test.tsandstream_test.tstest cases can be combined into one. How about putting egtestcase1.csvandtestcase1_output.jsonin the testdata directory? -
How about changing
Record<string, unknown>[]toRecord<string, string | undefined>[]in return type ofparsefunction?
I'd like to close this issue as most of the tasks have been completed. @timreichen, does that sound good to you?
I've created #3765 which we can use to guide further work on making std/csv better. @ayame113, would you like to review, refine and copy your suggestions to that issue for further discussion?