[DRAFT] Add CSV storage

Open MRGRAVITY817 opened this issue 2 years ago • 2 comments

Resolves #532

This new storage implementation will provide the following functionality:

  • [ ] Read a CSV file from a path as if it were a table.
  • [ ] Check whether the CSV file has an interpretable schema, using a {table_name}.schema.sql file.
  • [ ] Mutate the original CSV file, or write the mutated result to a copy.
  • [ ] [...will add more]
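
A minimal sketch of the first two items, using only the Rust standard library. The names `schema_path_for` and `read_table` are hypothetical, the table name is assumed to be the CSV file stem, and the comma splitting is deliberately naive (no quoted fields). The actual storage would more likely rely on a CSV parsing crate.

```rust
use std::path::{Path, PathBuf};

/// Derive the companion `{table_name}.schema.sql` path for a CSV file.
/// Assumption: the table name is the CSV file stem (`users.csv` -> `users`).
fn schema_path_for(csv_path: &Path) -> Option<PathBuf> {
    let table_name = csv_path.file_stem()?.to_str()?;
    Some(csv_path.with_file_name(format!("{table_name}.schema.sql")))
}

/// Split CSV text into a header row and data rows, like reading a table.
/// Naive on purpose: splits on commas and does not handle quoted fields.
fn read_table(csv_text: &str) -> Option<(Vec<String>, Vec<Vec<String>>)> {
    // Skip blank lines so a trailing newline does not produce an empty row.
    let mut lines = csv_text.lines().filter(|line| !line.trim().is_empty());
    let split = |line: &str| -> Vec<String> {
        line.split(',').map(|cell| cell.trim().to_string()).collect()
    };
    // The first non-empty line is treated as the column header.
    let header = split(lines.next()?);
    let rows = lines.map(split).collect();
    Some((header, rows))
}
```

For `example/data/users.csv`, `schema_path_for` looks for `users.schema.sql` in the same directory. In practice the CSV text would come from something like `std::fs::read_to_string`.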

WIP 🚧

MRGRAVITY817 avatar Jul 10 '22 09:07 MRGRAVITY817

Codecov Report

Merging #629 (a42fd2f) into main (fab6e71) will decrease coverage by 0.23%. The diff coverage is 2.32%.

@@            Coverage Diff             @@
##             main     #629      +/-   ##
==========================================
- Coverage   94.33%   94.09%   -0.24%     
==========================================
  Files         217      219       +2     
  Lines       16469    16524      +55     
==========================================
+ Hits        15536    15549      +13     
- Misses        933      975      +42     
| Impacted Files | Coverage Δ |
|---|---|
| storages/csv-storage/src/lib.rs | 0.00% <0.00%> (ø) |
| storages/csv-storage/src/main.rs | 50.00% <50.00%> (ø) |
| core/src/ast_builder/expr/function.rs | 100.00% <0.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update fab6e71...a42fd2f.

codecov-commenter avatar Jul 10 '22 09:07 codecov-commenter

Pull Request Test Coverage Report for Build 3669001041

  • 517 of 618 (83.66%) changed or added relevant lines in 6 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.2%) to 98.35%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| storages/csv-storage/src/lib.rs | 1 | 6 | 16.67% |
| storages/csv-storage/src/error.rs | 0 | 6 | 0.0% |
| storages/csv-storage/src/store_mut.rs | 0 | 32 | 0.0% |
| storages/csv-storage/src/store.rs | 0 | 58 | 0.0% |

Total: 517 of 618 changed or added lines covered.

Totals
Change from base Build 3662721935: -0.2%
Covered Lines: 37843
Relevant Lines: 38478

💛 - Coveralls

coveralls avatar Jul 23 '22 06:07 coveralls

@panarch

Here is the initial structure I've been considering for schema.toml:

[table.<table_name>]
path = "<path_to_csv_file>"
# column with a type only (nullable by default)
columns.column_name_1 = "<data_type>"
# column with a type and options
columns.column_name_2 = { type = "<data_type>", options = ["<column_option>"] }

Example

[table.users]
path = "example/data/users.csv"
columns.id = { type = "Int128", options = ["PKey"] }
columns.name = { type = "Text", options = ["NotNull"] }
columns.age = "Uint8"
columns.role = { type = "Text", options = ["NotNull"] }

[table.orders]
path = "example/data/orders.csv"
columns.id = "Int128"
columns.name = "Text"
columns.orderer_id = "Int128"

MRGRAVITY817 avatar Oct 26 '22 11:10 MRGRAVITY817

Or maybe this one is better; at least it is easier to parse with the toml crate.

[[tables]]
name = "users"
path = "example/data/users.csv"
columns = [
	{ name = "id",   type = "Int128", options = ["PKey"]    },
	{ name = "name", type = "Text",   options = ["NotNull"] },
	{ name = "age",  type = "Uint8"                         },
	{ name = "role", type = "Text",   options = ["NotNull"] },
]
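
One reason this layout may be easier to parse is that every table and every column has the same shape, so it maps onto two plain structs. The sketch below builds the `users` table by hand rather than parsing TOML, to stay dependency-free. `TableSchema`, `ColumnSchema`, `col`, `users_table`, and `validate` are hypothetical names, and the two checks are only examples of what a loader might verify. With serde, the `type` key would need a rename because `type` is a Rust keyword.

```rust
use std::collections::HashSet;

/// One `[[tables]]` entry.
#[derive(Debug, Clone, PartialEq)]
struct TableSchema {
    name: String,
    path: String,
    columns: Vec<ColumnSchema>,
}

/// One element of the `columns` array.
#[derive(Debug, Clone, PartialEq)]
struct ColumnSchema {
    name: String,
    /// Corresponds to the TOML key `type`.
    data_type: String,
    /// Empty when the `options` key is omitted.
    options: Vec<String>,
}

fn col(name: &str, data_type: &str, options: &[&str]) -> ColumnSchema {
    ColumnSchema {
        name: name.to_string(),
        data_type: data_type.to_string(),
        options: options.iter().map(|o| o.to_string()).collect(),
    }
}

/// The `users` table from the example, written out by hand.
fn users_table() -> TableSchema {
    TableSchema {
        name: "users".to_string(),
        path: "example/data/users.csv".to_string(),
        columns: vec![
            col("id", "Int128", &["PKey"]),
            col("name", "Text", &["NotNull"]),
            col("age", "Uint8", &[]),
            col("role", "Text", &["NotNull"]),
        ],
    }
}

/// Example checks: unique column names and at most one "PKey" column.
fn validate(table: &TableSchema) -> Result<(), String> {
    let mut seen = HashSet::new();
    for column in &table.columns {
        if !seen.insert(column.name.as_str()) {
            return Err(format!(
                "duplicate column `{}` in table `{}`",
                column.name, table.name
            ));
        }
    }
    let primary_keys = table
        .columns
        .iter()
        .filter(|c| c.options.iter().any(|o| o == "PKey"))
        .count();
    if primary_keys > 1 {
        return Err(format!(
            "table `{}` declares {} primary keys",
            table.name, primary_keys
        ));
    }
    Ok(())
}
```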

MRGRAVITY817 avatar Nov 02 '22 05:11 MRGRAVITY817