csvkit
csvkit copied to clipboard
csvsql: Option to output INSERT statements
I want to convert a CSV file to SQL CREATE TABLE
and INSERT
statements. However, csvsql
insists on providing --db
together with --insert
. This is strange, as csvsql
produces CREATE TABLE
itself, so it knows the schema and should be able to create INSERT
statements without any additional information.
csvkit (or agate-sql, on which it depends) doesn't have a facility to output INSERT statements. This would be a new feature. That said, you can insert rows into a database using csvsql, and then dump the database to get the INSERT statements as text.
Thanks for prompt response.
That said, you can insert rows into a database using csvsql, and then dump the database to get the INSERT statements as text.
Right, but this requires the 'live' database, when in principle outputting CREATE
and INSERT
could be done without it. This would be a nice feature.
My use case: I have been given an Excel file containing codes to load into a SQL Server database.
I can't query the server directly from my workstation. The server environment does not have Python installed.
I considered generating the SQL statements locally and copying the file to where I can run SQL commands against the server.
As it's not possible, my workaround is to copy the CSV to the database server and load the data into a temporary table using SQL Server Integration Services.
As for a solution, I'm guessing what needs to change is to_sql function.
It looks like there is an opportunity to stringify the insert expression before executing it, or instead of executing.
https://github.com/wireservice/agate-sql/blob/02f46e0cac11a2d35e9d4441b27748b9be95ee9a/agatesql/table.py#L231-L300
def to_sql(self, connection_or_string, table_name, overwrite=False,
create=True, create_if_not_exists=False, insert=True, prefixes=[],
db_schema=None, constraints=True, unique_constraint=[], chunk_size=None,
min_col_len=1, col_len_multiplier=1):
[...]
if insert:
insert = sql_table.insert()
for prefix in prefixes:
insert = insert.prefix_with(prefix)
if chunk_size is None:
connection.execute(insert, [dict(zip(self.column_names, row)) for row in self.rows])
else:
number_of_rows = len(self.rows)
for index in range((number_of_rows - 1) // chunk_size + 1):
end_index = (index + 1) * chunk_size
if end_index > number_of_rows:
end_index = number_of_rows
connection.execute(insert, [dict(zip(self.column_names, row)) for row in
self.rows[index * chunk_size:end_index]])
try:
return sql_table
finally:
if engine is not None:
connection.close()
engine.dispose()
Hmm, this is awkward to implement, because the agate-* plugins are meant to write files (e.g. XLSX), return agate tables, return SQLAlchemy tables, etc.
To write this output, we could just print()
to standard output from within agate-sql, but that seems like a bad API choice.
We could add a method to agate-sql that is similar to to_sql
(we can maybe extract common logic to a third method), and this method would yield the Insert constructs to csvkit, which could then print them. That's probably the cleanest way, if anyone whats to write a PR.
agate-* plugins are meant to write files
Could it write a file called insert_statements.sql
that contains the desired output?
I haven't used csvsql since I worked around the problem in 2011, so I won't write the PR, but I look forward to someone who can :-)
If yielding the Insert constructs from agate-sql is too much trouble, that could be another option.
+1 for this feature