csvkit icon indicating copy to clipboard operation
csvkit copied to clipboard

csvsql: Option to output INSERT statements

Open merkys opened this issue 4 years ago • 7 comments

I want to convert a CSV file to SQL CREATE TABLE and INSERT statements. However, csvsql insists on providing --db together with --insert. This is strange, as csvsql produces CREATE TABLE itself, so it knows the schema and should be able to create INSERT statements without any additional information.

merkys avatar Nov 28 '19 08:11 merkys

csvkit (or agate-sql, on which it depends) doesn't have a facility to output INSERT statements. This would be a new feature. That said, you can insert rows into a database using csvsql, and then dump the database to get the INSERT statements as text.

jpmckinney avatar Nov 29 '19 21:11 jpmckinney

Thanks for prompt response.

That said, you can insert rows into a database using csvsql, and then dump the database to get the INSERT statements as text.

Right, but this requires the 'live' database, when in principle outputting CREATE and INSERT could be done without it. This would be a nice feature.

merkys avatar Dec 02 '19 15:12 merkys

My use case: I have been given an Excel file containing codes to load into a SQL Server database.

I can't query the server directly from my workstation. The server environment does not have Python installed.

I considered generating the SQL statements locally and copying the file to where I can run SQL commands against the server.

As it's not possible, my workaround is to copy the CSV to the database server and load the data into a temporary table using SQL Server Integration Services.


As for a solution, I'm guessing what needs to change is to_sql function.

It looks like there is an opportunity to stringify the insert expression before executing it, or instead of executing.

https://github.com/wireservice/agate-sql/blob/02f46e0cac11a2d35e9d4441b27748b9be95ee9a/agatesql/table.py#L231-L300

def to_sql(self, connection_or_string, table_name, overwrite=False,
           create=True, create_if_not_exists=False, insert=True, prefixes=[],
           db_schema=None, constraints=True, unique_constraint=[], chunk_size=None, 
           min_col_len=1, col_len_multiplier=1):
    [...]

    if insert:
        insert = sql_table.insert()
        for prefix in prefixes:
            insert = insert.prefix_with(prefix)
        if chunk_size is None:
            connection.execute(insert, [dict(zip(self.column_names, row)) for row in self.rows])
        else:
            number_of_rows = len(self.rows)
            for index in range((number_of_rows - 1) // chunk_size + 1):
                end_index = (index + 1) * chunk_size
                if end_index > number_of_rows:
                    end_index = number_of_rows
                connection.execute(insert, [dict(zip(self.column_names, row)) for row in
                                            self.rows[index * chunk_size:end_index]])


    try:
        return sql_table
    finally:
        if engine is not None:
            connection.close()
            engine.dispose()

iainelder avatar May 11 '21 15:05 iainelder

Hmm, this is awkward to implement, because the agate-* plugins are meant to write files (e.g. XLSX), return agate tables, return SQLAlchemy tables, etc.

To write this output, we could just print() to standard output from within agate-sql, but that seems like a bad API choice.

We could add a method to agate-sql that is similar to to_sql (we can maybe extract common logic to a third method), and this method would yield the Insert constructs to csvkit, which could then print them. That's probably the cleanest way, if anyone whats to write a PR.

jpmckinney avatar Oct 17 '23 22:10 jpmckinney

agate-* plugins are meant to write files

Could it write a file called insert_statements.sql that contains the desired output?

I haven't used csvsql since I worked around the problem in 2011, so I won't write the PR, but I look forward to someone who can :-)

iainelder avatar Oct 18 '23 04:10 iainelder

If yielding the Insert constructs from agate-sql is too much trouble, that could be another option.

jpmckinney avatar Oct 18 '23 19:10 jpmckinney

+1 for this feature

PRNDA avatar Mar 28 '24 07:03 PRNDA