bcpandas icon indicating copy to clipboard operation
bcpandas copied to clipboard

Improvement: Identifying Collation for format file

Open tylerwmarrs opened this issue 4 years ago • 2 comments

You can use the following SQL to identify collation information for a database, column, etc. This could be important to those of us coming from a Unix background and turn off the default case insensitive collations. :)

COLLATION_SQL = """
DECLARE @DefaultCollation varchar(256);
DECLARE @Database varchar(1000);
DECLARE @Table varchar(1000);
SET @Database = '{}';
SET @Table = '{}';
SELECT @DefaultCollation = CONVERT (varchar(256), DATABASEPROPERTYEX(@Database, 'collation'));

SELECT t.name as tableName,
c.name as columnName,
IIF(collation_name is NULL, @DefaultCollation, collation_name) as collationName
FROM [{}].sys.columns c
INNER JOIN [{}].sys.tables t
ON c.object_id = t.object_id
AND t.name = @Table;
"""

Reference: https://docs.microsoft.com/en-us/sql/relational-databases/collations/view-collation-information?view=sql-server-ver15

I tried to use this tool as is and ended up implementing my own version of it to have more control over how the data is handled within an Airflow operator. I'll try to suggest improvements and potentially submit PRs. My time is pretty limited for open source contributions lately.

tylerwmarrs avatar May 05 '21 21:05 tylerwmarrs

ok. not too familiar with this, and tied up as well, but feel free to submit a PR when you are able to

yehoshuadimarsky avatar May 07 '21 16:05 yehoshuadimarsky

Related to this is that because the library is not collation-aware, difference in case between the dataframe column names and the database's table column names result in errors in _handle_cols_for_append() and build_format_file(), even if table's column's collation is case-insensitive

ispedals avatar Nov 01 '21 14:11 ispedals