KSQL forces Avro field names to upper case
Given an Avro schema in which the field name is mixed case:
$ curl -s "http://localhost:8081/subjects/AVRO_WITH_MIXED_CASE_FIELDS-value/versions/1"|jq '.schema|fromjson'
{
  "type": "record",
  "name": "KsqlDataSourceSchema",
  "namespace": "io.confluent.ksql.avro_schemas",
  "fields": [
    {
      "name": "FooBar",
      "type": [
        "null",
        "string"
      ],
      "default": null
    }
  ]
}
KSQL reads the field as mixed case:
ksql> print 'AVRO_WITH_MIXED_CASE_FIELDS' from beginning;
Format:AVRO
06/02/19 09:34:58 GMT, null, {"FooBar": "FOO"}
But when registered as a stream, KSQL forces the field name to upper case:
ksql> CREATE STREAM TEST WITH (VALUE_FORMAT='AVRO', KAFKA_TOPIC='AVRO_WITH_MIXED_CASE_FIELDS');
Message
----------------
Stream created
----------------
ksql> DESCRIBE TEST;
Name : TEST
Field | Type
-------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
FOOBAR | VARCHAR(STRING)
-------------------------------------
For runtime statistics and query details run: DESCRIBE EXTENDED <Stream,Table>;
JSON is affected in the same way. This makes topics fed by KSQL a special case for the Streams API: they require extra work before they can be used in the same way as other topics.
I think this is the expected default behavior, but there is a workaround. In general, we assume all fields are case-insensitive (e.g. if my Avro schema is what you have above, I should be able to SELECT FOOBAR FROM ... and get data) unless otherwise specified. If you want to maintain case sensitivity, you can explicitly specify the schema (as long as it does not clash with the Avro schema registered under <subject>-value in Schema Registry):
CREATE STREAM foobar (`FooBar` VARCHAR) WITH (...);
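Applied to the topic from the original report, the workaround might look like the sketch below. The stream name TEST_MIXED is made up for illustration, and the statements assume a KSQL/ksqlDB version that supports backtick-quoted identifiers; on older versions that predate EMIT CHANGES, drop that clause.

```sql
-- Sketch only: declare the column explicitly and quote it with backticks
-- so that the mixed-case name from the Avro schema is kept.
-- TEST_MIXED is a hypothetical stream name.
CREATE STREAM TEST_MIXED (`FooBar` VARCHAR)
  WITH (KAFKA_TOPIC='AVRO_WITH_MIXED_CASE_FIELDS', VALUE_FORMAT='AVRO');

-- A quoted column has to be quoted wherever it is referenced;
-- an unquoted FooBar would be resolved as FOOBAR.
SELECT `FooBar` FROM TEST_MIXED EMIT CHANGES;
```

Running DESCRIBE TEST_MIXED; afterwards is a quick way to check whether the field is listed as FooBar or FOOBAR on your version.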
Perhaps it would be valuable to provide a shortcut in the WITH clause to treat all avro fields as case-sensitive...
I am trying to use KSQL to filter messages from a Debezium SQL Server source that uses a case-sensitive collation. KSQL makes this filtering very easy, but I am currently unable to use it because of this issue. The SQL statements generated at my sink fail because the casing of identifiers (column names, for example) does not match what is in the database.
IMO, the default behavior should be to leave the schema alone.
Perhaps it would be valuable to provide a shortcut in the WITH clause to treat all avro fields as case-sensitive...
Please this is very needed :smile:
We have a similar use case where we want to retain the case when filtering the messages from a topic using ksqldb. Can someone please tell me if this change can be expected in the near future?
I don't know why KSQL upper-cases field names when Kafka does not. Could KSQL leave the names exactly as they are in the Schema Registry schema, or as it receives them from the source Kafka topic? This makes working with KSQL a pain, especially when you have large data sets.
I also find this a problem. All fields are automatically upper-cased, which makes using KSQL together with Kafka Connect sinks impossible.
To make matters worse, even though aliasing fields with quoted lower-case names works, it is especially difficult and annoying when the column is nested, so aliasing just the top-level field name is not enough.
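For reference, the aliasing workaround described here could be sketched as follows. TEST is the stream from the original report; TEST_LOWER, ORDERS, ORDERS_LOWER, and the ADDRESS column with its CITY and ZIP fields are hypothetical names, and the second statement assumes a ksqlDB version that provides the STRUCT(...) constructor.

```sql
-- Sketch only: re-alias a top-level column with a quoted name
-- so the derived stream's schema carries the desired case.
CREATE STREAM TEST_LOWER AS
  SELECT FOOBAR AS `fooBar`
  FROM TEST;

-- For a nested column the alias alone only renames the outer field,
-- so the struct has to be rebuilt field by field (hypothetical columns).
CREATE STREAM ORDERS_LOWER AS
  SELECT STRUCT(`city` := ADDRESS->CITY, `zip` := ADDRESS->ZIP) AS `address`
  FROM ORDERS;
```

The second statement shows why nesting is painful: every inner field has to be listed by hand, and the list has to be kept in sync with the source schema.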
Any news on a fix in the near future?
Any movement on this ?
PLEASE RELEASE A NEW VERSION FOR KSQL WITH A RESOLUTION TO THIS. Create stream from a topic with a registered avro schema using ksql -> ksql must respect the schema and not force the field names to uppercase.
This is a very useful feature.
This is a desirable feature for our company to use Confluent Cloud ksqlDB to process Kafka topic data and feed data back to Kafka.
+1 We want to use the JDBC Sink Connector but the fields in ksql are uppercase while the fields in our db are lowercase.
Is there going to be any movement on this anytime soon? we are running into issues with this as well
KLIP-56 will help with this issue and improve the situation.
We are still considering giving users even more control over the behavior (based on user demand) by adding a new property that enables or disables upper-casing of the names.
Same problem here. Do we have any news about that ?
Still seeing this issue in recent versions. It's a pain.
Yeah, agreed. We've been waiting for this for a long time!
I did some further investigation. This feature was implemented with https://docs.ksqldb.io/en/latest/operate-and-deploy/schema-inference-with-id/ . The only issue is that you have to use the schema ID of the schema; it cannot be inherited from the Kafka topic name.
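A minimal sketch of what that could look like for the topic in the original report, assuming a ksqlDB version that supports the VALUE_SCHEMA_ID property. The ID 1 and the stream name TEST_BY_ID are placeholders; the report only shows schema version 1, and the actual schema ID has to be looked up in Schema Registry.

```sql
-- Sketch only: bind the stream to a specific Schema Registry schema ID
-- instead of inferring the latest schema from the topic's subject.
-- VALUE_SCHEMA_ID=1 is a placeholder value.
CREATE STREAM TEST_BY_ID
  WITH (KAFKA_TOPIC='AVRO_WITH_MIXED_CASE_FIELDS',
        VALUE_FORMAT='AVRO',
        VALUE_SCHEMA_ID=1);

-- Check how the field name was registered on your version.
DESCRIBE TEST_BY_ID;
```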
It still forces uppercase. Which is the purpose of this issue.
Any updates on this issue? Is there any workaround if using schema-registry?
Any updates on this issue?
I am not even using schema registry and still see column names forced to upper case. I wish there was a global setting right at start-up to control this e.g., PRESERVE_COLUMN_CASE=true or false to force upper case.