
KSQL forces Avro field names to upper case

Open · rmoff opened this issue 6 years ago · 22 comments

Given an Avro schema in which the field name is mixed case:

$ curl -s "http://localhost:8081/subjects/AVRO_WITH_MIXED_CASE_FIELDS-value/versions/1"|jq '.schema|fromjson'

{
  "type": "record",
  "name": "KsqlDataSourceSchema",
  "namespace": "io.confluent.ksql.avro_schemas",
  "fields": [
    {
      "name": "FooBar",
      "type": [
        "null",
        "string"
      ],
      "default": null
    }
  ]
}

KSQL reads the field as mixed case:

ksql> print 'AVRO_WITH_MIXED_CASE_FIELDS' from beginning;
Format:AVRO
06/02/19 09:34:58 GMT, null, {"FooBar": "FOO"}

But when registered as a stream, KSQL forces the field name to upper case:

ksql> CREATE STREAM TEST WITH (VALUE_FORMAT='AVRO', KAFKA_TOPIC='AVRO_WITH_MIXED_CASE_FIELDS');

 Message
----------------
 Stream created
----------------
ksql> DESCRIBE TEST;

Name                 : TEST
 Field   | Type
-------------------------------------
 ROWTIME | BIGINT           (system)
 ROWKEY  | VARCHAR(STRING)  (system)
 FOOBAR  | VARCHAR(STRING)
-------------------------------------
For runtime statistics and query details run: DESCRIBE EXTENDED <Stream,Table>;

rmoff · Feb 06 '19

JSON is affected in the same way. This makes topics written by KSQL special cases for the Streams API: they need extra work before they can be used like any other topic.

j-halbert · Aug 28 '19
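One form that extra work can take on the consumer side is mapping the upper-cased names back to their spelling in the original Avro schema. A minimal sketch, assuming the records have already been deserialized into Python dicts and that the original schema (here the one shown at the top of the issue) is at hand; `case_map` and `restore_case` are hypothetical helpers, not part of KSQL or any Kafka client.

```python
# Original Avro value schema from Schema Registry (copied from the issue).
AVRO_SCHEMA = {
    "type": "record",
    "name": "KsqlDataSourceSchema",
    "namespace": "io.confluent.ksql.avro_schemas",
    "fields": [
        {"name": "FooBar", "type": ["null", "string"], "default": None},
    ],
}


def case_map(schema: dict) -> dict:
    """Map each upper-cased field name to its original spelling."""
    return {field["name"].upper(): field["name"] for field in schema["fields"]}


def restore_case(record: dict, schema: dict) -> dict:
    """Rename the top-level keys of a KSQL-produced record back to the
    original names; keys not found in the schema are kept unchanged."""
    mapping = case_map(schema)
    return {mapping.get(key, key): value for key, value in record.items()}


if __name__ == "__main__":
    # A record as a downstream consumer would see it after KSQL upper-cases the field.
    print(restore_case({"FOOBAR": "FOO"}, AVRO_SCHEMA))
```

If two schema fields differ only in case, the mapping collides and the later field wins; nested records are not handled in this sketch.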

I think this is the expected default behavior, but a workaround is available. In general, we assume all fields are case-insensitive unless otherwise specified (e.g. if my Avro schema is what you have above, I should be able to SELECT FOOBAR FROM ... and get data). If you want to preserve the case, you can specify the schema explicitly, as long as it does not clash with the Avro schema registered under <subject>-value in Schema Registry:

CREATE STREAM foobar (`FooBar` VARCHAR) WITH (...);
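A simplified model of the rule described in this comment, for illustration only: it is not ksqlDB's parser and it ignores escaping inside backticks. Unquoted identifiers are upper-cased before lookup, while backtick-quoted ones keep their case. `resolve_identifier` and `find_column` are hypothetical names.

```python
from typing import Optional, Set


def resolve_identifier(token: str) -> str:
    """Backtick-quoted identifiers keep their case; unquoted ones are upper-cased."""
    if len(token) >= 2 and token.startswith("`") and token.endswith("`"):
        return token[1:-1]
    return token.upper()


def find_column(token: str, columns: Set[str]) -> Optional[str]:
    """Return the matching column name, or None if the identifier does not resolve."""
    name = resolve_identifier(token)
    return name if name in columns else None


if __name__ == "__main__":
    inferred = {"FOOBAR"}   # columns when the schema is inferred (see DESCRIBE TEST)
    explicit = {"FooBar"}   # columns when declared as `FooBar` VARCHAR
    print(find_column("foobar", inferred))    # matches FOOBAR
    print(find_column("foobar", explicit))    # no match: foobar resolves to FOOBAR
    print(find_column("`FooBar`", explicit))  # matches FooBar
```

Under this model, declaring the column with backticks keeps the mixed case, but every later reference to it must be quoted the same way.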

Perhaps it would be valuable to provide a shortcut in the WITH clause to treat all Avro fields as case-sensitive...

agavra · Oct 28 '19
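Until such a shortcut exists, the explicit column list from the workaround above can be generated from the registered schema, so that the quoted names match it exactly. This is a hypothetical client-side helper: the Avro-to-KSQL type map covers only a few primitive types and is an assumption, and anything else (records, arrays, maps, logical types, multi-type unions) is rejected rather than guessed.

```python
# Assumed mapping for a few Avro primitive types only.
AVRO_TO_KSQL = {
    "string": "VARCHAR",
    "int": "INT",
    "long": "BIGINT",
    "boolean": "BOOLEAN",
    "double": "DOUBLE",
}

SCHEMA = {
    "type": "record",
    "name": "KsqlDataSourceSchema",
    "fields": [{"name": "FooBar", "type": ["null", "string"], "default": None}],
}


def ksql_type(avro_type) -> str:
    """Translate a primitive Avro type, or a ["null", T] union, to a KSQL type."""
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError(f"unsupported union: {avro_type}")
        avro_type = non_null[0]
    if not isinstance(avro_type, str) or avro_type not in AVRO_TO_KSQL:
        raise ValueError(f"unsupported Avro type: {avro_type!r}")
    return AVRO_TO_KSQL[avro_type]


def create_stream_sql(stream: str, topic: str, schema: dict) -> str:
    """Build a CREATE STREAM statement with backtick-quoted column names,
    so the columns keep the case used in the Avro schema."""
    columns = ", ".join(
        f"`{field['name']}` {ksql_type(field['type'])}" for field in schema["fields"]
    )
    return (
        f"CREATE STREAM {stream} ({columns}) "
        f"WITH (KAFKA_TOPIC='{topic}', VALUE_FORMAT='AVRO');"
    )


if __name__ == "__main__":
    print(create_stream_sql("foobar", "AVRO_WITH_MIXED_CASE_FIELDS", SCHEMA))
```

Deriving the column list from the registered schema, rather than typing it by hand, is one way to avoid the clash with the `<subject>-value` schema that the comment warns about.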

I am trying to use KSQL to filter messages from a Debezium SQL Server source that uses a case-sensitive collation. KSQL makes this filtering very easy, but I am currently unable to use it because of this issue: the SQL statements I generate at my sink fail because the casing of (for example) column names does not match what is in the database.

IMO, the default behavior should be to leave the schema alone.

tmbull · Dec 05 '20

> Perhaps it would be valuable to provide a shortcut in the WITH clause to treat all Avro fields as case-sensitive...

Please, this is badly needed :smile:

saadshahd · Mar 18 '21

We have a similar use case: we want to retain the case when filtering messages from a topic using ksqlDB. Can someone please tell me whether this change can be expected in the near future?

greendad · Apr 24 '21

I don't know why KSQL decided to upper-case field names when Kafka does not. Could KSQL just leave the names as they are in the schema in Schema Registry, or as it receives them from the source Kafka topic? This makes working with KSQL a pain, especially with large data sets.

emerzonic · May 03 '21

I also find this a problem. All fields are automatically upper-cased, which makes using KSQL together with Kafka Connect sinks impossible.

To make matters worse, even though aliasing fields with quoted lower-case names works, it is extra difficult and annoying when the column is nested, because you can't alias just the top-level field name.

Any news on a fix in the near future?

dimagoldin · Jun 10 '21

Any movement on this?

ratskates · Jun 18 '21

Please release a new version of KSQL that resolves this. When a stream is created with KSQL from a topic that has a registered Avro schema, KSQL must respect that schema and not force the field names to upper case.

akotb89 · Aug 16 '21

This is a very useful feature.

ReasonDuan · Sep 02 '21

This feature is desirable for our company, which uses Confluent Cloud ksqlDB to process Kafka topic data and feed the results back to Kafka.

ethanl-indeed · Sep 10 '21

+1. We want to use the JDBC Sink Connector, but the fields in KSQL are upper case while the columns in our database are lower case.

sscots · Sep 23 '21
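If the target database expects lower-case names, one option is to lower-case the keys somewhere between KSQL and the sink, for example in a small consumer that re-publishes the records. The sketch below is a hypothetical helper, not a JDBC Sink Connector feature, and assumes records are already deserialized into dicts; it recurses into nested dicts and lists, which also covers the nested columns mentioned earlier in the thread.

```python
from typing import Any


def lowercase_keys(value: Any) -> Any:
    """Recursively lower-case string dict keys, descending into nested
    dicts and lists (e.g. STRUCT and ARRAY values); other values are unchanged."""
    if isinstance(value, dict):
        return {
            (k.lower() if isinstance(k, str) else k): lowercase_keys(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [lowercase_keys(item) for item in value]
    return value


if __name__ == "__main__":
    # Hypothetical record with a nested STRUCT and an ARRAY of STRUCTs.
    record = {"ORDER_ID": 1, "ADDRESS": {"CITY": "Leeds"}, "ITEMS": [{"SKU": "A1"}]}
    print(lowercase_keys(record))
```

Keys that differ only in case collide after lower-casing, and the later one silently wins, so this is only safe when the original names are unique case-insensitively.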

Is there going to be any movement on this anytime soon? We are running into issues with this as well.

spancespants · Nov 01 '21

KLIP-56 will help with this issue and improve the situation.

We are still considering giving users even more control over this behavior (depending on user demand) by adding a new property to enable or disable the upper-casing of names.

mjsax · Nov 03 '21

Same problem here. Is there any news on this?

jchambondynadmic · Dec 01 '22

I still see this issue in recent versions. It's a pain.

ghost · Apr 04 '23

Yeah, agreed. We've been waiting for this for a long time!

jchambondynadmic · Apr 04 '23

I did some further investigation. This feature was implemented with https://docs.ksqldb.io/en/latest/operate-and-deploy/schema-inference-with-id/. The only issue is that you have to reference the schema by its schema ID; it cannot be inferred from the Kafka topic name.

ghost · Apr 04 '23
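The schema ID that the linked page requires can be looked up from the subject name through the Schema Registry REST API (`GET /subjects/<subject>/versions/latest`, whose response includes an `id` field). A minimal sketch, assuming the default `<topic>-value` subject naming and the `VALUE_SCHEMA_ID` WITH-clause property described on that page; the function names are hypothetical. The following comment reports that field names are still upper-cased with this approach, so this only removes the manual ID lookup.

```python
import json
import urllib.parse
import urllib.request


def latest_schema_id(response_body: str) -> int:
    """Extract the schema ID from a GET /subjects/<subject>/versions/latest response."""
    return int(json.loads(response_body)["id"])


def fetch_latest_schema_id(registry_url: str, topic: str) -> int:
    # Assumes the default subject naming, where the value subject is "<topic>-value".
    subject = urllib.parse.quote(f"{topic}-value", safe="")
    url = f"{registry_url}/subjects/{subject}/versions/latest"
    with urllib.request.urlopen(url) as resp:
        return latest_schema_id(resp.read().decode("utf-8"))


def create_stream_with_schema_id(stream: str, topic: str, schema_id: int) -> str:
    # Assumes the VALUE_SCHEMA_ID property from the linked docs page.
    return (
        f"CREATE STREAM {stream} WITH (KAFKA_TOPIC='{topic}', "
        f"VALUE_FORMAT='AVRO', VALUE_SCHEMA_ID={schema_id});"
    )


if __name__ == "__main__":
    topic = "AVRO_WITH_MIXED_CASE_FIELDS"
    schema_id = fetch_latest_schema_id("http://localhost:8081", topic)
    print(create_stream_with_schema_id("TEST", topic, schema_id))
```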

It still forces uppercase. Which is the purpose of this issue.

jchambondynadmic · Apr 14 '23

Any updates on this issue? Is there any workaround when using Schema Registry?

guilhermeneves · Nov 14 '23

Any updates on this issue?

fapinheiro · Feb 02 '24

I am not even using Schema Registry and I still see column names forced to upper case. I wish there were a global start-up setting to control this, e.g. PRESERVE_COLUMN_CASE=true, or false to force upper case.

chainhead · Jul 28 '24