Danny Chan comments

Results: 402 comments of Danny Chan

[SUPPORT] java.lang.ClassCastException: class org.apache.spark.sql.catalyst.expressions.UnsafeRow cannot be cast to class org.apache.spark.sql.vectorized.ColumnarBatch

I'm pretty sure it is a jar conflict. Can you check which jar contains the reported class?

[SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists.

You may need to tweak the `clean.retain_commits` option.
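
As a rough illustration of where this option goes, the sketch below renders a Flink SQL `CREATE TABLE` statement with `clean.retain_commits` in the `WITH` clause. The table name, schema, path, and the value 30 are placeholders assumed for the example, not settings taken from the issue.

```python
def hudi_table_ddl(table_name, path, options):
    """Render a Flink SQL CREATE TABLE statement for a Hudi table.

    The schema below is a hypothetical two-column table used only
    to make the statement complete.
    """
    merged = {"connector": "hudi", "path": path}
    merged.update(options)
    with_clause = ",\n".join(f"  '{k}' = '{v}'" for k, v in merged.items())
    return (
        f"CREATE TABLE {table_name} (\n"
        "  id BIGINT,\n"
        "  name STRING,\n"
        "  PRIMARY KEY (id) NOT ENFORCED\n"
        ") WITH (\n"
        f"{with_clause}\n"
        ")"
    )


# Illustrative value only: keep more commits so a slow reader still finds its files.
ddl = hudi_table_ddl(
    "hudi_sink",
    "file:///tmp/hudi_sink",
    {"clean.retain_commits": 30},
)
print(ddl)
```

A larger value keeps old file versions around longer, at the cost of more storage.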

[SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists.

Is it because they are being clustered continuously? And are you already skipping the clustered files on read?

[SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists.

The monitor operator writes logs that report the reader's progress; you can check them to see whether the reader lags too far behind the producer.

[SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists.

If the job is not rolling back repeatedly, these files should just be "COW" replacements of existing files: for "COW", we create a new base file to replace the...

[SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists.

> clustered still.
> And the downstream flink program read these files would met FileNOTEXTIES exception.

Both clustering and compaction can be skipped in Flink streaming read.

[SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists.

We already have tests in the repo for reads that skip clustering and compaction. Can you make sure the option takes effect and increase the number of retained commits before...

[SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists.

> clean.retain_commits was 1

That means each time a new version of a file is generated, the old one is deleted. For a "COW" table, there is a very high possibility you would...
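
The effect of a retention of 1 on a lagging reader can be sketched with a toy model. This is not Hudi's cleaner: it assumes one file group, one new base file version per commit, and a cleaner that simply keeps the newest `retain_commits` versions. The function name and parameters are hypothetical.

```python
def missing_reads(num_commits, retain_commits, reader_lag):
    """Return the file versions that were cleaned before the reader opened them.

    Toy model: every commit writes one new version of the same file,
    and the reader is always `reader_lag` commits behind the writer.
    """
    live = []      # versions still on storage, oldest first
    missing = []
    for commit in range(1, num_commits + 1):
        live.append(commit)              # COW: a new base file version per commit
        live = live[-retain_commits:]    # simplified cleaner: keep the newest N
        wanted = commit - reader_lag     # version the lagging reader opens now
        if wanted >= 1 and wanted not in live:
            missing.append(wanted)
    return missing


print(missing_reads(5, 1, 1))  # retention 1, reader one commit behind
print(missing_reads(5, 3, 1))  # retention 3, same lag
```

In this model, with a retention of 1 any reader that is even one commit behind hits a missing file, while a retention larger than the lag avoids it.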

[SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists.

Are you enabling clustering then? Clustering would rewrite all the partitions.

> I think increasing the parameters of retention cleanup will probably generate more files

The small files...

[SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists.

This is a replace commit; you can skip it by enabling the `read.skip_clustering` or `read.skip_insertoverride` option.