Deshan Xiao

Results: 17 comments by Deshan Xiao

> Yes, the order is fixed. This is implemented in the `recordPosition` call as below.
>
> In the `TreeWriterBase.java`, positions of present stream are recorded first.
>
> https://github.com/apache/orc/blob/792c3f820d0b7a64b27c9dc4c390443386e6baf0/java/core/src/java/org/apache/orc/impl/writer/TreeWriterBase.java#L369-L377...

BTW, is it necessary for us to add a type list in IndexEntry to describe the type of the position? @wgtmac @dongjoon-hyun @guiyanakuang

> Could you elaborate more about your idea, @deshanxiao?
>
> * Do you want to support it only for a specific codec like `brotli` or for all codecs?...

Based on the above scenario, in order to avoid some additional side effects, maybe we could skip the limitation by adding a new configuration?

In fact, it is not reasonable to have such large statistics for a single ORC file; it requires too much memory. My suggestion is to limit writing such huge statistics...

@zabetak Could you provide a scenario for a 500GB ORC file? In my experience, columnar storage generally serves big data engines, and each of these big data files is generally...

@zabetak Could you explain in detail why we can write but not read in the protobuf layer?