arrow icon indicating copy to clipboard operation
arrow copied to clipboard

[C++][Parquet] Separate encoders and decoder

Open pitrou opened this issue 1 year ago • 8 comments
trafficstars

Describe the enhancement requested

Currently, encoders and decoders are defined in a single file encoding.cc, which is quite large.

Given that their infrastructure is separate, it would probably make maintenance easier to split them into two C++ source files (for example encoder.cc and decoder.cc). We can add corresponding .h files, and also keep encoding.h for compatibility.

Component(s)

C++, Parquet

pitrou avatar Feb 20 '24 12:02 pitrou

Thoughts @felipecrv @mapleFU @wgtmac ?

pitrou avatar Feb 20 '24 12:02 pitrou

I just check that there're few common constant used in encoding.cc, so maybe spliting them is not hard.

But I'm not sure about this. Current code is also ok to me

mapleFU avatar Feb 20 '24 13:02 mapleFU

split them into two C++ source files

I'm in favor, but it will be 3 files (or .h/.cc pairs) because there has to be one for the shared funcitonality. One risk of the split is exposing the bits that are now fully private in encoding.cc.

felipecrv avatar Feb 20 '24 14:02 felipecrv

Perhaps we can add a encoding_internal.h file to hold those private but common stuff?

wgtmac avatar Feb 20 '24 14:02 wgtmac

Perhaps we can add a encoding_internal.h file to hold those private but common stuff?

We can, though I doubt there's much shared functionality.

pitrou avatar Feb 20 '24 16:02 pitrou

Hi, I am new to apache arrow and I would like to take this issue. Can this one be assigned to me?

changkhothuychung avatar Mar 09 '24 05:03 changkhothuychung

You can type "take" to take the issue, and create a pull request named "GH-40154: [C++][Parquet] ..." when you finished @changkhothuychung

mapleFU avatar Mar 09 '24 07:03 mapleFU

Assigned!

kou avatar Mar 09 '24 08:03 kou