[C++][Parquet] Separate encoders and decoder
Describe the enhancement requested
Currently, encoders and decoders are defined in a single file encoding.cc, which is quite large.
Given that their infrastructure is separate, it would probably make maintenance easier to split them into two C++ source files (for example encoder.cc and decoder.cc). We can add corresponding .h files, and also keep encoding.h for compatibility.
Component(s)
C++, Parquet
Thoughts @felipecrv @mapleFU @wgtmac ?
I just checked, and only a few common constants are used in encoding.cc, so splitting them may not be hard.
But I'm not sure about this. The current code is also OK with me.
split them into two C++ source files
I'm in favor, but it will be 3 files (or .h/.cc pairs) because there has to be one for the shared functionality. One risk of the split is exposing the bits that are now fully private in encoding.cc.
Perhaps we can add an encoding_internal.h file to hold the private but common code?
Perhaps we can add an encoding_internal.h file to hold the private but common code?
We can, though I doubt there's much shared functionality.
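For illustration, the three-way split discussed above could look roughly like the single-file sketch below. Every name in it (the `sketch` namespace, `kInt32Width`, `EncodePlain`, `DecodePlain`) is hypothetical and not taken from Arrow's code. The comments mark which file each piece would live in, assuming the shared private bits go into encoding_internal.h.

```cpp
// Single-file sketch of the proposed layout. All names are hypothetical,
// not Arrow's actual API; comments mark the file each part would move to.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// ---- would live in encoding_internal.h (private, shared by both sides) ----
namespace sketch {
namespace internal {
// A constant that both the encoder and the decoder depend on.
constexpr std::size_t kInt32Width = sizeof(std::int32_t);
}  // namespace internal
}  // namespace sketch

// ---- would live in encoder.cc ----
namespace sketch {
// Copies the values into a byte buffer in native byte order.
std::vector<std::uint8_t> EncodePlain(const std::vector<std::int32_t>& values) {
  std::vector<std::uint8_t> out(values.size() * internal::kInt32Width);
  if (!out.empty()) {
    std::memcpy(out.data(), values.data(), out.size());
  }
  return out;
}
}  // namespace sketch

// ---- would live in decoder.cc ----
namespace sketch {
// Reads back as many whole int32 values as the buffer holds.
std::vector<std::int32_t> DecodePlain(const std::vector<std::uint8_t>& bytes) {
  std::vector<std::int32_t> out(bytes.size() / internal::kInt32Width);
  if (!out.empty()) {
    std::memcpy(out.data(), bytes.data(), out.size() * internal::kInt32Width);
  }
  return out;
}
}  // namespace sketch
```

In this sketch only the `internal` namespace is shared, so the encoder and decoder sources never include each other, and a public encoding.h could keep declaring both functions for compatibility.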
Hi, I am new to Apache Arrow and I would like to take this issue. Can this one be assigned to me?
You can type "take" to take the issue, and create a pull request named "GH-40154: [C++][Parquet] ..." when you are finished @changkhothuychung
Assigned!