[FEA] Remove usages of `pyorc` where not necessary
Is your feature request related to a problem? Please describe.
Writing a pandas DataFrame to an ORC file using pyorc is a fairly complex operation. Until now we have used pyorc as the reference writer because we had no other choice. With the introduction of pyarrow's ORC writer, we should switch away from pyorc, which should remove a lot of the complex handling currently needed for nested dtypes.
Describe the solution you'd like
Drop pyorc usages to almost none. We will probably keep it for a few basic dtype tests to validate compatibility, but fuzz testing and the rest of the pytests should make the switch.
Describe alternatives you've considered
The FEA itself is a better alternative to pyorc 😉
Additional context https://arrow.apache.org/docs/python/generated/pyarrow.orc.write_table.html
cc: @GregoryKimball @vuule
Is the main difference in the PyORC's requirement to pass in a schema? Would it be possible to try this out in fuzz tests to verify that pyarrow is robust?
> Is the main difference in the PyORC's requirement to pass in a schema?
That, plus the fact that while pyarrow's ORC writer was an internal-only API, it lacked `stripe_size` support, which was a limiting factor in using that internal version of the writer.
> Would it be possible to try this out in fuzz tests to verify that pyarrow is robust?
Yup
I definitely like the suggestion, pyarrow API looks very clean and... comprehensive (more so than ours 😬).
https://github.com/rapidsai/cudf/pull/12103 solves part of the problem. However, we will need to wait until pyarrow can write complex nested data types to an ORC file.
This was completed in #14323