DRILL-6820: Msgpack format reader
Implementation of a msgpack format reader
- schema learning
- skip over malformed records
- skip over invalid field names
- skip over records not matching schema
- writing msgpack has not yet been implemented
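The reader behaviours listed above can be illustrated with a small, self-contained sketch. This is not code from the pull request: the class `MsgpackSchemaSketch`, its methods, and the `FieldType` enum are all hypothetical, and records are modelled as already-decoded Java objects rather than raw msgpack bytes. It assumes that a malformed record is anything that did not decode to a map, that a valid field name is a non-empty string key, and that a record matches the schema when every valid field it carries has the learned type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the reader implemented in this PR. */
final class MsgpackSchemaSketch {

    /** Simplified stand-ins for the value types a reader might track. */
    enum FieldType { INTEGER, FLOAT, STRING, BOOLEAN }

    /** Maps a decoded value to a type tag, or null if the value is unsupported. */
    static FieldType typeOf(Object value) {
        if (value instanceof Long || value instanceof Integer) return FieldType.INTEGER;
        if (value instanceof Double || value instanceof Float) return FieldType.FLOAT;
        if (value instanceof String) return FieldType.STRING;
        if (value instanceof Boolean) return FieldType.BOOLEAN;
        return null;
    }

    /** Assumption: a valid field name is a non-empty string key. */
    static boolean isValidFieldName(Object key) {
        return key instanceof String && !((String) key).isEmpty();
    }

    /** Schema learning: the first supported type seen for a field name wins. */
    static Map<String, FieldType> learnSchema(List<?> records) {
        Map<String, FieldType> schema = new LinkedHashMap<>();
        for (Object record : records) {
            if (!(record instanceof Map)) continue; // malformed record: skip it
            for (Map.Entry<?, ?> e : ((Map<?, ?>) record).entrySet()) {
                FieldType type = typeOf(e.getValue());
                if (isValidFieldName(e.getKey()) && type != null) {
                    schema.putIfAbsent((String) e.getKey(), type);
                }
            }
        }
        return schema;
    }

    /** Returns the rows that survive the three skip rules. */
    static List<Map<String, Object>> read(List<?> records, Map<String, FieldType> schema) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Object record : records) {
            if (!(record instanceof Map)) continue; // skip malformed records
            Map<String, Object> row = new LinkedHashMap<>();
            boolean matches = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) record).entrySet()) {
                if (!isValidFieldName(e.getKey())) continue; // skip invalid field names
                String name = (String) e.getKey();
                FieldType expected = schema.get(name);
                if (expected == null || expected != typeOf(e.getValue())) {
                    matches = false; // unknown field or wrong type: skip the record
                    break;
                }
                row.put(name, e.getValue());
            }
            if (matches) rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Object, Object> badKey = new LinkedHashMap<>();
        badKey.put(42, "x");   // non-string key: treated as an invalid field name
        badKey.put("id", 2L);

        List<Object> records = new ArrayList<>();
        records.add(Map.of("id", 1L, "name", "a"));
        records.add("not-a-map");             // stands in for a malformed record
        records.add(badKey);
        records.add(Map.of("id", "three"));   // type conflicts with the learned schema

        Map<String, FieldType> schema = learnSchema(records);
        System.out.println(schema);
        System.out.println(read(records, schema));
    }
}
```

In this sketch an invalid field name drops only that field, while a type mismatch drops the whole record. Whether the actual reader draws the line in the same place is not stated in this thread.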
Implementation of a Zstandard codec
- only decompression is implemented
@jcmcote could you add the corresponding JIRA ID as a prefix in the title of the pull request? See the format of other pull requests here: https://github.com/apache/drill/pulls
@jcmcote, ZStandard compression was added to the Hadoop library in HADOOP-13578. I think it would be better to use the existing, well-tested implementation instead of introducing a custom one.
Agreed. When will Drill pick up the new version of Hadoop? Is it a big deal to upgrade the version of Hadoop that Drill uses?
@jcmcote There is a Jira ticket for the Hadoop libs version update: DRILL-6540.
There is an issue related to commons-logging, see details.
Also, my "work in progress" branch is in the ticket.
@jcmcote, is it possible to split this pull request into two parts: leave only the changes connected with the Msgpack format reader here, and continue work on the compression codecs in a separate Jira after the Hadoop library upgrade is done?
@vvysotskyi Sure, I can split them up. Should be easy to do.
Hey @paul-rogers, I've made many code review fixes and improvements to the msgpack reader. Could you have another look at it? I would very much like to have it approved and made part of the main code base. Thanks!
@jcmcote there is ongoing work to provide a schema using a file (https://issues.apache.org/jira/browse/DRILL-6835), so you might consider waiting for those changes to be published in order to use a common approach to reading and writing schema files.
okay sounds good
On Thu, Jan 10, 2019 at 9:54 AM Arina Ielchiieva wrote:
@jcmcote taking into account that there is ongoing work to provide schema using file (https://issues.apache.org/jira/browse/DRILL-6835). You might consider waiting for those changes to be published to use common approach of reading and writing schema files.
Hi @jcmcote, are you still interested in completing this PR? The Enhanced Vector Framework (EVF) PRs were committed recently and could make this better and easier.
If you haven't seen it, here's a link to the tutorial by @paul-rogers: https://github.com/paul-rogers/drill/wiki/EVF-Tutorial-Row-Batch-Reader
Hi @jcmcote Are you still interested in completing this PR?