HIVE-20889: Support timestamp-micros in AvroSerDe
What changes were proposed in this pull request?
Timestamps for avro file format in Hive supports timestamp-millis. This PR brings in the feature where Hive can now support Timestamp-micros logicaltype for avro too. Some details on Logical Types of Avro here: https://gist.github.com/armitage420/fa1432f4dbda56f67bdb08c252415623 More details on Jira description is already present: HIVE-20889
Why are the changes needed?
These changes will increase the reading of timestamps precision for users.
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Added new qfile and other existing tests
@SourabhBadhya Thank you so much for the review! I will fix the required changes as soon as possible :D
@SourabhBadhya I have made the required changes, I hope you could have a look at it, when you have time! P.P.S- The test failures are unrelated to my changes and everyone recently running the pipeline having the same failures. I will re-trigger for green build, post this issue gets fixed :D
Quality Gate passed
Issues
3 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
@SourabhBadhya The build is green! Please look at the changes when you have the time! Thank you ✨
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.
Hey @SourabhBadhya, I apologise for the late work on the review! I was battling some health issues. I am updating the patch according to your comments. Please have a look whenever you have the time!
@SourabhBadhya The build is green, finally! :^) I hope you can review this when you have the time.
Giving an overview of the changes since our last discussion, as it’s been a while:
-
It was mentioned to add specific columns for timestamp-millis/timestamp-micros => Avro’s logicalType is internal to each datatype. In our case, we are dealing with the timestamp type; hence, the column remains TIMESTAMP. The change is that the same column can now handle higher precision.
-
I had to add separate and specific tests for micros => This is done. I had edited some rows in qfiles to test microsecond precision; instead, I have now added extra rows. For these changes, I didn’t create new qfiles (per point 1), and thus we are testing the precision of the TIMESTAMP column when using the Avro file format.
-
Java code/clean-code related changes are done as mentioned.
-
The suggestion to follow syntax similar to Timestamp#toEpochMilli() — return localDateTime.toInstant(ZoneOffset.UTC).toEpochMilli() * 1000 + localDateTime.getNano() / 1000 — gives incorrect results.
-
The variable AvroSerDe.TIMESTAMP_TYPE_NAME is renamed to AvroSerDe.TIMESTAMP_TYPE_NAME_MILLIS, and we have introduced a new logical type AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS to make the differentiation clear.
-
During testing of some AvroDeserialize scenarios while fixing the above tests, the correct logicalType for all kinds of schemas was not being detected. Hence, I have introduced a schema parser for the same.
Hi @armitage420. Thank you for your efforts. I have been busy, I will try to review it this week.
If this is urgent, feel free to ping other committers as well.
Seems like @SourabhBadhya is busy. @chinnaraolalam Could you please review the PR if/when you have the time?
@check-spelling-bot Report
:red_circle: Please review
See the files view or the action log for details.
Unrecognized words (3)
bucketedtables languagemanual teradatabinaryserde
Previously acknowledged words that are now absent
aarry bytecode cwiki HIVEFETCHOUTPUTSERDE timestamplocal yyyyTo accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands
... in a clone of the [email protected]:armitage420/hive.git repository
on the HIVE-20889 branch:
update_files() {
perl -e '
my @expect_files=qw('".github/actions/spelling/expect.txt"');
@ARGV=@expect_files;
my @stale=qw('"$patch_remove"');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect.txt";
use File::Path qw(make_path);
use File::Basename qw(dirname);
make_path (dirname($new_expect_file));
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"$patch_add"');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;
system("git", "add", $new_expect_file);
'
}
comment_json=$(mktemp)
curl -L -s -S \
-H "Content-Type: application/json" \
"https://api.github.com/repos/apache/hive/issues/comments/3293343723" > "$comment_json"
comment_body=$(mktemp)
jq -r ".body // empty" "$comment_json" > $comment_body
rm $comment_json
patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")
patch_add=$(perl -e '$/=undef; $_=<>; if (m{Unrecognized words[^<]*</summary>\n*```\n*([^<]*)```\n*</details>$}m) { print "$1" } elsif (m{Unrecognized words[^<]*\n\n((?:\w.*\n)+)\n}m) { print "$1" };' < "$comment_body")
update_files
rm $comment_body
git add -u
If the flagged items do not appear to be text
If items relate to a ...
-
well-formed pattern.
If you can write a pattern that would match it, try adding it to the
patterns.txtfile.Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.
Note that patterns can't match multiline strings.
-
binary file.
Please add a file path to the
excludes.txtfile matching the containing file.File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.
^refers to the file's path from the root of the repository, so^README\.md$would exclude README.md (on whichever branch you're using).
Quality Gate passed
Issues
7 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
Hey @SourabhBadhya ! Thanks for your time and review! I have addressed the comments and build is green!
@SourabhBadhya Gentle reminder ✨
@deniskuzZ / @ayushtkn can you please take a look as well?
@deniskuzZ / @ayushtkn Hey! could you please have a look at this PR if/when you have the time
@SourabhBadhya / @deniskuzZ / @ayushtkn could you please have a look and if the PR seems fine could you please merge this if possible? :D
@SourabhBadhya hey could you please check and merge this?
@armitage420 Spell checking seems to fail, can you please check.
@armitage420 Spell checking seems to fail, can you please check.
@SourabhBadhya The spell check is unrelated to my changes:
#### Unrecognized words (3)
bucketedtables
languagemanual
teradatabinaryserde
Cool, I am merging this, since no one has an objection for a very long time.
Thanks for the review and merge!