hive icon indicating copy to clipboard operation
hive copied to clipboard

HIVE-20889: Support timestamp-micros in AvroSerDe

Open armitage420 opened this issue 10 months ago • 4 comments

What changes were proposed in this pull request?

Timestamps for avro file format in Hive supports timestamp-millis. This PR brings in the feature where Hive can now support Timestamp-micros logicaltype for avro too. Some details on Logical Types of Avro here: https://gist.github.com/armitage420/fa1432f4dbda56f67bdb08c252415623 More details on Jira description is already present: HIVE-20889

Why are the changes needed?

These changes will increase the reading of timestamps precision for users.

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Added new qfile and other existing tests

armitage420 avatar Apr 21 '25 16:04 armitage420

@SourabhBadhya Thank you so much for the review! I will fix the required changes as soon as possible :D

armitage420 avatar May 05 '25 15:05 armitage420

@SourabhBadhya I have made the required changes, I hope you could have a look at it, when you have time! P.P.S- The test failures are unrelated to my changes and everyone recently running the pipeline having the same failures. I will re-trigger for green build, post this issue gets fixed :D

armitage420 avatar May 12 '25 16:05 armitage420

@SourabhBadhya The build is green! Please look at the changes when you have the time! Thank you ✨

armitage420 avatar May 17 '25 17:05 armitage420

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.

github-actions[bot] avatar Jul 18 '25 00:07 github-actions[bot]

Hey @SourabhBadhya, I apologise for the late work on the review! I was battling some health issues. I am updating the patch according to your comments. Please have a look whenever you have the time!

armitage420 avatar Jul 24 '25 15:07 armitage420

@SourabhBadhya The build is green, finally! :^) I hope you can review this when you have the time.

Giving an overview of the changes since our last discussion, as it’s been a while:

  1. It was mentioned to add specific columns for timestamp-millis/timestamp-micros => Avro’s logicalType is internal to each datatype. In our case, we are dealing with the timestamp type; hence, the column remains TIMESTAMP. The change is that the same column can now handle higher precision.

  2. I had to add separate and specific tests for micros => This is done. I had edited some rows in qfiles to test microsecond precision; instead, I have now added extra rows. For these changes, I didn’t create new qfiles (per point 1), and thus we are testing the precision of the TIMESTAMP column when using the Avro file format.

  3. Java code/clean-code related changes are done as mentioned.

  4. The suggestion to follow syntax similar to Timestamp#toEpochMilli() — return localDateTime.toInstant(ZoneOffset.UTC).toEpochMilli() * 1000 + localDateTime.getNano() / 1000 — gives incorrect results.

  5. The variable AvroSerDe.TIMESTAMP_TYPE_NAME is renamed to AvroSerDe.TIMESTAMP_TYPE_NAME_MILLIS, and we have introduced a new logical type AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS to make the differentiation clear.

  6. During testing of some AvroDeserialize scenarios while fixing the above tests, the correct logicalType for all kinds of schemas was not being detected. Hence, I have introduced a schema parser for the same.

armitage420 avatar Aug 25 '25 15:08 armitage420

Hi @armitage420. Thank you for your efforts. I have been busy, I will try to review it this week.

If this is urgent, feel free to ping other committers as well.

SourabhBadhya avatar Aug 25 '25 16:08 SourabhBadhya

Seems like @SourabhBadhya is busy. @chinnaraolalam Could you please review the PR if/when you have the time?

armitage420 avatar Sep 12 '25 04:09 armitage420

@check-spelling-bot Report

:red_circle: Please review

See the files view or the action log for details.

Unrecognized words (3)

bucketedtables languagemanual teradatabinaryserde

Previously acknowledged words that are now absent aarry bytecode cwiki HIVEFETCHOUTPUTSERDE timestamplocal yyyy
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands

... in a clone of the [email protected]:armitage420/hive.git repository on the HIVE-20889 branch:

update_files() {
perl -e '
my @expect_files=qw('".github/actions/spelling/expect.txt"');
@ARGV=@expect_files;
my @stale=qw('"$patch_remove"');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect.txt";
use File::Path qw(make_path);
use File::Basename qw(dirname);
make_path (dirname($new_expect_file));
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"$patch_add"');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;
system("git", "add", $new_expect_file);
'
}

comment_json=$(mktemp)
curl -L -s -S \
-H "Content-Type: application/json" \
"https://api.github.com/repos/apache/hive/issues/comments/3293343723" > "$comment_json"
comment_body=$(mktemp)
jq -r ".body // empty" "$comment_json" > $comment_body
rm $comment_json

patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")

patch_add=$(perl -e '$/=undef; $_=<>; if (m{Unrecognized words[^<]*</summary>\n*```\n*([^<]*)```\n*</details>$}m) { print "$1" } elsif (m{Unrecognized words[^<]*\n\n((?:\w.*\n)+)\n}m) { print "$1" };' < "$comment_body")

update_files
rm $comment_body
git add -u
If the flagged items do not appear to be text

If items relate to a ...

  • well-formed pattern.

    If you can write a pattern that would match it, try adding it to the patterns.txt file.

    Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

    Note that patterns can't match multiline strings.

  • binary file.

    Please add a file path to the excludes.txt file matching the containing file.

    File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

    ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

github-actions[bot] avatar Sep 15 '25 18:09 github-actions[bot]

Hey @SourabhBadhya ! Thanks for your time and review! I have addressed the comments and build is green!

armitage420 avatar Sep 16 '25 03:09 armitage420

@SourabhBadhya Gentle reminder ✨

armitage420 avatar Sep 23 '25 16:09 armitage420

@deniskuzZ / @ayushtkn can you please take a look as well?

SourabhBadhya avatar Sep 23 '25 17:09 SourabhBadhya

@deniskuzZ / @ayushtkn Hey! could you please have a look at this PR if/when you have the time

armitage420 avatar Sep 28 '25 16:09 armitage420

@SourabhBadhya / @deniskuzZ / @ayushtkn could you please have a look and if the PR seems fine could you please merge this if possible? :D

armitage420 avatar Oct 13 '25 04:10 armitage420

@SourabhBadhya hey could you please check and merge this?

armitage420 avatar Nov 24 '25 12:11 armitage420

@armitage420 Spell checking seems to fail, can you please check.

SourabhBadhya avatar Nov 24 '25 16:11 SourabhBadhya

@armitage420 Spell checking seems to fail, can you please check.

@SourabhBadhya The spell check is unrelated to my changes:

#### Unrecognized words (3)

bucketedtables
languagemanual
teradatabinaryserde
image

armitage420 avatar Nov 24 '25 19:11 armitage420

Cool, I am merging this, since no one has an objection for a very long time.

SourabhBadhya avatar Nov 25 '25 09:11 SourabhBadhya

Thanks for the review and merge!

armitage420 avatar Nov 25 '25 14:11 armitage420