duckdb icon indicating copy to clipboard operation
duckdb copied to clipboard

[Java] Regression in conversion of Long to Timestamp value via appender

Open jonathanswenson opened this issue 2 years ago • 8 comments

What happens?

In 0.3.3, DuckDBAppender.append(long) could be used to append a millisecond value to be converted to a timestamp in the resulting table.

As of 0.4.0, appending the value fails with an exception:

Not implemented Error: Unimplemented type for cast (INT64 -> TIMESTAMP)
java.sql.SQLException: Not implemented Error: Unimplemented type for cast (INT64 -> TIMESTAMP)
	at org.duckdb.DuckDBNative.duckdb_jdbc_appender_append_long(Native Method)
	at org.duckdb.DuckDBAppender.append(DuckDBAppender.java:47)

To Reproduce

In 0.4.0 the test fails with the above error, but in 0.3.3 it passes successfully.

  TimeZone currentTimeZone = TimeZone.getDefault();
    try {
      TimeZone.setDefault(TimeZone.getTimeZone("UTC"));

      DuckDBConnection connection = getConnection();
      executeOnConnection(connection, "CREATE TABLE ts_test(ts_value timestamp)");

      Instant instant = Instant.now().truncatedTo(ChronoUnit.MILLIS);

      try (DuckDBAppender appender = connection.createAppender("main", "ts_test")) {
        appender.beginRow();
        appender.append(instant.toEpochMilli() * 1000);
        appender.endRow();
      }

      Statement resultStatement = connection.createStatement();
      ResultSet resultSet = resultStatement.executeQuery( "SELECT ts_value from ts_test");

      resultSet.next();

      Timestamp appendedTimestamp = resultSet.getTimestamp(1);
      assertEquals(instant.toEpochMilli(), appendedTimestamp.toInstant().toEpochMilli());

      resultSet.close();
      resultStatement.close();
      connection.close();
    } finally {
      // return timezone back to default.
      TimeZone.setDefault(currentTimeZone);
    }

Environment (please complete the following information):

  • OS: Mac OS X (Apple Silicon)
  • DuckDB Version: 0.4.0
  • DuckDB Client: JDBC

Identity Disclosure:

  • Full Name: Jonathan Swenson
  • Affiliation: Omni

Before Submitting

  • [x] Have you tried this on the latest master branch?
  • [x] Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

jonathanswenson avatar Jun 21 '22 02:06 jonathanswenson

Thanks for the report! This was actually done on purpose (see #3405), as the implicit casting of internal data types made it very easy to accidentally insert bogus data using the Appender. As illustrated by your very comment:

In 0.3.3, DuckDBAppender.append(long) could be used to append a millisecond value to be converted to a timestamp in the resulting table.

Since it actually requires a microsecond value :)

In C++ we now require the explicit construction of a timestamp_t type, using e.g. the Timestamp::FromEpochMicroSeconds method. We should export these date/time/timestamp methods to Java as well for use within the Appender.

Mytherin avatar Jun 21 '22 06:06 Mytherin

Oops yes -- microsecond value.

In C++ we now require the explicit construction of a timestamp_t type, using e.g. the Timestamp::FromEpochMicroSeconds method. We should export these date/time/timestamp methods to Java as well for use within the Appender.

I've been working around some of the limitations of the java appender (appending timestamps from longs, big decimals from strings). Would augmenting the appender to support these types directly -- append(BigDecimal), append(Timestamp) be preferable? Otherwise, it seems like a trycast implementation from int64 to timetamp_t using Timestamp::FromEpochMicroSeconds was sufficient to "fix" this.

jonathanswenson avatar Jun 21 '22 17:06 jonathanswenson

Augmenting to support these types would definitely be preferable.

Interestingly #3841 is kind of the opposite issue of this one. We used to always require appending the exact internal types for the Appender - which is the behavior that you are observing there with the decimal types.

#3405 changes this to do proper casting, which makes the behavior of the appender a lot more clear, except for decimal types (which were left out for now because there were some problems in other parts of the code that relied on this where fixing was non-trivial).

Mytherin avatar Jun 21 '22 18:06 Mytherin

I can't say it surprises me that decimal and timestamp types are ... weird.

One thing I did like about the long -> timestamp method of appending a value is that I didn't have to create additional objects here in order to load a stream of data in. In order to differentiate and having an append(Timestamp) method one will have to create / allocate a short lived object (either a java.sql.Timestamp or DuckDBTimestamp) if you have the long values already.

Would it make sense to use DuckDBTimestamp for this purpose? or is that class more of an implementation detail?

Given that the system does support these casts in some shape (strings seem to work for most cases) it would probably make sense that you could do either in the long run.

jonathanswenson avatar Jun 21 '22 22:06 jonathanswenson

method one will have to create / allocate a short lived object (either a java.sql.Timestamp or DuckDBTimestamp) if you have the long values already.

Would the JVM not optimize this out in a loop? At least in C++ there is zero cost to wrapping a scalar in a struct.

I think having a separate class would be the cleanest from a code perspective, as it would be very clear what the code is doing. We could also add e.g. an appendTimestampMicros method (or similar) that would be more explicit in what it is appending and do the actual conversion in C++. We could also add an appendInternal method that would append using the internal type instead.

I would prefer not to actually add a numeric -> timestamp cast to the system, since in my opinion a query like this should fail: SELECT 1::TIMESTAMP

Mytherin avatar Jun 21 '22 22:06 Mytherin

Would the JVM not optimize this out in a loop?

Good point. Hopefully 🙂

I like the appendTimestampMicros and appendTimestamp that both call a private appendTimestampInternal that could create and append the appropriate timestamp_t value.

think having a separate class would be the cleanest from a code perspective

You'd rather a distinct, purpose specific class for each of these? DuckDBAppenderTimestamp / Date / Time or something to that tune? The time classes are a bit of a nightmare, but I could definitely see the argument for using Instant or java.sql.Timestamp (especially because this IS kind of a jdbc driver).

I would prefer not to actually add a numeric -> timestamp cast to the system

Initially I was looking into doing the cast based implementation (with an implementation of TryCast::Operation(int64_t input, timestamp_t &result, bool strict) Just so happens that the SELECT 1::TIMESTAMP or SELECT CAST(CAST(1 AS LONG) AS TIMESTAMP) still fails due to the difference between the int64_t and bigint / integer.

But I see your point.

jonathanswenson avatar Jun 21 '22 23:06 jonathanswenson

I like the appendTimestampMicros and appendTimestamp that both call a private appendTimestampInternal that could create and append the appropriate timestamp_t value.

That sounds good to me.

You'd rather a distinct, purpose specific class for each of these? DuckDBAppenderTimestamp / Date / Time or something to that tune? The time classes are a bit of a nightmare, but I could definitely see the argument for using Instant or java.sql.Timestamp (especially because this IS kind of a jdbc driver).

It would not need to be restricted only to the Appender, but indeed these classes might not be used for anything else in the JDBC. Accepting Java classes seems like a good idea regardless.

Mytherin avatar Jun 22 '22 06:06 Mytherin

Indeed we should just support appending a java.sql.Timestamp

hannes avatar Aug 29 '22 08:08 hannes

Any workaround for inserting timestamps through the appender? 0.6.1 is returning the following error when trying to insert the time as microseconds double

Not implemented Error: Unimplemented type for cast (INT64 -> INT64)

firecast avatar Dec 27 '22 06:12 firecast

In 0.8, I am not seeing an appendTimestamp option?

ryanhamilton avatar Jun 21 '23 09:06 ryanhamilton

Any plans to support this?

pronzato avatar Jun 22 '23 12:06 pronzato

It would help if there was a way to query the column types from an appender: https://github.com/duckdb/duckdb/discussions/8069

karlseguin avatar Jun 24 '23 03:06 karlseguin

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] avatar Sep 23 '23 00:09 github-actions[bot]

This issue was closed because it has been stale for 30 days with no activity.

github-actions[bot] avatar Oct 24 '23 00:10 github-actions[bot]

I'm running the 0.9.2 version and I get

Not implemented Error: Unimplemented type for cast (INT64 -> INT64) at org.duckdb.DuckDBNative.duckdb_jdbc_appender_append_long(Native Method)

This is the Java value I'm trying to append:

duckDBAppender.append(new DuckDBTimestamp(System.currentTimeMillis() * 1000).getMicrosEpoch());

andrea-tomassi avatar Nov 18 '23 18:11 andrea-tomassi

A fix that doesn't expose error prone raw long appenders, but achieves the same goal via DuckDBTimestamp -> https://github.com/duckdb/duckdb/pull/10736 @jonathanswenson

Indeed we should just support appending a java.sql.Timestamp

@hannes I agree that shouldn't be support for appending a long directly on a TIMESTAMP column, but java.sql.Timestamp is a problematic type as it normalizes the internal data to the system's timezone implicitly. I opened a PR (#10736) adding DuckDBTimestamp support and improving the java.time.LocalDateTime support.

felipecrv avatar Feb 18 '24 19:02 felipecrv

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] avatar Jul 01 '24 12:07 github-actions[bot]