questdb icon indicating copy to clipboard operation
questdb copied to clipboard

coalesce returns null on result from division by zero

Open superichmann opened this issue 2 years ago • 10 comments

TLDR Update: in aggregate functions (AVG / SUM / MAX / etc.), QDB internally handles -Infinity and Infinity values as nulls and do not calculate them into the final result. Column with 1000 Infinity values and two values of 1,3 will have the AVG of 2. which is wrong.

I see now behavior is changed and its actually returning null when dividing by zero. which can be treaded with a simple coalesce.

Describe the bug

coalesce should convert the nulls to something else.. in this case 0 but the result is all nulls.

I know some of my tickers may seem weird but this is a real scenario that actually happened to me on my server, I have just reproduced it on demo for your convenience.

To reproduce

try on demo questdb SELECT *,A/G,coalesce(A/G,0) FROM (SELECT galon_price A,coalesce((galon_price1),0) G FROM (SELECT * FROM gas_prices b lt join (SELECT * FROM gas_prices a) on timestamp))

Expected Behavior

result should be 0

Environment

- **QuestDB version**: demo
- **OS**: ?
- **Browser**: chrome

Additional context

No response

superichmann avatar May 06 '23 12:05 superichmann

thanks @superichmann for reporting the issue and for the reproducer.
I used your reproducer to create a simpler one:

select coalesce(1/n, 0) from (
  select 0::double as n from long_sequence(1)
);

I reckon it's related to the strategy to find the common type between double and long.

jerrinot avatar May 09 '23 11:05 jerrinot

ok, it's not due to typecasting. I assume you are using the REST interface. Is that so?

Here is what is going on:

  • Floating point division by 0 results in Infinity as defined by IEEE 754.
  • Infinity is not null hence the coalesce() function also returns Infinity.
  • QuestDB REST interface uses JSON to serialize query results, but JSON as defined by ECMA-404 does not know Infinity. This is what the JSON spec says:

Numeric values that cannot be represented as sequences of digits (such as Infinity and NaN) are not permitted

Hence QuestDB converts all Infinity values to null when serializing query results to JSON.

This is not ideal, but I am afraid the alternative is to use strings to represent double/float values, which is not great either.

As a workaround you can cast the result of the coalesce() function to string in your query:

SELECT *,A/G,coalesce(A/G,0)::string 
FROM (
  SELECT galon_price A,coalesce((galon_price1),0) G 
  FROM (SELECT * FROM gas_prices b lt join (SELECT * FROM gas_prices a) on timestamp)
);

Would this work for you?

jerrinot avatar May 09 '23 13:05 jerrinot

I am using webui for my tests so same as demo.questdb.com. I have noticed your fix suggestion but the problem starts when I need to compare and convert all kinds of results to Infinity and -Infinity (some results are null as well but actually are -Infinity) so my SELECTs are too complex, such as: SELECT coalesce(nullif(((COLUMN_1))::string,'Infinity')::double,99999) only to get one double value.. and as well I need to handle -Infinity

I have noticed that a conversion from double to INT gets back INT.MAX(2147483647), so why not do the same for double? see demo example: SELECT *,(A/G)::char,(A/G)::int,(A/G)::string,coalesce(A/G,0),A/G FROM (SELECT galon_price A,coalesce((galon_price1),0) G FROM (SELECT * FROM gas_prices b lt join (SELECT * FROM gas_prices a) on timestamp))

superichmann avatar May 09 '23 18:05 superichmann

hello @superichmann, I'm not sure I understand your last message. You can work with double (and thus Infinity) in all layers of your query. You only cast from double to string at the outermost projection - right before QuestDB converts it to JSON - as that's the only part where Infinity is converted to null as I explained above.

You wrote:

I have noticed that a conversion from double to INT gets back INT.MAX(2147483647), so why not do the same for double?

I am not sure if I understand it well. Are you proposing the REST interface should not serialize Infinity as null, but as some kind of sentinel value like Integer.MAX_VALUE or Long.MAX_VALUE? Which one would you choose and why?

fwiw, in JavaScript JSON.stringify(Infinity) also returns null. I take it as an indication the current JSON serialization behaviour is not completely baseless:

  • questdb serializes Infinity to null by default
  • as a user you can use explicit casting (to string, long, up to you) if you need a different behaviour.

jerrinot avatar May 10 '23 08:05 jerrinot

to be honest I have no idea I just showing something I saw on WEBUI. As far as I see it if a value is not null then it should not be presented to the user as null.

You have much more understanding then me about it so feel free to handle this issue as you see :]

superichmann avatar May 10 '23 08:05 superichmann

well, that's the limitation of the JSON format. It does not allow numeric values which cannot be represented as sequences of digits. so there is no good option here - it's all about trade-offs.

I'm closing this ticket for now, we might revisit this in the future. if anyone feels strongly about this then feel free to add +1 or re-open the ticket.

jerrinot avatar May 10 '23 09:05 jerrinot

Hi, Maybe I am still missing something but the main thing I am trying to say is that functions like AVG or SUM ignore nulls - ignoring missing data I think that the internal mechanisms in QDB treat these Infinity values as null values and not calculating them, instead of calculating them as infinity or as a very high number.

Could you check this on your side? is this the expected behavior? the end result is that queries return unrealistic values according to the calculation.

I might as well not be using the correct aggregate function, please correct me if I am wrong (ksum nsum sum) maybe I need to use one of these.

superichmann avatar Jun 13 '23 19:06 superichmann

I mean, you really don't think its weird?

try this query on https://demo.questdb.io/ SELECT extra, extra/0,CASE WHEN extra/0 > 1 THEN 99999 ELSE -111 END from trips

Can you please instruct me on how can I handle this situation on my queries? I run calculations on 3 million rows, each addition to the command such as coalesce or infinity or null or 0 checks is adding much time to the queries :[

How can I check for these Infinity -Infinity values in query?

superichmann avatar Jun 13 '23 20:06 superichmann

there is as well different behavior for number sent through query and numbers retrieved from columns. when sending numbers division by zero is always null but when data is retrieved from qdb then sometimes its Infinity and sometimes null - if 0/0 then null, if 9/0 then infinity.

This is from my check..

superichmann avatar Jun 13 '23 20:06 superichmann

now if COUNT(*) would have worked in all calculated queries and not only in top level query I could have circumvent this count only non nulls behavior.. SELECT COUNT(*)/AVG(dropoff_longitude)+COUNT(*)*SUM(dropoff_longitude) from trips

superichmann avatar Jun 13 '23 20:06 superichmann