birt icon indicating copy to clipboard operation
birt copied to clipboard

NullPointerException using COUNT aggregate

Open hvbtup opened this issue 3 years ago • 13 comments

With some reports we are facing a NullPointerException.

The relevant part of the stack trace is this: ... Caused by: java.lang.NullPointerException at org.eclipse.birt.data.engine.executor.aggregation.ProgressiveAggregationHelper$JSColumnBindingRow.get(ProgressiveAggregationHelper.java:508) ...

I can narrow it down to the following binding: grafik

I don't know if that's relevant, but actually the "AnzahlKarten" binding is not at all used inside the list item. So I could fix my report (in this particular case) by just removing the binding "AnzahlKarten".

It seems that the problem is related to the fact that the expression to count is not a column from the dataset row. The issue arises when the expression is a constant expression or row.__rownum.

A general workaround is to count any not-null column instead of row.__rownum or a constant expression.

The very same report with the very same data works with BIRT 4.3.0.

As a side note: The COUNT aggregate is somewhat weird anyway. I would expect that it should be possible to use COUNT and COUNTDISTINCT also without an expression. COUNT should then be the number of rows and COUNTDISTINCT should be the number of distinct rows (considering all columns from the database). That would be in line with the corresponding SQL keywords COUNT and COUNT (DISTINCT ...).

hvbtup avatar Mar 22 '22 10:03 hvbtup

@hvbtup , Can you provide a small test report with the example database which recreates this problem?

claesrosell avatar Mar 22 '22 22:03 claesrosell

Voila! issue-885.zip

hvbtup avatar Mar 23 '22 08:03 hvbtup

I created a modified version of the report.

issue-885.zip Here, I created two tables with a filter condition based on PRODUCTLINE. The first table filters PRODUCTLINE LIKE "T%", the second one PRODUCTLINE LIKE "X%". Otherwise the tables are identical: The aggregate "CountRows" uses COUNT to count row.__rownum. The visibility expression hides the table if CountRows == 0. The two tables are identical except for the filter expression.

With BIRT 4.2.1, the output shows only the "T" table, with a correct value of 2 for CountRows (Trucks and Trains&Busses). With BIRT 4.9.0, the report fails, With the Designer from #914, the output shows both tables, with a value of 1 for CountRows for the X table.

I think it is somewhat debatable what the correct result should be. One might say that in order to "store" the CountRows result somewhere, a row is needed, so the result 1 would be correct by that argument.

But the 4.2.1 result is less surprising IMHO.

We had similar issues if the expression to COUNT was a constant (e.g. 1) (don't remember with which BIRT release).

hvbtup avatar Apr 04 '22 12:04 hvbtup

I will take another look at this. The row index of the DataSet is probably 0-based and I guess that row.__rownum should be 1-based. The part of the Birt which I took inspiration from when "fixing" this just returned the index, but that index was probably already 1-based.

claesrosell avatar Apr 04 '22 15:04 claesrosell

row.__rownum has always been 0-based and that's OK. The value of row.__rownum was and still is correct. A value of -1 indicates that their are no rows at all. 0 is the first row and so on.

COUNT(x) should (if I understand correctly) count the not-null - or not-empty?(*) - values of x.

(*) For me, in the Oracle world, there is no difference between an empty string and null, so I don't really care.

Insofar row.__rownum is a special case, because for normal columns, the value is probably initialized with null, because "before the first row" row.__rownum already has the not-null value -1.

My intention for using COUNT(row.__rownum) was to check if there is any data or not, using it in the visibility expression to hide the whole list/table when there is no data.

I could have done this with with checking for row.__rownum < 0 in the visibilty expression, of course. Nevertheless, this has worked with older BIRT versions and I think it is a valid use-case.

I've found this mentioned in https://www.eclipse.org/forums/index.php/t/848630/ - I even found a comment from me there, reminding me that I reported this as a bug in 2014: https://bugs.eclipse.org/bugs/show_bug.cgi?id=438257

Obviously, the behavior changed between 4.2.1 and 4.3.0.

Looking at the code in TotalCount.java (method onRow in private class MyAcculumator) it seems like just leaving the expression completely empty should work (if countByColumn is false, this would work like COUNT(*) in SQL then) - I didn't test it in debug mode though. But as Tom G said in 848630, that didn't work.

While using row.__rownum < 0 in the visibiliy expression to hide the table/list when there is no data does work, it is a non-obvious workaround for an essential feature: To hide a table when there is no data is needed in most reports.

I wonder if it would make sense to explicitly allow * as an expression for COUNT and COUNTDISTINCT (the latter is much more complicated because it requires to manage a map of the distinct rows so far). But that's only eye-candy. Supporting a correct value of COUNT (and COUNTDISTINCT) for an empty expression should work.

One idea: Would it be possible to supply a special argument EMPTY_EXPRESSION as args[0] to onRow if the expression is empty? We could then deal with that in onRow. I believe the if-statement for setting countByColumn to false in onRow is not working because when the expression is empty, it is still supplied null in args[0].

hvbtup avatar Apr 05 '22 07:04 hvbtup

When I have read up on all the forum posts and bug descriptions / comments it seems like the original problem does not have anything to do with an NPE while referencing row.__rownum in an aggregation. The original problem is that COUNT() return 1 when it should return 0;

The NPE is a problem encountered when you try to "work around" the original problem. Is that correct?

claesrosell avatar Apr 05 '22 11:04 claesrosell

Yes, that's correct. Nevertheless the report should not fail. So even if #914 does not fix the original problem, I'm +1 for commiting. It is not perfect, but better than the current status IMHO. The only case where it is not working correctly is when there are no rows at all, which is documented in the old discussion and in my old bug entry. Since using row.__rownum as expression is supported by the dialog (shown as "RowNum"), it is legal to use it. I propose to commit #914, close this issue, and create a new issue for the wrong result of COUNT.

hvbtup avatar Apr 05 '22 13:04 hvbtup

I think that I have found the problem for COUNT() returning 0 for empty resultsets. For some strange reason, the build is not working though. I think that the cause for the build-fail is unrelated to this change ( it is complaining while building javadoc via ant ). I will look into it later today. In the meantime, if you want to, you can check the new diff of SimpleResultSet in the PR.

claesrosell avatar Apr 05 '22 13:04 claesrosell

This commit went into 4.2.2 : https://github.com/eclipse/birt/commit/d4839e4ff8d944f2bd194d15ca6c67f8a70885ff (SimpleResultSet.java and the introduction of prepareFirstRow() ) I am pretty sure that things stopped working then. I may want to tweak the PR a bit more, but I think that this is the cause of the problem: The aggregator is called even if the result set is empty.

claesrosell avatar Apr 06 '22 07:04 claesrosell

I think that I have found the problem for COUNT() returning 0 for empty resultsets.

Just to be clear: The value 0 correct in this case of course. The value 1 is wrong.

I am pretty sure that things stopped working then.

This matches my observations. We skipped 4.2.2 - we were using 4.2.1 before and then used 4.3.0 when we noticed the original issue ("wrong value for COUNT"). Looking at the changes, I really don't understand what's going on there and I also don't find the number 45785 related to BIRT in the old forums or bugzilla (probably this an OpenText internal issue id), but the commit seems to be related.

hvbtup avatar Apr 06 '22 07:04 hvbtup

I think that I have found the problem for COUNT() returning 0 for empty resultsets.

Just to be clear: The value 0 correct in this case of course. The value 1 is wrong.

Of course, typo by me.

I hope to fix this tonight. We need to get the build working first though. See #917

claesrosell avatar Apr 06 '22 07:04 claesrosell

| See #917

I don't know what's going on there. BTW I tried to compile #914 myelf but at the moment I cannot run the All-In-One from inside the SDK IDE - seems to be some conflict between javax.xml and jakarta.xml, but I don't really know. So I tested it with the binary artifacts from GH instead.

hvbtup avatar Apr 06 '22 07:04 hvbtup

I tested issue-885.rptdesign again with the artifact from #923 and it is OK with this code (I assume that #923 includes the fixes from #914)?

hvbtup avatar May 27 '22 06:05 hvbtup