orientdb icon indicating copy to clipboard operation
orientdb copied to clipboard

count(distinct(<field>)) does not work

Open mmacfadden opened this issue 5 years ago • 2 comments

OrientDB Version: 3.0.26

Expected behavior

count(distinct()) should count the number of distinct element.

Actual behavior

It seems to count the total elements

Steps to reproduce

  1. Create the class and properties:
CREATE CLASS Test;
CREATE Property Test.id String;
CREATE Property Test.category String;
  1. Add some data
INSERT INTO Test SET id = "a", category = "1";
INSERT INTO Test SET id = "b", category = "1";
INSERT INTO Test SET id = "c", category = "2";
INSERT INTO Test SET id = "d", category = "1";
INSERT INTO Test SET id = "e", category = "3";
INSERT INTO Test SET id = "f", category = "1";
INSERT INTO Test SET id = "g", category = "2";
INSERT INTO Test SET id = "h", category = "2";
INSERT INTO Test SET id = "i", category = "2";
INSERT INTO Test SET id = "j", category = "1";
INSERT INTO Test SET id = "k", category = "1";
  1. Test

Test 1: Count (works)

SELECT count(category) as count FROM Test;

Correct Returns:

    "result": [
        {"@type": "d", "@version": 0, "count": 11, "@fieldTypes": "count=l"}
    ]

Test 2: Distinct (works)

SELECT distinct(category) FROM Test;

Correct Returns:

    "result": [
        { "@type": "d", "@version": 0, "category": "1"},
        { "@type": "d", "@version": 0, "category": "2"},
        { "@type": "d", "@version": 0, "category": "3"}
    ]

Test 3: Count + Distinct (does not work)

SELECT count(distinct(category)) as count FROM Test;

Correct Returns:

    "result": [
        {"@type": "d", "@version": 0, "count": 11,  "@fieldTypes": "count=l"}
    ]

I would have expected the count to be 3, not 11.

A work around is as follows:

SELECT count() as count FROM (SELECT distinct(category) FROM Test);
    "result": [
        {"@type": "d", "@version": 0, "count": 3,  "@fieldTypes": "count=l"}
    ]

mmacfadden avatar Jan 20 '20 15:01 mmacfadden

Hi @mmacfadden

you are right, as of now the count(distinct()) is not properly supported and needs some deep refactoring.

We actually deprecated distinct() function in v 3.0, ie. now you can do SELECT DISTINCT foo, bar, so I'd suggest to use the following as a work-around for now

SELECT count(*) as count FROM (
  SELECT DISTINCT category FROM Test
);

Thanks

Luigi

luigidellaquila avatar Jan 22 '20 10:01 luigidellaquila

@luigidellaquila Thanks. I will take that approach. I checked the 3.0 documentation, I didn't see a deprecation notice for distinct(). Do we need to update that?

mmacfadden avatar Jan 22 '20 13:01 mmacfadden