nhibernate-core icon indicating copy to clipboard operation
nhibernate-core copied to clipboard

Incorrect result when using GroupBy with First

Open keithyipkw opened this issue 5 years ago • 3 comments

The following code returns duplicates of a group instead of a list of the first item of each group.

.GroupBy(a => a.X)
.Select(g => g.OrderBy(y => y.Id).First())

The more puzzling part is that the method syntax ("actual") gives a correct return in my production code but the SQL syntax (actual2) gives a wrong result. I cannot reproduce the same behavior outside my production code.

Version: NHibernate 5.2.5

Output: Expected output

1 2
1 2
1 2

Actual output

1 2
1 1
1 1

Code:

static void Main(string[] args)
        {
            var factory = Fluently.Configure()
                .Diagnostics(x => x.OutputToConsole())
                .Database(SQLiteConfiguration.Standard.ConnectionString("FullUri=file:a.sqlite3;journal mode=Wal"))
                .Mappings(m => m.AutoMappings.Add(
                    AutoMap.Source(new EnumeratedTypeSource("Test", new[] {
                        typeof(A),
                    }))
                    .Conventions.Add(
                        DefaultLazy.Never())))
                .ExposeConfiguration(c =>
                {
                    new SchemaExport(c).Create(false, true);
                })
                .BuildSessionFactory();
            using (var session = factory.OpenSession())
            using (var transaction = session.BeginTransaction())
            {
                for (int i = 0; i < 10; ++i)
                {
                    session.Save(new A()
                    {
                        X = i % 2,
                    });
                }
                transaction.Commit();
            }
            using (var session = factory.OpenSession())
            using (var transaction = session.BeginTransaction())
            {
                //=====================================
                var expected = session.Query<A>()
                    .ToList() // <-- copy to memory
                    .GroupBy(a => a.X)
                    .Select(g => g.OrderBy(y => y.Id).First())
                    .ToList()
                    .Select(a => a.Id);
                Console.WriteLine(string.Join(" ", expected));
                //=====================================
                var actual = session.Query<A>()
                    .GroupBy(a => a.X)
                    .Select(g => g.OrderBy(y => y.Id).First())
                    .ToList()
                    .Select(a => a.Id);
                Console.WriteLine(string.Join(" ", actual));
                //=====================================
                var actual2 = (from a in session.Query<A>()
                               group a by a.X into grp
                               select grp.OrderBy(y => y.Id).First())
                              .ToList()
                              .Select(a => a.Id);
                Console.WriteLine(string.Join(" ", actual2));
                //=====================================
            }
            Console.ReadLine();
        }
        
        public class A
        {
            public int Id { get; set; }
            public int X { get; set; }
            public int Y { get; set; }
        }

        class EnumeratedTypeSource : ITypeSource
        {
            public EnumeratedTypeSource(string id, IEnumerable<Type> types)
            {
                this.types = types.ToArray();
                this.id = id;
            }

            private readonly Type[] types;
            private readonly string id;

            public string GetIdentifier()
            {
                return id;
            }

            public IEnumerable<Type> GetTypes()
            {
                return types;
            }

            public void LogSource(IDiagnosticLogger logger)
            {
                if (logger == null) throw new ArgumentNullException("logger");

                logger.LoadedFluentMappingsFromSource(this);
            }
        }

keithyipkw avatar Jun 02 '19 13:06 keithyipkw

Same issue found. Is there a workaround, because otherwise I have to load thousands of records in memory and do the group by and order by (descending) all in memory?

gmathijssen avatar Mar 29 '24 15:03 gmathijssen

Same issue found. Is there a workaround, because otherwise I have to load thousands of records in memory and do the group by and order by (descending) all in memory?

Have you profiled your program to check if working in the memory is an actual problem? If yes, rewriting your query in terms of other operations may work. Remember to profile your alternative too.

keithyipkw avatar Apr 02 '24 13:04 keithyipkw

Exposing the subitems of groupings is something that translates badly to SQL, and therefore it's rarely (if ever) supported by OR mappers. Performing it in one query requires using window functions, such as ROW_NUMBER() OVER(PARTITION BY X ORDER BY Y), combined with derived tables or CTE:s.

However, the query in the OP just returns the Id of the first sub item per group, ordered by Id.

 var actual = session.Query<A>()
                    .GroupBy(a => a.X)
                    .Select(g => g.OrderBy(y => y.Id).First())
                    .ToList()
                    .Select(a => a.Id);

which could by expressed as

 var actual = session.Query<A>()
                    .GroupBy(a => a.X)
                    .Select(g => g.Min(y => y.Id))
                    .ToList()

If you want the actual items, you could query for the id:s first, and then use that in a second query.

gliljas avatar Apr 02 '24 22:04 gliljas