LiteDB icon indicating copy to clipboard operation
LiteDB copied to clipboard

If a table has large number of records, running a count query takes minutes to execute

Open SuheylZ opened this issue 5 years ago • 12 comments

I have the following item stored in LiteDB

internal class DBItem
    {
        [BsonId]
        public long Id { get; set; }
        public BigInteger Value { get; set; }
        public DateTime CreatedOn { get; set; } = DateTime.UtcNow;
        public bool IsUsed { get; set; } = false;
    }

now when I execute the following line it never returns the number while the memory usage keeps shooting at around 4.7 GB.

var unusedCount = collection.Count(x=>!x.IsUsed);

please note that I have more than 10 billion records.

SuheylZ avatar Apr 30 '19 09:04 SuheylZ

Is there a index on IsUsed? I have a data file larger than 25GB and single collection larger than 7GB, it works well.But I did not execute Count operation before.

lidanger avatar May 01 '19 03:05 lidanger

well, try count() with an expression. it would almost bring the execution to halt while holding gigs of memory. i can understand that it is really computing but is it computing all in memory? if it is, which seems to be the case, why does it take ages to calculate? and yes IsUsed is indexed.

SuheylZ avatar May 01 '19 05:05 SuheylZ

I think this will be good.

var unusedCount = collection.Count(Query.EQ("IsUsed", false));

or maybe

var unusedCount = collection.Count(x=> x.IsUsed == false);

lidanger avatar May 01 '19 05:05 lidanger

tried all that you are suggesting. to no avail

SuheylZ avatar May 01 '19 05:05 SuheylZ

i did a workaround that is kinda works for now but that is not the solution.I'm keeping track of all the operations that alter the Is Used and keeping count in separate table

but that is very hard to maintain

SuheylZ avatar May 01 '19 06:05 SuheylZ

I tried with LiteDB.Shell on a data file with size 12219088 KB and a collection with item size 1001479104 around.

db.xxxx.count date > "2015-01-01"

It ended in 7 seconds. And memory was taken 800MB aroud.Maybe actualy it just is a fact your collection is tool big.

I'm sorry, just now the time it costed is wrong.

@MercedeX

lidanger avatar May 01 '19 06:05 lidanger

its just not working in my case. i think i should record a video to demonstrate what I'm saying.

or maybe a sample program perhaps?


From: LiJundang [email protected] Sent: Wednesday, May 1, 2019 11:12:04 AM To: mbdavid/LiteDB Cc: MercedeX; Author Subject: Re: [mbdavid/LiteDB] If a table has large number of records, running a count query takes minutes to execute (#1214)

I tried with LiteDB.Shell on a data file with size 12219088 KB and a collection with item size 1001479104 around.

db.xxxx.count date > "2015-01-01"

It ended in 1 seconds.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/mbdavid/LiteDB/issues/1214#issuecomment-488217950, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ADIXIO52OTD72FGEEN2L3ATPTEYDJANCNFSM4HJJZDGQ.

SuheylZ avatar May 01 '19 07:05 SuheylZ

It ended in 7 seconds. And memory was taken 800MB aroud.Maybe actualy it just is a fact your collection is tool big. I'm sorry, just now the item count and the time it costed is wrong.

@MercedeX

lidanger avatar May 01 '19 10:05 lidanger

@MercedeX Could you provide us with a bit more information about the environment in which LiteDB is running? Eg:

  • Is it a mobile device or not
  • CPU/Ram
  • which runtime and version are being used (.NET Framework, .NET Core, Mono?)

JensSchadron avatar May 02 '19 06:05 JensSchadron

dual core i7 6 GB ram Windows Server 2016 NTFS 512 Sector size dotnetcore 2.1 Console based app

please keep in mind that in future we will move to Linux too.

SuheylZ avatar May 02 '19 14:05 SuheylZ

@MercedeX Sorry for the late reply, but I feel like you might have hit the limits of what LiteDB can given it's architecture. 10 billion is an insanely large number and if, for example, 1 dbItem is 1 byte in size... that would mean that you have a collection of roughly 1 petabyte. Not sure if it's doable in your case, but you might be able to achieve a more performant collection by adjusting your models and thus reducing the total amount of documents in the collection?

I hope that the next-gen LiteDB version (which the beta version is planned to be released at the end of this month) will be more able to cope with such large amounts of documents.

JensSchadron avatar May 06 '19 21:05 JensSchadron

@Mercedex Sorry for the late reply, but I feel like you might have hit the limits of what LiteDB can given it's architecture. 10 billion is an insanely large number and if, for example, 1 dbItem is 1 byte in size... that would mean that you have a collection of roughly 1 petabyte. Not sure if it's doable in your case, but you might be able to achieve a more performant collection by adjusting your models and thus reducing the total amount of documents in the collection?

I hope that the next-gen LiteDB version (which the beta version is planned to be released at the end of this month) will be more able to cope with such large amounts of documents.

the count() has some bug,i think.because is looks load all data into memory to calculate the number

sgf avatar Jul 28 '23 09:07 sgf