LiteDB
If a table has a large number of records, running a count query takes minutes to execute
I have the following item stored in LiteDB:

using System;
using System.Numerics;
using LiteDB;

internal class DBItem
{
    [BsonId]                     // primary key
    public long Id { get; set; }
    public BigInteger Value { get; set; }
    public DateTime CreatedOn { get; set; } = DateTime.UtcNow;
    public bool IsUsed { get; set; } = false;   // the field the count query filters on
}
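For context, a minimal sketch of how the collection might be opened (the file path and collection name here are placeholders for illustration, not taken from the report):

using LiteDB;

// Minimal sketch: open the database file and get a typed collection.
// "data.db" and "items" are placeholder names; dispose the LiteDatabase when done.
var db = new LiteDatabase(@"data.db");
var collection = db.GetCollection<DBItem>("items");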
Now, when I execute the following line, it never returns a result, while the memory usage climbs to around 4.7 GB:
var unusedCount = collection.Count(x => !x.IsUsed);
Please note that I have more than 10 billion records.
Is there an index on IsUsed? I have a data file larger than 25 GB and a single collection larger than 7 GB, and it works well. But I have not run a Count operation on it before.
Well, try Count() with an expression: it almost brings execution to a halt while holding gigabytes of memory. I can understand that it is really computing, but is it computing everything in memory? If it is, which seems to be the case, why does it take ages to calculate? And yes, IsUsed is indexed.
I think this will work better:
var unusedCount = collection.Count(Query.EQ("IsUsed", false));
or maybe
var unusedCount = collection.Count(x => x.IsUsed == false);
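A minimal end-to-end sketch of that suggestion, assuming the LiteDB 4 API and the placeholder collection from the sketch above:

// Sketch: make sure the filtered field is indexed before counting.
collection.EnsureIndex(x => x.IsUsed);

// Count with an explicit Query so LiteDB can resolve it against the index.
var unusedCount = collection.Count(Query.EQ("IsUsed", false));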
I tried everything you are suggesting, to no avail.
I did a workaround that kind of works for now, but it is not a real solution: I keep track of all the operations that alter IsUsed and maintain a running count in a separate table.
But that is very hard to maintain.
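For anyone hitting the same wall, here is a rough sketch of that kind of running-counter workaround. The "stats" collection and "UnusedCount" key are made-up names, and keeping the two writes consistent is exactly the maintenance burden mentioned above:

// Sketch of the running-counter workaround: whenever IsUsed flips,
// adjust a counter kept in a separate collection.
// "stats" and "UnusedCount" are placeholder names for illustration.
var stats = db.GetCollection<BsonDocument>("stats");

void MarkUsed(long id)
{
    var item = collection.FindById(id);
    if (item == null || item.IsUsed) return;

    item.IsUsed = true;
    collection.Update(item);

    var counter = stats.FindById("UnusedCount")
                  ?? new BsonDocument { ["_id"] = "UnusedCount", ["value"] = 0L };
    counter["value"] = counter["value"].AsInt64 - 1;
    stats.Upsert(counter);
}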
I tried with LiteDB.Shell on a data file of about 12,219,088 KB with a collection of around 1,001,479,104 items.
db.xxxx.count date > "2015-01-01"
It finished in 7 seconds, and memory usage was around 800 MB. Maybe it is simply the case that your collection is too big.
I'm sorry, the item count and the time I posted at first were wrong.
@MercedeX
It's just not working in my case. I think I should record a video to demonstrate what I'm saying, or maybe put together a sample program?
@MercedeX Could you provide us with a bit more information about the environment in which LiteDB is running? E.g.:
- Is it a mobile device or not?
- CPU/RAM?
- Which runtime and version are being used (.NET Framework, .NET Core, Mono)?
- Dual-core i7, 6 GB RAM
- Windows Server 2016, NTFS with 512-byte sector size
- .NET Core 2.1, console-based app
Please keep in mind that in the future we will move to Linux too.
@MercedeX Sorry for the late reply, but I feel like you might have hit the limits of what LiteDB can handle, given its architecture. 10 billion is an insanely large number: even if each DBItem were only 100 bytes, that would mean a collection of roughly 1 TB. Not sure if it's doable in your case, but you might be able to get a more performant collection by adjusting your models and thus reducing the total number of documents in the collection (see the sketch below).
I hope that the next-gen LiteDB version (whose beta is planned for release at the end of this month) will cope better with such large numbers of documents.
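One way to read that suggestion (purely illustrative; the class and field names below are invented for this example, not an API recommendation) is to pack many logical records into a single document so the collection holds far fewer, larger documents:

using System.Collections.Generic;
using LiteDB;

// Illustrative sketch only: group e.g. 10,000 logical items into one document
// to cut the document count by several orders of magnitude.
internal class DBItemBatch
{
    [BsonId]
    public long BatchId { get; set; }               // e.g. original Id / 10000
    public List<bool> IsUsedFlags { get; set; } = new List<bool>();
    public int UnusedCount { get; set; }            // maintained on every update
}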
I think Count() has a bug, because it looks like it loads all the data into memory to calculate the number.
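If Count() with a predicate really does materialize documents, one possible workaround (a sketch only, not verified against LiteDB's internals) is to stream the matching documents with Find(), which returns a lazy IEnumerable, and count them manually:

// Sketch: enumerate matches lazily instead of calling Count(),
// so only one document needs to be deserialized at a time.
long unusedCount = 0;
foreach (var item in collection.Find(Query.EQ("IsUsed", false)))
{
    unusedCount++;
}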