Delete sample after insert

Open dkovar opened this issue 12 years ago • 5 comments

Good evening,

Have you considered modifying foorep to relocate all samples to its own filesystem? At the moment, it appears that it leaves the samples in place. Various other tools do this:

  1. Hash the sample
  2. Copy the sample to a filesystem dedicated to the tool, naming the sample based on the hash
  3. Do the database work, referencing the sample in the tool's filesystem.

Duplicates are detected at ingestion time and, if the sample has a name that is different than the existing sample, a record is created (or adjusted) to note the multiple sample names.
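
The three steps and the duplicate handling above could be sketched roughly as follows. This is only an illustration, not foorep's code: the `ingest` function, the `store_dir` layout, and the in-memory `index` dict standing in for the database are all hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, block_size=1 << 20):
    """Hash a file in blocks so large samples are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def ingest(sample_path, store_dir, index):
    """Copy a sample into store_dir under its hash and record its name.

    `index` is a plain dict standing in for the database (an assumption):
    it maps hash -> {"path": str, "names": [original file names]}.
    Returns (hash, is_new).
    """
    sample_path = Path(sample_path)
    store_dir = Path(store_dir)
    store_dir.mkdir(parents=True, exist_ok=True)

    digest = sha256_of(sample_path)              # step 1: hash the sample
    record = index.get(digest)
    if record is None:
        dest = store_dir / digest                # step 2: copy, named by hash
        shutil.copyfile(sample_path, dest)
        # step 3: the "database work", referencing the copy in the store
        index[digest] = {"path": str(dest), "names": [sample_path.name]}
        return digest, True

    # Duplicate content: only note the additional name, if it is a new one.
    if sample_path.name not in record["names"]:
        record["names"].append(sample_path.name)
    return digest, False
```

Ingesting the same bytes under two file names would leave one stored copy and one record listing both names.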

-David

dkovar avatar Jan 04 '13 23:01 dkovar

foorep stores the samples in GridFS, a filesystem within MongoDB. At the moment it leaves the original samples in place after the import, but I will add an option to the CLI to remove them afterwards.
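
A rough sketch of what removing the original after the import could look like, assuming the stored copy is read back and compared before anything is deleted. The `remove_after_import` helper is hypothetical and not part of foorep; `stored_bytes` stands for whatever content was read back from the store.

```python
import hashlib
import os


def remove_after_import(sample_path, stored_bytes):
    """Delete the original sample only if the stored copy matches it.

    `stored_bytes` is the content read back from the store after the
    import (an assumption about how the check would be wired up).
    Returns True if the original file was removed.
    """
    with open(sample_path, "rb") as fh:
        original_digest = hashlib.sha256(fh.read()).hexdigest()
    stored_digest = hashlib.sha256(stored_bytes).hexdigest()

    if original_digest != stored_digest:
        # Keep the original: the import did not round-trip cleanly.
        return False
    os.remove(sample_path)
    return True
```

Comparing hashes first means a failed or partial import never costs you the only copy of a sample.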

berggren avatar Jan 07 '13 05:01 berggren

Greetings,

If you import 1TB of malware samples, does the database grow by 1TB? In other words, will GridFS scale well over time?

-David

dkovar avatar Jan 07 '13 23:01 dkovar

Yes, if you import 1TB of data into GridFS, the database will grow by 1TB. GridFS works by splitting each file across several "documents" in its internal structure. I think it will scale pretty well, since you can add more database servers and shard the data, but I need to test this in the real world first.
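
To illustrate the splitting, here is a small sketch of how a file maps onto numbered chunks. It does not talk to MongoDB; the helper names are made up for this example, and the 255 KiB chunk size is an assumption based on the default in recent MongoDB drivers (it is configurable, and older releases used a different default).

```python
# Assumed chunk size: recent MongoDB drivers default to 255 KiB per chunk.
CHUNK_SIZE = 255 * 1024


def gridfs_chunk_count(file_size, chunk_size=CHUNK_SIZE):
    """Number of chunk documents needed for a file of file_size bytes."""
    # Integer ceiling division; an empty file needs no chunks.
    return (file_size + chunk_size - 1) // chunk_size


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split bytes into (n, chunk) pairs, numbered from 0 in file order."""
    return [
        (n, data[start:start + chunk_size])
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
```

Each chunk is stored as its own document alongside a sequence number, so the stored size is roughly the sample size plus a small per-chunk overhead.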

berggren avatar Jan 08 '13 07:01 berggren

Greetings,

I've got about 1.5TB of malware samples coming in this week. I'll get everything set up next week and will feed them all in and see what happens.

-David

dkovar avatar Jan 08 '13 13:01 dkovar

Interesting! Please report any issues you run into. I will also run a similar test.

berggren avatar Jan 08 '13 15:01 berggren