
[FR] In-memory storage option

Open benjaminpkane opened this issue 4 years ago • 4 comments

Proposal Summary

MongoDB offers an inMemory storage engine option for mongod. Perhaps there is a way to leverage this option to better support smaller, more "in and out" use cases for FiftyOne.

Motivation

Competitive ingestion performance when compared to popular in-memory backends (pandas, etc.)

What areas of FiftyOne does this feature affect?

  • [ ] App: FiftyOne application
  • [x] Core: Core fiftyone Python library
  • [ ] Server: FiftyOne server

Details

Needs design. Ingestion is not "slow" at the moment purely because FiftyOne uses a disk-backed mongod process, but this option could certainly bring even more flexibility to FiftyOne.
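
For illustration only, a minimal sketch of how this could look today, assuming a separately managed MongoDB Enterprise mongod running the inMemory storage engine (the engine is Enterprise-only) and connecting FiftyOne to it via `FIFTYONE_DATABASE_URI`:

```python
import os

# Assumes an external MongoDB Enterprise mongod was started separately, e.g.:
#   mongod --storageEngine inMemory --dbpath /tmp/mongo-inmem --port 27018
# FiftyOne is then pointed at that process instead of its bundled mongod
os.environ["FIFTYONE_DATABASE_URI"] = "mongodb://localhost:27018"

import fiftyone as fo  # the URI must be set before fiftyone is imported

# Datasets are non-persistent by default; with the inMemory engine they would
# disappear entirely when the mongod process stops
dataset = fo.Dataset("inmem-demo")
```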

benjaminpkane avatar Oct 23 '20 02:10 benjaminpkane

Nice, I ran across this today too - the main downside I'm seeing is the lack of persistence, so it would only really work for non-persistent datasets. There might be some additional caching options that could be used to speed up the existing disk-based storage.

lethosor avatar Oct 23 '20 02:10 lethosor

Setting the FIFTYONE_DATABASE_DIR environment variable to a tmpfs directory can improve operation speed.
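
A rough sketch of that approach, assuming a tmpfs mount such as /dev/shm is available (the path is just an example):

```python
import os

# Point FiftyOne's bundled mongod at a tmpfs-backed directory so its data
# files live in RAM; the path below is only an example
os.environ["FIFTYONE_DATABASE_DIR"] = "/dev/shm/fiftyone"

import fiftyone as fo  # must be imported after the env var is set
```

Note that anything in tmpfs is lost on reboot, so this only suits non-persistent use.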

GracefulTabby avatar Dec 08 '21 07:12 GracefulTabby

Just stumbled upon this issue and I have a somewhat related question. I want to use 51 only for evaluation, but in my particular case (small dataset of ~100 images, but ~10,000 annotations + predictions) the conversion/import into a 51 dataset takes some time (>1 minute). Now I wonder how I can speed up the import.

I am not familiar with the 51 architecture, but apart from using e.g. an in-memory database, is it perhaps possible to avoid storing the samples in the MongoDB database at all (and thus avoid the serialization time)? In my case I do not need persistence anyway.

constiDisch avatar Feb 20 '24 10:02 constiDisch

Hi @constiDisch. Samples are in-memory until they are added to a dataset. A dataset is always backed by MongoDB, so with respect to dataset or view operations, MongoDB is required.
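
To illustrate the distinction, a minimal example (the paths are placeholders):

```python
import fiftyone as fo

# Sample objects live purely in memory at this point; no MongoDB writes yet
samples = [fo.Sample(filepath=f"/path/to/image{i}.jpg") for i in range(100)]

# Adding them to a dataset is where serialization to MongoDB happens
dataset = fo.Dataset()
dataset.add_samples(samples)
```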

benjaminpkane avatar Feb 21 '24 15:02 benjaminpkane