[FR] In memory storage option
Proposal Summary
MongoDB offers an inMemory storage engine option for mongod. Perhaps there is a way to leverage this option to better support smaller, more "in and out" use cases for FiftyOne.
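For reference, the proposal would amount to launching FiftyOne's backing mongod with a config like the sketch below. This is an assumption about how it might be wired up, not an existing FiftyOne feature; note also that the inMemory engine is only available in MongoDB Enterprise, and the path and size values here are illustrative.

```shell
# Sketch only: write a mongod config fragment enabling the inMemory storage
# engine (MongoDB Enterprise). Path and memory size are illustrative.
cat > /tmp/mongod-inmemory.conf <<'EOF'
storage:
  engine: inMemory
  inMemory:
    engineConfig:
      inMemorySizeGB: 4
EOF
# FiftyOne (or the user) would then launch mongod with:
#   mongod --config /tmp/mongod-inmemory.conf
```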
Motivation
Competitive ingestion performance when compared to popular in-memory backends (pandas, etc.)
What areas of FiftyOne does this feature affect?
- [ ] App: FiftyOne application
- [x] Core: Core `fiftyone` Python library
- [ ] Server: FiftyOne server
Details
Needs design. Ingestion is not "slow" at the moment purely because FiftyOne uses a disk-backed mongod process, but this could certainly bring even more flexibility to FiftyOne.
Nice, I ran across this today too - the main downside I'm seeing is the lack of persistence, so it would only really work for non-persistent datasets. There might be some additional caching options that could be used to speed up the existing disk-based storage.
By setting the environment variable `FIFTYONE_DATABASE_DIR` to a tmpfs directory, database operation speed can be improved.
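The tmpfs workaround above can be sketched as follows, assuming a Linux host where `/dev/shm` is a tmpfs mount (the subdirectory name is illustrative). The variable must be set before `fiftyone` is first imported, since the config is read at import time.

```shell
# Point FiftyOne's backing database at a RAM-backed tmpfs directory.
# /dev/shm is tmpfs on most Linux systems; "fiftyone-db" is an arbitrary name.
mkdir -p /dev/shm/fiftyone-db
export FIFTYONE_DATABASE_DIR=/dev/shm/fiftyone-db
# Any Python process launched from this shell that imports fiftyone
# will now store its mongod data in RAM (and lose it on reboot).
```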
Just stumbled upon this issue and I have a somewhat related question. I wanted to use FiftyOne only for evaluation, but in my particular case (a small dataset of ~100 images, with ~10,000 annotations + predictions) the conversion/import into a FiftyOne dataset takes some time (> 1 minute). Now I wonder how I can speed up the import.
I am not familiar with the FiftyOne architecture, but apart from using e.g. an in-memory database, is it perhaps possible to avoid storing the samples in the MongoDB database at all (and thereby avoid the serialization time)? In my case I do not need persistence anyway.
Hi @constiDisch. Samples are in-memory until they are added to a dataset. A dataset is always backed by MongoDB, so with respect to dataset or view operations, MongoDB is required.