
Storing (large) data produced by an experiment as an Artifact without first saving to a file

Open GiliR4t1qbit opened this issue 7 years ago • 2 comments

When running an experiment that produces a large amount of data that needs to be saved, the docs recommend saving that data as an Artifact. When using MongoDB, the reasoning is clear to me: the maximum size of a single document ("row") is 16 MB, and putting too much data into one document would presumably slow down queries anyway. My understanding is that the Artifact data is then split into chunks and stored in MongoDB using GridFS.
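To make the chunking idea concrete, here is a minimal sketch of how GridFS-style storage sidesteps the 16 MiB document limit by splitting a payload into fixed-size chunks, each stored as its own document with a sequence number `n`. It assumes the GridFS default chunk size of 255 KiB; `split_into_chunks` is a hypothetical illustration, not PyMongo's actual implementation, and real chunk documents also carry a `files_id` linking them to their file entry.

```python
MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024  # MongoDB per-document limit (16 MiB)
DEFAULT_CHUNK_SIZE = 255 * 1024            # GridFS default chunk size (255 KiB)


def split_into_chunks(data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE):
    """Hypothetical illustration: return chunk dicts shaped roughly like
    GridFS fs.chunks entries (sequence number n plus a slice of the data)."""
    return [
        {"n": i, "data": data[offset:offset + chunk_size]}
        for i, offset in enumerate(range(0, len(data), chunk_size))
    ]


payload = bytes(20 * 1024 * 1024)  # 20 MiB of zeros: too large for one document
chunks = split_into_chunks(payload)
print(len(payload) > MAX_BSON_DOCUMENT_SIZE)  # True
print(len(chunks))                            # 81
```

Each chunk stays far below the document limit, so the payload as a whole can be arbitrarily large.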

Using an Artifact to store this data seems like a good solution, but it appears to require saving the data to a file first (?). Since I already have the data in memory, it seems inelegant and inefficient to first write it to a temporary file and then turn that file into a database entry. Perhaps I'm missing something, but it would be convenient to be able to create an Artifact directly from in-memory data rather than from a file.
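For reference, the temporary-file workaround can at least be wrapped in a small helper. This is a minimal sketch: `add_bytes_as_artifact` is a hypothetical name, and it assumes that the `add_artifact(filename, name=...)` callable it receives (for example Sacred's `_run.add_artifact` inside a captured function) reads the file synchronously, before the temporary directory is deleted.

```python
import os
import tempfile


def add_bytes_as_artifact(add_artifact, data: bytes, name: str) -> None:
    """Hypothetical helper: spill in-memory bytes to a temporary file and pass
    that file to an add_artifact(filename, name=...) callable."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Use only the base name so the file is always created inside tmp_dir.
        path = os.path.join(tmp_dir, os.path.basename(name))
        with open(path, "wb") as f:
            f.write(data)
        # Assumption: the callee has finished reading the file when it
        # returns, because the directory and file are deleted right after.
        add_artifact(path, name=name)
```

Inside a Sacred captured function this would be called as `add_bytes_as_artifact(_run.add_artifact, pickle.dumps(results), "results.pkl")`. The data still makes a round trip through the disk; the helper only hides the boilerplate.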

Alternatively, if there is another solution for this scenario, within Sacred or outside it, that would be very useful to know. Thanks!

GiliR4t1qbit, Oct 13 '18 01:10

It looks like GridFS's put() method actually accepts 'data' that can be either a file-like object or a str/bytes instance (str in Python 2.7, bytes in Python 3; a Python 3 str also works if an encoding is given). I suggest that Sacred pass this functionality along, so that Artifacts can be added not only from files but also directly from str/bytes.
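Passing this through mostly comes down to input handling. A minimal sketch, assuming an observer wants to accept the same inputs as PyMongo's `GridFS.put()`: bytes, a str together with an explicit `encoding`, or a binary file-like object. `to_binary_stream` is a hypothetical helper, not part of Sacred or PyMongo; a GridFS-backed observer could hand its result to `fs.put(stream, filename=name)`.

```python
import io
from typing import BinaryIO, Optional, Union


def to_binary_stream(data: Union[bytes, str, BinaryIO],
                     encoding: Optional[str] = None) -> BinaryIO:
    """Hypothetical normalizer: turn bytes, str (with an encoding) or a
    binary file-like object into a readable binary stream."""
    if isinstance(data, bytes):
        return io.BytesIO(data)
    if isinstance(data, str):
        if encoding is None:
            # Like PyMongo, refuse to guess an encoding for text data.
            raise TypeError("an encoding is required to store str data")
        return io.BytesIO(data.encode(encoding))
    if hasattr(data, "read"):
        return data  # already file-like: stream it as-is, no temp file
    raise TypeError(f"unsupported artifact data type: {type(data).__name__}")
```

Requiring an explicit encoding for str keeps the stored bytes unambiguous, which matters when the artifact is later read back by a different tool.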

GiliR4t1qbit, Oct 17 '18 05:10

Hi! Yes, the current solution is to create a temporary file and then add that to the database. I agree that this is a bit inelegant and that it would be nice to have a more direct way. I had mentally lumped this together with support for live artifacts (#81), which has been on my wishlist for a while, but your suggestion could probably be implemented more easily. We would need to adjust the other observers too, though, so that the interface behaves the same for each of them.
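One way to keep the interface uniform across observers is a default implementation that falls back to the existing file-based path, which observers able to stream data can override. This is a minimal sketch under stated assumptions: the class names and the `artifact_data_event` hook are hypothetical, and only `artifact_event(name, filename)` is modelled (loosely) on Sacred's existing observer hook.

```python
import os
import tempfile
from typing import Dict


class ArtifactObserverBase:
    """Hypothetical base class: artifact_data_event is an assumed new hook whose
    default spills data to a temp file, so every observer accepts in-memory data."""

    def artifact_event(self, name: str, filename: str) -> None:
        raise NotImplementedError

    def artifact_data_event(self, name: str, data: bytes) -> None:
        # Default: reuse the file-based path for observers that need a file.
        with tempfile.TemporaryDirectory() as tmp_dir:
            path = os.path.join(tmp_dir, "artifact.tmp")
            with open(path, "wb") as f:
                f.write(data)
            self.artifact_event(name, path)


class FileOnlyObserver(ArtifactObserverBase):
    """Toy observer that only knows how to copy from a file (into a dict)."""

    def __init__(self) -> None:
        self.artifacts: Dict[str, bytes] = {}

    def artifact_event(self, name: str, filename: str) -> None:
        with open(filename, "rb") as f:
            self.artifacts[name] = f.read()


class StreamingObserver(FileOnlyObserver):
    """Toy observer that, as a GridFS-backed one could, stores the bytes
    directly and skips the temporary file."""

    def artifact_data_event(self, name: str, data: bytes) -> None:
        self.artifacts[name] = bytes(data)
```

Callers then use the same `artifact_data_event(name, data)` call regardless of observer; only observers that can do better (such as a MongoDB one) need to override it.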

Qwlouse, Oct 22 '18 08:10