databroker

fix memory issue with mongo_normalized when called by Tiled

JunAishima opened this issue. Status: open.

Memory usage never decreases when Tiled is used to generate datasets from documents on beamlines configured to use databroker.mongo_normalized. On a test example with 47 datasets from rsoxs, memory usage increased by 10 GB each time a client retrieved the test datasets, and it never decreased until the Tiled server was shut down.

We would expect Tiled's memory usage to stabilize at some point rather than grow with every retrieval.
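To make "stabilize" concrete: a bounded cache should plateau at roughly maxsize × payload, while an unbounded cache keeps growing as long as new keys arrive. The minimal sketch below illustrates that difference with a hypothetical memoized `read_column` function and the standard-library `tracemalloc` module. The function name, payload size, and round counts are all assumptions for illustration; this is not the mongo_normalized code.

```python
import functools
import tracemalloc

PAYLOAD = 100_000  # bytes per simulated column read (arbitrary assumption)


def make_reader(maxsize):
    # Hypothetical reader whose results are memoized with the given bound.
    # maxsize=None means the cache is unbounded.
    @functools.lru_cache(maxsize=maxsize)
    def read_column(uid):
        return bytearray(PAYLOAD)

    return read_column


def retained_after_rounds(maxsize, rounds=5, per_round=8):
    """Return the bytes still allocated after several rounds of reads."""
    read_column = make_reader(maxsize)
    tracemalloc.start()
    try:
        for r in range(rounds):
            for i in range(per_round):
                # Every request uses a fresh key, so nothing is ever a cache hit.
                read_column(f"round{r}-item{i}")
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current
```

With these assumptions, `retained_after_rounds(None)` keeps all 40 payloads (about 4 MB) alive, while `retained_after_rounds(4)` levels off near four payloads no matter how many rounds run.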

This appears to be caused by aggressive caching (lru_cache) in mongo_normalized.DatasetFromDocuments._inner_get_columns(). @danielballan has also suggested removing the decorator from mongo_normalized.DatasetFromDocuments._get_time_coord().
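A general property of `functools.lru_cache` applied to a method may explain this: the cache key includes `self`, so the shared, class-level cache holds a strong reference to every instance and every returned value until the entry is evicted. The minimal sketch below demonstrates that property with hypothetical `CachedReader` and `UncachedReader` classes standing in for a class like DatasetFromDocuments. The class names, `maxsize=1024`, and the 1 MB payload are assumptions, not the actual databroker code.

```python
import functools
import gc
import weakref


class CachedReader:
    """Hypothetical stand-in for a class such as DatasetFromDocuments."""

    def __init__(self, uid):
        self.uid = uid

    # lru_cache on a method keys on (self, field), so the shared cache keeps a
    # strong reference to every instance and every result until eviction.
    @functools.lru_cache(maxsize=1024)
    def get_columns(self, field):
        return bytearray(1_000_000)  # simulate a ~1 MB column read


class UncachedReader:
    """Same interface with the decorator removed; results die with the caller."""

    def __init__(self, uid):
        self.uid = uid

    def get_columns(self, field):
        return bytearray(1_000_000)


def count_live_instances(reader_cls, n):
    """Create n readers, read once from each, and count how many survive."""
    refs = []
    for i in range(n):
        reader = reader_cls(f"uid-{i}")
        reader.get_columns("image")
        refs.append(weakref.ref(reader))
        del reader  # drop our own reference; only the cache could keep it alive
    gc.collect()
    return sum(ref() is not None for ref in refs)
```

Under these assumptions, every `CachedReader` (and its ~1 MB result) stays alive after use, while no `UncachedReader` does. Calling `CachedReader.get_columns.cache_clear()` releases the retained objects, which is roughly what removing the decorator achieves permanently.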

Steps to Reproduce (for bugs)

  1. Create an NSLS-II Tiled config for rsoxs alone and serve it with tiled serve config config.yml.
  2. Develop a client script that retrieves the datasets and calls read() on each. A set of 47 datasets is available upon request.
  3. Run the client script repeatedly using the UUIDs and monitor memory with top. Memray traces taken with the current code are also available. Script fragment:
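```python
from tiled.client import from_uri

c = from_uri('http://127.0.0.1:8000/api')
# List of UUIDs compiled from a text file (this filename is hypothetical).
with open('dataset_uuids.txt') as f:
    dataset_names = [line.strip() for line in f if line.strip()]
for dataset in dataset_names:
    c['rsoxs']['raw'][dataset]['primary']['data']['Small Angle CCD Detector_image'].read()
```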
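As an alternative to watching top by hand, the server's resident memory can be recorded after each round of the client loop, which makes growth versus plateau easy to see. The sketch below is Linux-only (the test host was RHEL 8.6) and reads `VmRSS` from `/proc/<pid>/status`. The helper names `rss_kib` and `sample_between_rounds` are hypothetical, `pid` is assumed to be the Tiled server's process ID, and `run_round` is assumed to wrap the loop from the fragment above.

```python
import os
import time


def rss_kib(pid):
    """Return the resident set size of `pid` in KiB (Linux only; reads /proc)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The line looks like: "VmRSS:     123456 kB"
                return int(line.split()[1])
    raise RuntimeError(f"VmRSS not found for pid {pid}")


def sample_between_rounds(pid, run_round, rounds=3, settle_seconds=1.0):
    """Run `run_round` several times and record the process RSS after each."""
    samples = [rss_kib(pid)]
    for _ in range(rounds):
        run_round()
        time.sleep(settle_seconds)  # give the server a moment to finish work
        samples.append(rss_kib(pid))
    return samples
```

If the samples climb by a similar amount every round, the server is retaining data. If they level off after the first round or two, memory has stabilized as expected.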

This problem has crashed Tiled instances in production, which is a serious problem for beamlines that rely on Tiled for their processing and analysis procedures.

Tested in a Conda environment on RHEL 8.6 with tiled-server 0.1.0a74, databroker 2.0.0b10, and Python 3.9.13.

JunAishima, Oct 19 '22 23:10