
Is there any way to speed up deletion?

Open fenchu opened this issue 2 years ago • 1 comment

My tinydb JSON file grows by 5 GB per week if I do not delete entries.

Currently we just load the tinydb data.json and delete all entries with internal doc IDs below a given threshold.

But the major problem is that we need to close the tinydb handle to do this, which does not work well in a multiprocessing asyncio FastAPI app.

I would like to keep at most 1000 entries in the table and delete everything below the 1000 highest.

Any guidelines on how to do this while keeping the app running would be great.

A suggestion I got was to add a timestamp (epoch) to each entry and delete any entries with timestamps below the 1000 highest, but that bloats the table and adds extra logic. Thanks
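
For reference, TinyDB assigns auto-incrementing integer doc IDs, so a newest-1000 trim can be expressed without a timestamp field: sort the doc IDs and issue one bulk remove(doc_ids=...) call. A minimal sketch, assuming doc IDs were never assigned manually (trim_to_newest is a hypothetical helper, not part of TinyDB):

from tinydb import TinyDB

def trim_to_newest(db: TinyDB, keep: int = 1000) -> list:
    """Delete all but the `keep` newest documents in one remove() call."""
    # doc IDs auto-increment, so sorting them orders docs oldest -> newest
    doc_ids = sorted(doc.doc_id for doc in db.all())
    if len(doc_ids) <= keep:
        return []
    # one bulk remove = one file rewrite, instead of one per document
    return db.remove(doc_ids=doc_ids[:-keep])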

fenchu avatar Oct 18 '23 10:10 fenchu

This can be done by walking db.all() and removing entries one by one, but it is slow:

from typing import List, Optional
from tinydb import TinyDB, where

db_path = 'data.json'  # module-level path, as above
db = None              # module-level TinyDB handle, opened lazily

def keep_newest(key: str = 'jobid', maxlen: int = 1000) -> Optional[List]:
    """ keep the newest maxlen entries in database """
    global db
    if db is None:
        db = TinyDB(db_path)
    currlen = len(db)
    if currlen <= maxlen:
        #log.warning(f"database size is:{currlen} which is less than {maxlen} - no deletion")
        return None
    ids = []
    # note: the + 1 removes one extra entry, leaving maxlen - 1 behind
    for d in db.all()[:currlen - maxlen + 1]:
        # each remove() runs a full table scan and rewrites the whole file
        removed = db.remove(where(key) == d[key])
        if removed:
            ids.extend(removed)
        #log.info(f"removed {d} with doc ids {removed}")
    return ids

number of entries in database: 10000
number of entries in database: 999
deleting 9001 took 450.83sec
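
Most of those 450 seconds are storage overhead rather than query time: with the default JSONStorage, every remove() call re-serializes the entire file, so 9001 removals mean 9001 full rewrites. A sketch of two changes that should help (not benchmarked against this dataset): buffer writes in memory with CachingMiddleware, and delete everything in a single bulk remove(doc_ids=...) call.

from tinydb import TinyDB
from tinydb.middlewares import CachingMiddleware
from tinydb.storages import JSONStorage

# buffer writes in memory; the file is only rewritten on flush()/close()
db = TinyDB('data.json', storage=CachingMiddleware(JSONStorage))

doc_ids = sorted(doc.doc_id for doc in db.all())
db.remove(doc_ids=doc_ids[:-1000])  # drop everything but the newest 1000

db.storage.flush()  # persist the trimmed table in a single write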

The direct JSON version is way faster: roughly 1,880 times (450.83 s / 0.24 s).

import json
import logging
from typing import List, Optional

log = logging.getLogger(__name__)

def keep_newest_json(fname: str, maxlen: int = 1000, table: str = '_default') -> Optional[List]:
    """ keep the newest maxlen entries in database """
    with open(fname, 'r', encoding='utf8') as FR:
        dat = json.load(FR)
    if table not in dat:
        log.fatal(f"table:{table} not found in dat:{list(dat.keys())}")
        return None
    currlen = len(dat[table])
    if currlen <= maxlen:
        log.info(f"table:{table} has {currlen} entries, less than maxlen:{maxlen}")
        return None
    ids = []
    # json.load preserves file order, so the first keys are the oldest docs;
    # as above, the + 1 removes one extra entry, leaving maxlen - 1 behind
    for doc_id in list(dat[table].keys())[:currlen - maxlen + 1]:
        del dat[table][doc_id]
        ids.append(doc_id)
        #log.info(f"removed index {doc_id} from {table}")
    with open(fname, 'w', encoding='utf8') as FW:
        # no sort_keys: sorting string doc IDs lexicographically ("10000" < "2")
        # would scramble the oldest-first order on later runs
        FW.write(json.dumps(dat, indent=2))
    return ids

number of entries in database: 10000
number of entries in database: 999
deleting 9001 took 0.24sec
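
One caveat with rewriting data.json directly: it is only safe while no TinyDB handle (in any worker process) is writing, since the next flush from an open handle would overwrite the trimmed file. Hypothetical usage from a maintenance task:

# hypothetical usage: run while no TinyDB handle is writing to data.json
removed = keep_newest_json('data.json', maxlen=1000)
if removed:
    print(f"trimmed {len(removed)} old entries")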

fenchu avatar Oct 18 '23 15:10 fenchu