flask-msearch

Update index too slow

Open boh5 opened this issue 4 years ago • 2 comments

I use the whoosh backend with the jieba analyzer, and SQLAlchemy connected to MySQL. I use flask-apscheduler to refresh the whoosh index on a schedule, with code like below:

@scheduler.task('cron', id='do_refresh_whoosh_index', hour=2)
def refresh_whoosh_index():
    app = scheduler.app
    with app.app_context():
        search.update_index()

The first index update is very quick (no index file exists yet): thousands of rows per second. But when update_index runs again it is very slow, only about 10 rows per second. I have to delete the index and recreate it. Is there another solution, or did I make a mistake? Thanks!

boh5, Apr 26 '20 14:04

flask-msearch updates the index automatically after rows are created or updated, so you shouldn't do it manually. search.update_index() always updates all rows rather than only new rows.

If you want to update the whole index manually, you should disable MSEARCH_ENABLE and increase the size of yield_per.
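As a rough sketch of the first half of that advice, the Flask config might look something like the fragment below. MSEARCH_ENABLE is the setting named above and the backend value matches the whoosh setup in this issue; the file name is hypothetical, and how yield_per is passed is not shown because it is not spelled out in this thread.

```python
# config.py (hypothetical file name, loaded into the Flask app config)

# Backend used in this issue.
MSEARCH_BACKEND = "whoosh"

# Turn off automatic index updates on row create/update,
# so that the index is only touched by the scheduled job.
MSEARCH_ENABLE = False
```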

honmaple, Apr 30 '20 01:04

Because some rows of my database table are updated by another application every day, I have to update the index manually every day. (Is that right? When the database is updated by other applications, flask_msearch cannot update the index automatically.) Now the problem is that delete_index() and update_index() are too slow, and increasing yield_per does not help because of the CPU limit. So I have to delete all the index files manually and create the index again. create_index() is thousands of times faster than update_index() and delete_index(). Is there a problem with the algorithm? Thanks!
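The delete-and-recreate workaround described above could be scripted roughly as follows. This is only a minimal sketch: rebuild_index is a hypothetical helper, index_dir stands for wherever the whoosh index files live in a given setup, and the create_index argument is assumed to be a callable such as search.create_index invoked inside an app context. It also assumes nothing else is reading the index while the job runs.

```python
import shutil
from pathlib import Path
from typing import Callable


def rebuild_index(index_dir: str, create_index: Callable[[], None]) -> None:
    """Remove the on-disk index directory, then build a fresh index.

    Hypothetical helper: `index_dir` and `create_index` are supplied by
    the caller and are not part of flask-msearch itself.
    """
    path = Path(index_dir)
    if path.exists():
        # Drop every existing index file in one go instead of
        # deleting or updating documents one by one.
        shutil.rmtree(path)
    # Rebuild from scratch, e.g. by calling search.create_index
    # inside `with app.app_context():` in the scheduled task.
    create_index()
```

In the scheduled task this would replace the search.update_index() call, with the index directory taken from the app's own configuration.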

boh5, Apr 30 '20 04:04