moin icon indicating copy to clipboard operation
moin copied to clipboard

Switch to Sygil-Dev/whoosh-reloaded

Open UlrichB22 opened this issue 1 year ago • 19 comments

We use whoosh for indexing. The currently used package https://github.com/mchaput/whoosh is no longer maintained. There is a successor at https://github.com/Sygil-Dev/whoosh-reloaded.

UlrichB22 avatar Mar 12 '24 10:03 UlrichB22

Yes, whoosh-reloaded definitely looks more alive.

Maybe that also fixes the warnings seen about whoosh using "is" instead of "==".

ThomasWaldmann avatar Apr 04 '24 18:04 ThomasWaldmann

Some tests are failing with whoosh-reloaded 2.7.5. Following command gives an error:

pytest -k test_index_update src/moin/storage/middleware/_tests/test_indexing.py

======================================================================================== ERRORS =========================================================================================
_____________________________________________________________ ERROR at teardown of TestIndexingMiddleware.test_index_update _____________________________________________________________

cfg = <class 'moin._tests.wikiconfig.Config'>

    @pytest.fixture
    def app_ctx(cfg):
        namespace_mapping, backend_mapping, acl_mapping = create_simple_mapping(
            "stores:memory:",
            cfg.default_acl
        )
        [...] 
        before_wiki()
    
        yield app, ctx
    
        teardown_wiki('')
        ctx.pop()
        try:
            # simulate ERROR PermissionError:
            # [WinError 32] The process cannot access the file because it is being used by another process
>           assert [] == get_open_wiki_files()
E           AssertionError: assert [] == [popenfile(pa...flags=557056)]
E             
E             Right contains 5 more items, first extra item: popenfile(path='/home/moindev/moin/src/moin/_tests/wiki/index/all_revs_xww72entzpa6xbrs.seg', fd=18, position=14490, mode='r', flags=557056)
E             Use -v to get more diff

src/moin/conftest.py:76: AssertionError
---------------------------------------------------------------------------------- Captured log setup -----------------------------------------------------------------------------------
DEBUG    passlib.registry:registry.py:296 registered 'sha512_crypt' handler: <class 'passlib.handlers.sha2_crypt.sha512_crypt'>
------------------------------------------------------------------------------- Captured stdout teardown --------------------------------------------------------------------------------
open wiki popenfile(path='/home/moindev/moin/src/moin/_tests/wiki/index/all_revs_xww72entzpa6xbrs.seg', fd=18, position=14490, mode='r', flags=557056)
open wiki popenfile(path='/home/moindev/moin/src/moin/_tests/wiki/index/all_revs_5kyi50kvnnqlsouv.seg', fd=19, position=11728, mode='r', flags=557056)
open wiki popenfile(path='/home/moindev/moin/src/moin/_tests/wiki/index/latest_revs_c93258ojfx0q51ng.seg', fd=20, position=37513, mode='r', flags=557056)
open wiki popenfile(path='/home/moindev/moin/src/moin/_tests/wiki/index/latest_revs_c93258ojfx0q51ng.seg', fd=21, position=37513, mode='r', flags=557056)
open wiki popenfile(path='/home/moindev/moin/src/moin/_tests/wiki/index/latest_revs_wrjovyaf9q1yy3n4.seg', fd=22, position=22249, mode='r', flags=557056)
================================================================================ short test summary info ================================================================================
ERROR src/moin/storage/middleware/_tests/test_indexing.py::TestIndexingMiddleware::test_index_update - AssertionError: assert [] == [popenfile(pa...flags=557056)]
======================================================================= 1 passed, 22 deselected, 1 error in 0.78s =======================================================================

The index is not closed as before with whoosh 2.7.4. Is this important or can we just remove the failing assert statement? Maybe it is related to this old PR https://github.com/mchaput/whoosh/pull/15 that has been merged into whoosh-reloaded: https://github.com/Sygil-Dev/whoosh-reloaded/pull/8 .

UlrichB22 avatar Apr 05 '24 20:04 UlrichB22

Check if the amount of open files stays at some level or is always increasing.

ThomasWaldmann avatar Apr 05 '24 21:04 ThomasWaldmann

Did some experiments:

@@ -73,9 +73,10 @@ def app_ctx(cfg):
     create_app_ext()  # calls: init_backends(), opens indexes
     ...
     try:
         # simulate ERROR PermissionError:
         # [WinError 32] The process cannot access the file because it is being used by another process
-        assert [] == get_open_wiki_files()  # FAILS!
+        assert len(get_open_wiki_files()) < 6  # doesn't fail
     finally:
         destroy_app(app)  # calls: deinit_backends(), which closes the indexes
+        assert len(get_open_wiki_files()) < 1  # doesn't fail 

@RogerHaase Can you remember why you added this?

It looks like the original first assert is expecting too much (the backend is not deinitialised yet).

The second assert (which I added to check) succeeds, so everything is closed correctly there.

ThomasWaldmann avatar Apr 05 '24 22:04 ThomasWaldmann

No, don't remember. In general, Windows has similar non-repeatable issues with files left open.

I am thinking that def get_open_wiki_files(): or similar may help with #1626 which occurs every 6 months or so.

RogerHaase avatar Apr 06 '24 14:04 RogerHaase

@bylsmad do you remember why the above was added?

RogerHaase avatar Apr 06 '24 14:04 RogerHaase

I think the check for open files might be useful, but it should be done at that 2nd place, when we can really expect index files to be closed.

ThomasWaldmann avatar Apr 06 '24 14:04 ThomasWaldmann

Now struggling with index files left open by whoosh, but I have the failure reproducible on Linux now so hoping to have a clean test run soon

ERROR src/moin/_tests/test_user.py::TestUser::test_recovery_token - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\moin\\moin\\src\\moin\\_tests\\wiki\\index\\all_revs_8zhbn7j04fuf7rlb.seg'

latest test fail: https://github.com/bylsmad/moin/actions/runs/5102359184/jobs/9171871358

Originally posted by @bylsmad in https://github.com/moinwiki/moin/issues/1455#issuecomment-1566208510


The old logs are not accessible any more. For me it looks like the destroy_app(app) call failed with WinError 32 and the assert has been added to show the filenames for debugging. I have no Win environment to test.

UlrichB22 avatar Apr 06 '24 15:04 UlrichB22

the check for the open files is there because the test tear down deletes all the files in the test wiki

the tear down was intermittently failing on windows because of open file handles, windows does not allow deletion of the index files when there are open file handles

the assertion was added to allow me to test this issue on Linux by raising an AssertionError which mirrors the windows PermissionError for attempt to delete the index when there are open file handles

bottom line is the test will fail on windows if there remain open file handles when you reach teardown, I found that adding a gc.collect in store_revision was the actual fix to prevent whoosh from holding the index files handles, this worked because the call to close on the file handles was inside some whoosh object's __del__ method

bylsmad avatar Apr 07 '24 11:04 bylsmad

Guess the "tear down" should not delete the whoosh index files as long as they could be open. Maybe something there is in wrong order, they are closed in destroy_app.

ThomasWaldmann avatar Apr 07 '24 17:04 ThomasWaldmann

I have found that removing gc.collect() in indexing.py fixes the issue.

gc.collect() affects performance and IMO should not be used in every store_revision() call. I have moved it to conftest.py in case of a PermissionError.

Please test above PR in a Win environment.

UlrichB22 avatar Apr 09 '24 21:04 UlrichB22

Created new wiki with help files, updated an item, viewed indexes, history, all ok, but 2 failing tests. See attached files: m-tox.txt

RogerHaase avatar Apr 12 '24 20:04 RogerHaase

Thanks for testing. Can you do two more tests with pytest src/moin/storage/middleware/_tests/test_indexing.py?

  1. with whoosh-reloaded and storage/middleware/indexing.py from master branch including gc.collect()
  2. with whoosh 2.7.4

When it runs successful on whoosh 2.7.4 we can raise an issue for whoosh.reloaded. Maybe we can ask for an option for close() to be a true close without caching the connection.

The cli move-index function is executed in a separate process, so that the situation differs from that of the test module, where it is executed in the same process as the rebuild and update.

UlrichB22 avatar Apr 13 '24 16:04 UlrichB22

Master branch tests, no errors on 274, 1 failure on whoosh-reloaded:

whoosh-274.txt whoosh-reloaded.txt

RogerHaase avatar Apr 13 '24 17:04 RogerHaase

There has been no human activity in the whoosh-reloaded repo in the last 6 months. A pull request was automatically closed after 60 days of no activity. I've spent a lot of time testing and always run into the locking/timeout issue with locust. I'm giving up at the moment. We should remove the issue from the 2.0.0b1 milestone.

UlrichB22 avatar Aug 01 '24 15:08 UlrichB22