rtabmap: error when merging db files

Hello,

I have read #39. However, when I use Mem/ReduceGraph=true, the log still prints:

[ERROR] (2023-02-24 14:07:35.182) DBDriverSqlite3.cpp:3763::loadWordsQuery() Query (308) doesn't match loaded words (306)
[ERROR] (2023-02-24 14:07:35.228) VWDictionary.cpp:741::addWordRef() Not found word 2746 (dict size=30382)
[ERROR] (2023-02-24 14:07:35.228) VWDictionary.cpp:741::addWordRef() Not found word 3993 (dict size=30382)

This happens even after adding Mem/NotLinkedNodesKept=false.

If I remove Mem/ReduceGraph=true, everything is OK.

This is my command:

./rtabmap-reprocess --Mem/ReduceGraph true --Mem/NotLinkedNodesKept false --Mem/RehearsalIdUpdatedToNewOne true "~/map_2.db;~/map_3.db" ~/map_23_new4.db

My questions are:

1. How can I handle this error?
2. If I ignore these errors, is something wrong with the db file? I ask because I can still get the pgm and pcd files.

Thank you~

Feb 24 '23 06:02 fjlxb

It seems that issue https://github.com/introlab/rtabmap/issues/39 should have been fixed since 2015, as Mem/ReduceGraph=true was used for one of my papers last year: https://github.com/introlab/rtabmap/blob/e6e6630d544376d1de39dd0d23c57ed8e414ea98/archive/2022-IlluminationInvariant/scripts/run_merge.sh#L32

Based on the same dataset, I tried the following variants:

rtabmap-reprocess --Mem/ReduceGraph false "map_190321-164651.db;map_190321-175428.db" merged.db

rtabmap-reprocess --Mem/ReduceGraph true "map_190321-164651.db;map_190321-175428.db" merged_and_reduced.db

rtabmap-reprocess --Mem/ReduceGraph true --Mem/NotLinkedNodesKept false "map_190321-164651.db;map_190321-175428.db" merged_and_reduced_not_kept.db

rtabmap-reprocess --Mem/ReduceGraph false --Mem/NotLinkedNodesKept false "map_190321-164651.db;map_190321-175428.db" merged_not_kept.db

Is the error happening during rtabmap-reprocess or when re-opening the database? I cannot reproduce the error either way.

[Screenshot from 2023-02-25 16-28-01]

EDIT: If you can share the two databases, I could give a try here.

Feb 26 '23 00:02 matlabbe

The issue still exists, but it's a bit difficult to locate. I enabled DEBUG logging and found that the errors occur when some nodes are reactivated from LTM. What is a little strange is that the retrieved node has a very early id, but the words that are not found seem to have been created by nodes that came after it. I will upload the database and log later.

Apr 07 '24 10:04 borongyuan

When signatures are brought back to working memory (WM), their features are re-matched against the current state of the vocabulary. If the node is pushed back to LTM and then brought back to WM a second time, its word ids may be higher than the ones it had when the node was created.

Apr 07 '24 21:04 matlabbe

https://drive.google.com/file/d/13rCZpSSymEUZl4afsQkxdZdIBg1dKsw9/view?usp=drive_link

I'm not sure if this database is sufficient for debugging this issue. The process stalled when the following message appeared.

[ERROR] (2024-04-15 16:46:32.460) DBDriverSqlite3.cpp:3763::loadWordsQuery() Query (429) doesn't match loaded words (427)
[ERROR] (2024-04-15 16:46:32.476) VWDictionary.cpp:741::addWordRef() Not found word 9554 (dict size=15237)
[ERROR] (2024-04-15 16:46:32.476) VWDictionary.cpp:741::addWordRef() Not found word 9630 (dict size=15237)

I enabled Mem/ReduceGraph and reduced Rtabmap/MemoryThr to 50 to trigger retrieval more easily.

Apr 15 '24 09:04 borongyuan

Thanks for the database. I can reproduce the issue.

rtabmap-reprocess --Mem/ReduceGraph true --Rtabmap/MemoryThr 50 ~/Downloads/240415-164828.db output.db

I think I know what is going on. Here is a sequence that would reproduce the issue.

1. At some point, Node 31 is created with a new word 9950.
2. Later, Node 65 is created with a descriptor matching word 9950 and a loop closure with Node 4. Because Mem/ReduceGraph=true, Node 65 will be removed when it reaches WM, that is, Mem/STMSize updates later (assuming 10 in this example).
3. At the update for Node 71, memory management is triggered (nodes in memory > Rtabmap/MemoryThr), and Node 31 is one of the nodes transferred from WM to LTM. As its word 9950 is still referenced by Node 65, word 9950 is not saved to the database at this point.
4. At the update for Node 75, Node 65 is reduced (STM->WM) to Node 4 of the previous loop closure. When a node is reduced, if the words of that node are not referenced by any other node [in WM], the words are deleted directly (not saved to the database). In this case, word 9950 is deleted directly.
5. At the update for a later node, retrieval is triggered on nodes around Node 30, so Node 31 is selected to be brought back from LTM to WM. When reactivating the words of Node 31, word 9950 is neither in the database nor in the current dictionary state, thus errors like the ones above happen!

The flaw is in step 4: we wrongly assume that "if a word is not referenced anymore by a node in the current WM, we don't need to save it to the database". This assumption holds only if the node we remove is the latest one we just added to memory (e.g., when the robot is not moving, we delete the latest node based on the RGBD/LinearUpdate and RGBD/AngularUpdate parameters). I double-checked the case where we delete nodes for Rehearsal, and it still works because the old words are kept even if we update to the new ID.
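The sequence above can be replayed with a small toy model. This is only a sketch of the bookkeeping described in the steps, not RTAB-Map code: the class `ToyMemory`, its methods, and the extra word ids (100, 9951) are hypothetical names and values chosen for illustration.

```python
class ToyMemory:
    """Toy model of word reference counting (hypothetical, not RTAB-Map code)."""

    def __init__(self):
        self.nodes = {}        # node id -> word ids, for nodes in STM/WM
        self.dictionary = {}   # word id -> ids of STM/WM nodes referencing it
        self.db_nodes = {}     # node id -> word ids, for nodes stored in LTM
        self.db_words = set()  # word ids saved in the database

    def add_node(self, node_id, word_ids):
        self.nodes[node_id] = set(word_ids)
        for w in word_ids:
            self.dictionary.setdefault(w, set()).add(node_id)

    def _release_words(self, node_id, save_words):
        for w in self.nodes.pop(node_id):
            refs = self.dictionary[w]
            refs.discard(node_id)
            if not refs:
                # No node left in STM/WM references this word.
                del self.dictionary[w]
                if save_words:
                    self.db_words.add(w)

    def transfer_to_ltm(self, node_id):
        # Step 3: the node goes to LTM; only its unreferenced words are saved.
        self.db_nodes[node_id] = set(self.nodes[node_id])
        self._release_words(node_id, save_words=True)

    def reduce_node(self, node_id, save_words=False):
        # Step 4: a reduced node is dropped; by default its unreferenced
        # words are deleted without being saved.
        self._release_words(node_id, save_words)

    def retrieve(self, node_id):
        # Step 5: bring a node back from LTM and report the missing words.
        missing = []
        found = set()
        for w in sorted(self.db_nodes.pop(node_id)):
            if w not in self.dictionary:
                if w not in self.db_words:
                    missing.append(w)  # neither in dictionary nor in database
                    continue
                self.dictionary[w] = set()  # reloaded from the database
            self.dictionary[w].add(node_id)
            found.add(w)
        self.nodes[node_id] = found
        return missing


def run_sequence(save_words_on_reduce=False):
    mem = ToyMemory()
    mem.add_node(4, [100])
    mem.add_node(31, [9950, 9951])  # step 1: Node 31 creates word 9950
    mem.add_node(65, [9950])        # step 2: Node 65 matches word 9950
    mem.transfer_to_ltm(31)         # step 3: 9950 still referenced by Node 65
    mem.reduce_node(65, save_words_on_reduce)  # step 4
    return mem.retrieve(31)         # step 5


print(run_sequence())      # word 9950 is reported missing
print(run_sequence(True))  # nothing missing when reduced nodes save their words
```

In this model the missing word only shows up when a transfer to LTM happens between the creation of the word and the reduction of the last WM node that references it, which matches the explanation of why both memory management and graph reduction are needed to trigger the error.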

The easy fix would be to save to the database all words of a signature deleted by the graph reduction approach. This goes against the idea of reducing the graph to avoid increasing the database size over time. For that paper though, I didn't have this issue with both memory management and graph reduction enabled. I'll need to dig deeper and see if I can reproduce it with the data from that paper and, if so, maybe compare the code between then and now.

From a quick look at the commits in the graph reduction section,

https://github.com/introlab/rtabmap/commit/e887d462ce9d0610d75de351b1abf1a5de3125cb (minor change)
https://github.com/introlab/rtabmap/commit/c7b84c60bcdf3b59ad0a85378ac9cdc45c249eaf (only links related changes)
https://github.com/introlab/rtabmap/commit/933ac736f14d49bd8eea0234f6b9504f99ef99a7 (that could be the one!)

I can see that the following change from the last commit above could cause the current issue, which would explain why it didn't appear in the original version:

- this->moveToTrash(s, _notLinkedNodesKeptInDb);
+ this->moveToTrash(s, false);

I guess I would have had the same error with the old code version if the parameter Mem/NotLinkedNodesKept had been set to false. When it is true (the default), all visual words are kept in the database, which corresponds to the "easy fix" solution above. I will set it back to true for now to avoid the error, but keep in mind that the number of visual words in the database could grow more slowly if we did something more sophisticated: delete words only if they are not referenced in the current WM AND they are not referenced in LTM by some older transferred node!

EDIT: To add to my other comment above, the bug didn't appear in the 2022 paper because memory management was not used at the same time; only graph reduction was used in the comparison (in contrast to previous papers, that paper wasn't about computation time). In that case, if a word was not referenced by another node in WM, it could be safely deleted (as nodes not in WM would never be brought back anyway).
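The three deletion policies discussed here can be compared side by side. The following is a minimal sketch, assuming reference tables are plain dictionaries from word id to the set of referencing node ids; the function name `droppable_words` and the policy labels are hypothetical and do not exist in RTAB-Map.

```python
def droppable_words(node_words, wm_refs, ltm_refs, policy):
    """Return the words of a removed node that may be deleted without saving.

    node_words: word ids of the node being removed
    wm_refs:    word id -> ids of other WM nodes referencing it
    ltm_refs:   word id -> ids of LTM nodes referencing it
    policy:     hypothetical label for one of the three strategies
    """
    if policy == "keep_all":
        # "Easy fix": every word is saved, so nothing is dropped.
        return set()
    if policy == "wm_only":
        # Flawed assumption: only WM references are checked.
        return {w for w in node_words if not wm_refs.get(w)}
    if policy == "wm_and_ltm":
        # Stricter rule: the word must be unreferenced in WM AND in LTM.
        return {w for w in node_words
                if not wm_refs.get(w) and not ltm_refs.get(w)}
    raise ValueError(f"unknown policy: {policy}")


# Node 65 is reduced; word 9950 is still used by Node 31, which is in LTM.
# Word 12000 is an extra illustrative id used by no other node.
words_65 = {9950, 12000}
wm_refs = {}
ltm_refs = {9950: {31}}

for policy in ("keep_all", "wm_only", "wm_and_ltm"):
    print(policy, sorted(droppable_words(words_65, wm_refs, ltm_refs, policy)))
```

Only the "wm_only" policy drops word 9950, which is the word that later cannot be found when Node 31 is retrieved.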

Apr 16 '24 03:04 matlabbe

Thanks for the clear explanation.

Apr 17 '24 09:04 borongyuan

There is an _oldReferences map defined in VisualWord.h, but it seems to have never been used: https://github.com/introlab/rtabmap/blob/27dea71398b67151eb8d073dff3e68bcf974d2b4/corelib/include/rtabmap/core/VisualWord.h#L61-L63 What was it intended for? Can we use it to store LTM references?

Apr 18 '24 08:04 borongyuan

_oldReferences appeared when the project was transferred from Google Code to GitHub in 2011. I don't have any history of when or why it was added before 2011. It seems to have never been used afterwards.

I would rather make an SQL query to LTM when needed than keep track of all LTM references in RAM for each word. However, if SQL queries are too slow, we may update _oldReferences instead (assuming that, for each word, there are not a lot of signatures containing it, so _oldReferences could stay small). I created a new follow-up issue: https://github.com/introlab/rtabmap/issues/1270
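The two options can be sketched with an in-memory SQLite database. The table `toy_feature(node_id, word_id)` is a hypothetical stand-in for wherever LTM stores node-to-word links, not RTAB-Map's actual schema, and the per-word map from node id to occurrence count is only an assumption about what _oldReferences could hold.

```python
import sqlite3


def make_toy_ltm(rows):
    """Build a toy LTM table of (node_id, word_id) links (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE toy_feature (node_id INTEGER, word_id INTEGER)")
    # An index on word_id keeps the existence check cheap.
    conn.execute("CREATE INDEX toy_feature_word ON toy_feature (word_id)")
    conn.executemany("INSERT INTO toy_feature VALUES (?, ?)", rows)
    return conn


def referenced_in_ltm_sql(conn, word_id):
    """Option 1: ask the database only when a word is about to be deleted."""
    row = conn.execute(
        "SELECT 1 FROM toy_feature WHERE word_id = ? LIMIT 1", (word_id,)
    ).fetchone()
    return row is not None


def build_old_references(rows):
    """Option 2: keep, in RAM, a map word id -> {node id: occurrences}."""
    refs = {}
    for node_id, word_id in rows:
        counts = refs.setdefault(word_id, {})
        counts[node_id] = counts.get(node_id, 0) + 1
    return refs


rows = [(31, 9950), (31, 9951), (4, 100)]
conn = make_toy_ltm(rows)
old_refs = build_old_references(rows)

print(referenced_in_ltm_sql(conn, 9950))  # word still used by an LTM node
print(referenced_in_ltm_sql(conn, 42))    # word unknown to LTM
print(old_refs.get(9950))                 # same answer from the RAM-side map
```

The query costs one indexed lookup per candidate word, while the map costs memory proportional to the number of LTM references, which is the trade-off described above.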

Apr 27 '24 22:04 matlabbe