
Some problems in advanced Query

Open IMXENON opened this issue 2 years ago • 3 comments

Hi, I've recently been using TinyDB as an interface for extracting objects that include a keyword from my JSON file. But it seems the "matches" method can't match the pattern after the EOL in the first paragraph of the document. E.g.: """   | "It is mentioned in the install instruction that this is work in progress.\nWhile at ICCV I quickly implemented a branch where I remplace matrix operations with Eigen3 calls, and random generators by Boost::random generators.\nI hope this is not redundant with ongoing work on private branches.\n\nThe branch can be found at\nhttps://github.com/rodrigob/caffe\n\nI got things to compile, however I noticed that some tests fails (thanks for creating a non-trivial set of unit tests !).\nI have not been able to compile a version with MKL to compare, but I can only assume that tests should not fail.\n\nCurrent fails are\n\n[ FAILED ] FlattenLayerTest/1.TestCPUGradient, where TypeParam = double\n[ FAILED ] StochasticPoolingLayerTest/0.TestGradientGPU, where TypeParam = float\n[ FAILED ] StochasticPoolingLayerTest/1.TestGradientGPU, where TypeParam = double\n[ FAILED ] MultinomialLogisticLossLayerTest/1.TestGradientCPU, where TypeParam = double\n\nwhich all sounds nasty (gradient computation errors in neural networks, big no no).\n\nI will spend some time inspecting to see what goes wrong there, but any suggestion/comment/idea is welcome.\n" """ I can't get any result from a query that uses the "matches" method with the pattern ".?nework.?" on this document.

IMXENON · Jun 28 '22 17:06

Sorry, the pattern is ".*?network.*?"

IMXENON · Jun 28 '22 17:06

Sorry again, I think I have solved this problem: I should use the "search" method. The former code was:

        query = ".*?pattern.*?"
        query = r'' + query
        SearchCommentBody = self.db.search(Qinstance.body.matches(query))
        return SearchCommentBody

And the code

        query = ".*?pattern.*?"
        query = r'' + query
        SearchCommentBody = self.db.search(Qinstance.body.search(query))
        return SearchCommentBody

works after I switched the query method from matches to search.

I am also unsure about the difference between these two methods. The documentation describes them as:

matches ==> Run a regex test against a dict value (whole string has to match).

search ==> Run a regex test against a dict value (substring string has to match).

I can't really understand why the matches method only works before EOL

IMXENON · Jun 28 '22 17:06

Hey @ZealianMa,

I can't really understand why the matches method only works before EOL

This is because by default, Python's regex implementation considers . to NOT include newlines:

. (Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.

https://docs.python.org/3/library/re.html#regular-expression-syntax

So when you have a string that contains newlines and you use .*, the . will not match the newline and thus the whole string does not match. On the other hand, with search only a part of the string has to match so the regex engine can skip over all newlines until it finds a line that matches the regex.

To solve this, you can pass the re.DOTALL flag to the matches query:

import re

Qinstance.body.matches(query, re.DOTALL)
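The difference can be reproduced with the standard library alone. This is a minimal sketch, assuming that TinyDB's matches delegates to re.match and search to re.search; the two-line body string is a made-up stand-in for the real document, with "network" appearing only on the second line.

```python
import re

# Hypothetical two-line document body; "network" only appears on line 2.
body = "This is work in progress.\nErrors in neural networks are nasty.\n"
pattern = r".*?network.*?"

# re.match is anchored at the start of the string, and "." stops at "\n",
# so the pattern can never reach "network" on the second line.
match_default = re.match(pattern, body)

# re.search may start anywhere, so it can begin matching on the second line.
search_default = re.search(pattern, body)

# With re.DOTALL, "." also matches "\n", so the match can span both lines.
match_dotall = re.match(pattern, body, re.DOTALL)

print(match_default is not None)   # False
print(search_default is not None)  # True
print(match_dotall is not None)    # True
```

Passing the same flag as the second argument of the TinyDB query should have the equivalent effect on multi-line values.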

Does that solve this issue for you?

msiemens · Jul 23 '22 19:07

Hey @IMXENON, I take it from your "thumbs up" reaction that this issue is solved. If this is not the case, just leave a comment and I'll reopen this 🙂

msiemens · Nov 21 '22 07:11