Implementing RegEx function in Database Exprs.

Open aidin36 opened this issue 11 years ago • 21 comments

Implement a RegEx class which is derived from FunctionExpr in database/expr/

When compiled, it returns Jx9 code that calls a C function. The C function matches a string against a RegEx pattern.

aidin36 avatar Feb 07 '14 03:02 aidin36

What does it do if the compile fails due to an invalid regex?

DThiebaud avatar Oct 22 '14 00:10 DThiebaud

It should throw an exception which tells the client what was wrong with the regex.

aidin36 avatar Oct 22 '14 03:10 aidin36

Experimenting with the POSIX regex library, I found that it returns an error message on an invalid regex, so this should be feasible.
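As a minimal sketch of that behaviour: the POSIX calls `regcomp` and `regerror` are real, but the helper name `compile_or_throw` and the use of `std::invalid_argument` are assumptions made for illustration. libtocc would presumably throw one of its own exception types instead.

```cpp
#include <regex.h>  // POSIX: regcomp, regerror, regexec, regfree

#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of libtocc: compiles `pattern` into `*out`,
// or throws with the message reported by the POSIX regex library.
void compile_or_throw(const char* pattern, regex_t* out) {
  assert(pattern != nullptr && out != nullptr);
  int status = regcomp(out, pattern, REG_EXTENDED | REG_NOSUB);
  if (status != 0) {
    char message[256];
    // regerror turns the numeric status into a human-readable message.
    regerror(status, out, message, sizeof(message));
    // After a failed regcomp the contents of *out are undefined,
    // so regfree is deliberately not called here.
    throw std::invalid_argument(std::string("invalid regex: ") + message);
  }
}
```

On success the caller owns the compiled regex and is expected to release it with `regfree`.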

DThiebaud avatar Nov 23 '14 02:11 DThiebaud

Thanks Dick (: I didn't know that POSIX has a regex library. That would be great!

Should I assign the issue to you? Are you going to work on this?

aidin36 avatar Nov 25 '14 05:11 aidin36

I will work on this, but I'll be pretty busy until mid-December.

Where does an expression get compiled?

Would the generated Jx9 code compare one string to the regex and return true or false?

DThiebaud avatar Nov 25 '14 18:11 DThiebaud

We should add Regex as a function to Jx9. It should be a function that takes a string and a Regex pattern and returns true or false. Then we can use it in our queries. Calling an external function from Jx9 is explained in the UnQLite documentation.

aidin36 avatar Nov 27 '14 04:11 aidin36

Aidin, please assign this task to me.

DThiebaud avatar Dec 24 '14 03:12 DThiebaud

Yours now (:

aidin36 avatar Dec 25 '14 13:12 aidin36

Aidin, it looks like class RegularExpr should be a subclass of Expr and should be allowed in a ConnectiveExpr. Do you agree?

Am I correct that the scope of this task is to define a regular expression type in libtocc and not to define an interface for it in cli?

DThiebaud avatar Dec 26 '14 05:12 DThiebaud

Yes, the second one is right. But the first one is not.

Note that, before anything else, we need a C function that we can call from inside a Jx9 script. Once that is done, we simply derive a class from FunctionExpr (not from Expr itself) which calls that function.

Take a look at WildCardExpr. We need something like that.

aidin36 avatar Dec 26 '14 14:12 aidin36

Aidin, it appears that the constructor of RegexExpr will need, as a parameter, a pointer to the Unqlite VM. Is there any problem with this?

DThiebaud avatar Dec 30 '14 03:12 DThiebaud

Aidin, I see two possible ways of doing this.

  1. Compile regex in the constructor for the RegexExpr object and keep the compiled regex in the RegexExpr object. Register a pointer to the compiled regex as a resource variable in the Unqlite VM. Pass a pointer to the compiled regex to the C++ function called from Jx9.

    This way, if the regex match function is called for 20 records from Jx9, the regex is only compiled once. However, this requires that a pointer to the Unqlite VM be passed to the RegexExpr::RegexExpr constructor from CLI or whatever calls libtocc.

  2. Compile the regex in the function called from Jx9. This way, no pointer to the Unqlite VM needs to be passed to the RegexExpr::RegexExpr constructor. However, if the regex match function is called for 20 records from Jx9, the regex is compiled 20 times, once for each record.

    Thoughts? Do you prefer one way or the other?
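For comparison, option 2 can be sketched with the POSIX calls alone. The function name `match_regex_per_call` is hypothetical, and the UnQLite glue that would call it from Jx9 is left out. Every call pays for a full `regcomp`, which is the cost described above.

```cpp
#include <regex.h>  // POSIX: regcomp, regexec, regfree

#include <cassert>

// Hypothetical sketch of option 2: the pattern is compiled, used once,
// and released on every call. Returns false on an invalid pattern.
bool match_regex_per_call(const char* pattern, const char* str_to_match) {
  assert(pattern != nullptr && str_to_match != nullptr);
  regex_t regex;
  if (regcomp(&regex, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
    return false;
  }
  bool matched = regexec(&regex, str_to_match, 0, nullptr, 0) == 0;
  regfree(&regex);
  return matched;
}
```

With 20 records this compiles the same pattern 20 times, but nothing has to outlive a single call.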

DThiebaud avatar Dec 30 '14 15:12 DThiebaud

Let me see...

There's a third way! Have two functions available to Jx9:

CompiledRegex* compile_regex(const char* regex);
bool match_regex(CompiledRegex* regex, const char* str_to_match);

Jx9 first calls the first function and gets a pointer to a compiled regex. Then, for each of the 20 records, it passes the compiled regex to the second function.

What do you think?

aidin36 avatar Dec 30 '14 16:12 aidin36

If Jx9 calls the first function, where will the regex that compile_regex compiles reside? If it is on the stack of compile_regex, the pointer might be invalidated by the time match_regex is called. It could be in a static field in compile_regex, but only if no more than one regex will ever be active. I think what we need to do is have compile_regex create the regex on the heap with malloc. We will need a Jx9 function free_regex(CompiledRegex* regex) which will call regfree and then free the memory that was malloced. This will work.
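A minimal sketch of that heap-based lifetime, assuming CompiledRegex is simply an alias for the POSIX `regex_t` (the thread does not define it). How these functions would be registered with Jx9 is not shown.

```cpp
#include <regex.h>  // POSIX: regcomp, regexec, regfree

#include <cassert>
#include <cstdlib>

// Assumption: CompiledRegex is just the POSIX regex_t.
typedef regex_t CompiledRegex;

// Allocates the compiled regex on the heap so that it outlives this call.
// Returns nullptr if allocation or compilation fails.
CompiledRegex* compile_regex(const char* pattern) {
  assert(pattern != nullptr);
  CompiledRegex* regex =
      static_cast<CompiledRegex*>(std::malloc(sizeof(CompiledRegex)));
  if (regex == nullptr) {
    return nullptr;
  }
  if (regcomp(regex, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
    std::free(regex);  // nothing to regfree after a failed regcomp
    return nullptr;
  }
  return regex;
}

// Matches one string against an already compiled regex.
bool match_regex(CompiledRegex* regex, const char* str_to_match) {
  assert(regex != nullptr && str_to_match != nullptr);
  return regexec(regex, str_to_match, 0, nullptr, 0) == 0;
}

// Releases the internals with regfree, then the malloc'ed storage itself.
void free_regex(CompiledRegex* regex) {
  if (regex == nullptr) {
    return;
  }
  regfree(regex);
  std::free(regex);
}
```

The pattern is compiled once, matched against as many records as needed, and freed exactly once when the query finishes.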

I've also thought of another possible way. Have Jx9 function:

bool match_regex(const char* compiled_regex_address, const char* str_to_match)

where compiled_regex_address is a string representing the address of the regex. When we compile the regex in RegexExpr::RegexExpr, we will convert its address to a string and put it in this->protected_data->arg. FunctionExpr::Compile puts this string into the first argument of the Jx9 call to match_regex. (This works the same way FunctionExpr::Compile does for WildCardExpr.) The C++ code called by match_regex will receive the string, convert it to the address of the regex, and do the match.

static regex_t *string_to_regex_pointer(const char *string)
{
  void *address = NULL;
  /* %p expects a void **, so read into a void * and cast afterwards. */
  sscanf(string, "%p", &address);
  return static_cast<regex_t *>(address);
}

static void regex_pointer_to_string(regex_t *regex_pointer, char *string, size_t string_length)
{
  if (string_length < 20)
  {
    throw InvalidArgumentError("string less than 20 characters long passed to RegexExpr::regex_pointer_to_string");
  }
  /* %p expects a void *, so cast explicitly. */
  snprintf(string, string_length, "%p", static_cast<void *>(regex_pointer));
}
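To make the round trip concrete, here is a hedged sketch of what the match function on the receiving side might look like. The signature follows the match_regex proposed above. The encoder `encode_regex_address` is a hypothetical stand-in for the conversion done in the constructor, and nothing here touches UnQLite.

```cpp
#include <regex.h>  // POSIX: regcomp, regexec, regfree

#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical encoder: formats the address of a compiled regex as text.
std::string encode_regex_address(regex_t* regex) {
  char buffer[32];
  int written = std::snprintf(buffer, sizeof(buffer), "%p",
                              static_cast<void*>(regex));
  assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
  return std::string(buffer);
}

// Decodes the address from its textual form and runs the match.
// Returns false if the text cannot be parsed as a non-null pointer.
bool match_regex(const char* compiled_regex_address, const char* str_to_match) {
  void* address = nullptr;
  if (std::sscanf(compiled_regex_address, "%p", &address) != 1 ||
      address == nullptr) {
    return false;
  }
  const regex_t* regex = static_cast<const regex_t*>(address);
  return regexec(regex, str_to_match, 0, nullptr, 0) == 0;
}
```

This only works while the compiled regex is still alive in the same process. A stale or forged address leads to undefined behaviour, which is probably the "dirty" part of the idea.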

What are your thoughts about all this?

DThiebaud avatar Jan 02 '15 02:01 DThiebaud

Creative idea! Though the code will become a little dirty. I couldn't come up with a cleaner idea. So, give it a try! I'm waiting for your Pull Request (:

aidin36 avatar Jan 02 '15 14:01 aidin36

I should have the code fairly soon, but creating the test cases will take longer.

DThiebaud avatar Jan 02 '15 17:01 DThiebaud

How do I add a new file to libtocc/tests, regex_tests.hpp?

DThiebaud avatar Jan 02 '15 23:01 DThiebaud

Create a cpp file for the test, and add it to Makefile.am under the libtocc/tests/src/ directory. That should be enough.

aidin36 avatar Jan 03 '15 13:01 aidin36

I'm having a problem running libtocc/tests/configure. I get the following error:

configure: error: Could not find libtocc library. Please make sure you have this library in your libs path. Refer to documentations for more info.

I built libtocc and ran "sudo make install" on it successfully so the library should not be missing.

DThiebaud avatar Jan 06 '15 04:01 DThiebaud

We can use one of two libraries: regex or pcre.

PCRE (Perl Compatible Regular Expressions) uses a regular expression syntax compatible with Perl, Python, PHP, Java, and other packages. It seems to be the most commonly used regular expression library. It requires an external library to be linked in, the same way we link in Unqlite. It is available in MS Windows if we ever port TOCC to Windows.

Regex is the POSIX regular expression library. Its syntax is compatible with egrep and not compatible with PCRE. In Unix-like OS's, no additional library needs to be linked in. Regex is not available for Windows.

Which should we use?

DThiebaud avatar Jan 06 '15 09:01 DThiebaud

  1. It looks for libtocc.pc. It should be in /usr/local/lib/pkgconfig/libtocc.pc. If you can't fix your problem, please ask on the mailing list. (Others may have the answer, and it keeps this issue clean of unrelated talk (: )

  2. Personally, I'm more comfortable with PCRE regex, and I think most people are. Though we will depend on another library, I think we should prefer PCRE. At first, when I told you to use POSIX regex, I didn't know it's not compatible with Perl regex.

aidin36 avatar Jan 07 '15 17:01 aidin36