Is there any possible way to parse PostgreSQL style operators?
PostgreSQL has a very complex operator parsing rule:
- Any string of at most 63 characters drawn from + - * / < > = ~ ! @ # % ^ & | ` ? is recognized as an operator, except that it cannot contain -- or /*, which are always recognized as the start of a comment.
- A multi-character operator cannot end with + or - unless it also contains at least one of the characters ~ ! @ # % ^ & | ` ?
I can't find a way to handle comments that works the way I want, even by adjusting token priorities. Does anyone have experience with this that I can refer to?
Hi, I am not familiar enough with SQL to answer your question, but can your operators be expressed with a regex? If so, let's start from that. Otherwise, you can always parse some SQL expression as multiple tokens, and validate the SQL expression manually (a bit like what is done with the JSON parser example).
For example, I want +/-*!@/*&^%*/ to be interpreted as a +/-*!@ operator and a C-style comment /*&^%*/. I can't figure out how to express this rule in a regular expression.
Hi @buzzers, sorry for the delayed reply. Whenever comments can appear inside a valid expression, I recommend using a callback. The easiest approach is to create an enum in which each variant is a valid SQL token, plus one Comment variant that matches /*. When a /* is matched, a callback function is called and looks for the end of the comment (i.e., */). Regexes are not great at ignoring things inside their expression, so it is easier to split the work into smaller tasks.
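As a rough illustration of the comment step only, here is a minimal sketch in plain Rust, independent of any lexer library. The function name `block_comment_len` is hypothetical, and the sketch assumes comments do not nest (PostgreSQL does allow nested block comments, so a real callback would need a depth counter).

```rust
/// Returns the byte length of the block comment at the start of `input`,
/// including both delimiters. Returns `None` if `input` does not start
/// with `/*` or if the closing `*/` is missing.
/// Assumption: comments do not nest.
fn block_comment_len(input: &str) -> Option<usize> {
    // Strip the opening delimiter; bail out if it is not there.
    let body = input.strip_prefix("/*")?;
    // Find the first closing delimiter after the opener.
    let end = body.find("*/")?;
    // 2 bytes for "/*", `end` bytes of comment text, 2 bytes for "*/".
    Some(2 + end + 2)
}
```

A callback would call something like this on the remaining input once /* has been seen and then advance the lexer by the returned length.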
If you only had to handle comments that start with /*, there would be no problem. The core difficulty is that /* can appear inside a longer run of operator characters, such as +-@/*++. A regex that greedily matches operator characters consumes the /* as well, and there is no way to back up and give those characters to the comment rule. The only way I see to handle this is to process all operator characters manually instead of using regular expressions.
If you have a multi-character operator, you can always have its start be parsed by one of the enum variants, and you use a callback to parse the characters themselves.
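For what it's worth, the manual character processing such a callback would do can be sketched as below. This is plain Rust with no lexer library involved; `operator_len`, `OP_CHARS`, and `SPECIAL` are hypothetical names, and the 63-character limit is not checked. It follows the rules quoted at the top of the thread: take the longest run of operator characters, cut it at the first -- or /*, then drop trailing + or - unless one of the "special" characters appears earlier.

```rust
/// All characters that may appear in an operator.
const OP_CHARS: &[u8] = b"+-*/<>=~!@#%^&|`?";
/// Characters that allow a multi-character operator to end in `+` or `-`.
const SPECIAL: &[u8] = b"~!@#%^&|`?";

/// Returns the byte length of the operator at the start of `input`.
/// A result of 0 means no operator starts here (for example, the input
/// begins with a comment opener or a non-operator character).
fn operator_len(input: &str) -> usize {
    let bytes = input.as_bytes();

    // 1. Longest run of operator characters (all ASCII, so `n` is a
    //    valid char boundary).
    let mut n = bytes.iter().take_while(|b| OP_CHARS.contains(b)).count();

    // 2. Cut the run at the first comment opener, if any.
    let run = &input[..n];
    if let Some(pos) = run.find("/*").into_iter().chain(run.find("--")).min() {
        n = pos;
    }

    // 3. A multi-character operator may end in `+` or `-` only if an
    //    earlier character is one of SPECIAL; otherwise strip them.
    if n > 1 && matches!(bytes[n - 1], b'+' | b'-') {
        let has_special = bytes[..n - 1].iter().any(|b| SPECIAL.contains(b));
        if !has_special {
            while n > 1 && matches!(bytes[n - 1], b'+' | b'-') {
                n -= 1;
            }
        }
    }
    n
}
```

With this, +/-*!@/*&^%*/ gives a length of 6 (the operator +/-*!@), leaving /*&^%*/ for the comment rule, and +-@/*++ gives 3. Whatever characters are cut off are simply left in the input for the next token, which is the fallback a single regex cannot provide.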