pest feat: support better error reporting

Motivation

See the issue and the comment.

Currently, when parsing fails, we sometimes get uninformative error messages (see the issue comment for examples). This PR mostly introduces changes to parser_state.rs and error.rs that make it possible to:

- Store information about expected tokens
- Store the rule call stack (from which the user can later build custom error messages)

Main changes

Currently, when a parent rule contains a sequence of rules, chars, or strings (case-sensitive or case-insensitive), we only track the farthest rule. Pest stores it and its position in `pos_attempts` and `attempt_pos`, but ignores whether strings and chars were matched successfully or not.

This MR mostly adds logic inside the `rule`, `match_string`, and `match_insensitive` function calls to support tracking of expected tokens.
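A minimal sketch of the expected-token idea, assuming a much simpler state than pest's real `ParserState`. The `AttemptTracker` type and all of its methods are hypothetical names used only for illustration; they are not pest's API. The tracker remembers the farthest position at which a string match failed and which tokens were tried there.

```rust
/// Hypothetical, simplified tracker; NOT pest's actual `ParserState`.
#[derive(Debug, Default)]
pub struct AttemptTracker {
    /// Farthest input position at which a match attempt failed.
    farthest_pos: usize,
    /// Tokens that were expected (and not found) at `farthest_pos`.
    expected: Vec<String>,
}

impl AttemptTracker {
    /// Record that `token` was expected at `pos` but did not match.
    pub fn record_failure(&mut self, pos: usize, token: &str) {
        if pos > self.farthest_pos {
            // A farther failure makes earlier expectations irrelevant.
            self.farthest_pos = pos;
            self.expected.clear();
        }
        if pos == self.farthest_pos && !self.expected.iter().any(|t| t == token) {
            self.expected.push(token.to_owned());
        }
    }

    /// Try to match `token` at byte offset `pos`.
    /// Returns the offset just past the token on success;
    /// on failure, remembers the token as "expected" and returns `None`.
    pub fn match_string(&mut self, input: &str, pos: usize, token: &str) -> Option<usize> {
        if input.get(pos..).map_or(false, |rest| rest.starts_with(token)) {
            Some(pos + token.len())
        } else {
            self.record_failure(pos, token);
            None
        }
    }

    pub fn farthest_pos(&self) -> usize {
        self.farthest_pos
    }

    pub fn expected(&self) -> &[String] {
        &self.expected
    }
}
```

With input `"select * form t"`, matching `"select "` and `"* "` succeeds, and failed attempts for `"from"` and `"where"` at offset 9 leave both tokens in `expected()`, while an earlier failed attempt at offset 0 is ignored.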

Moreover, logic (I'd call it fake) for tracking rule calls is added. Imagine parsing has failed at position x of the input string, e.g. rule_1, rule_2, and rule_3 have all failed at the same farthest position. This MR also introduces logic for tracking the topmost parent of those rules. The idea is that with such a (parent, deepest_failed_rule) pair we could generate more informative messages.
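One possible reading of the (parent, deepest_failed_rule) idea, sketched under assumptions: the `CallTracker` and `RuleCall` types below are hypothetical and do not mirror the code in this MR. The sketch keeps a stack of active rules with their start positions; when the innermost rule fails at the farthest position, it pairs that rule with the outermost ancestor that started at the same position.

```rust
/// Hypothetical pair of (topmost parent, deepest failed rule).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RuleCall {
    /// Outermost ancestor that started at the failure position, if any.
    pub parent: Option<&'static str>,
    /// Innermost rule that failed.
    pub deepest: &'static str,
}

/// Hypothetical tracker; NOT the data structure used in pest.
#[derive(Debug, Default)]
pub struct CallTracker {
    /// Active rules with the input position at which each one started.
    stack: Vec<(&'static str, usize)>,
    farthest_pos: usize,
    calls: Vec<RuleCall>,
}

impl CallTracker {
    pub fn enter(&mut self, rule: &'static str, pos: usize) {
        self.stack.push((rule, pos));
    }

    pub fn exit(&mut self) {
        self.stack.pop();
    }

    /// Record that the innermost active rule failed at `pos`.
    pub fn fail(&mut self, pos: usize) {
        let deepest = match self.stack.last() {
            Some(&(rule, _)) => rule,
            None => return,
        };
        if pos < self.farthest_pos {
            return; // Not the farthest failure; ignore it.
        }
        if pos > self.farthest_pos {
            self.farthest_pos = pos;
            self.calls.clear();
        }
        // Start positions never decrease down the stack, so the first
        // ancestor that started at `pos` is the outermost such ancestor.
        let ancestors = &self.stack[..self.stack.len() - 1];
        let parent = ancestors
            .iter()
            .find(|(_, start)| *start == pos)
            .map(|(rule, _)| *rule);
        let call = RuleCall { parent, deepest };
        if !self.calls.contains(&call) {
            self.calls.push(call);
        }
    }

    pub fn farthest_pos(&self) -> usize {
        self.farthest_pos
    }

    pub fn calls(&self) -> &[RuleCall] {
        &self.calls
    }
}
```

A custom message could then be built from each pair, for example "while parsing `expr`, expected `number` or `identifier`".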

Summary by CodeRabbit

New Features
- Introduced a comprehensive SQL grammar parser for enhanced SQL command parsing capabilities.
- Added detailed error reporting to assist with SQL syntax and parsing errors.
Enhancements
- Improved internal parsing state management for better tracking of parsing attempts and errors.
Documentation
- Updated test cases to reflect new error handling features.

Jan 09 '24 10:01 EmirVildanov

Walkthrough

The changes encompass the introduction of an SQL parsing module with advanced error handling capabilities. These improvements enhance error diagnostics for SQL parsing, boosting the reliability and resilience of the parser.

Changes

| File Path | Change Summary |
|---|---|
| `.../sql.pest`, `.../src/lib.rs` | Added SQL parsing module with error handling functions and updated test functions. |
| `pest/src/error.rs` | Enhanced error tracking with `parse_attempts` in `ParsingError` and updated test cases. |
| `pest/src/parser_state.rs` | Expanded types for tracking parsing attempts; `ParserState` updated with new fields and methods. |

🐰✨ In the realm of code, a rabbit does play, Parsing SQL with precision, errors in dismay. Detailed diagnostics, a dance so grand, With each parsing leap, a better stand! 🎉 ✨🐰

Jan 09 '24 10:01 coderabbitai[bot]

@coderabbitai review

Jan 09 '24 12:01 tomtau

Update: it's been a long time since my last comment, and I'm sorry. I'm currently trying to optimize SQL expression parsing using a Pratt parser. After finishing this task, I hope to return to this MR.

P.S. Thanks for the comments!

Feb 08 '24 07:02 EmirVildanov

the SQL grammar looks ok, maybe it can be added in a separate PR?

the better error reporting change is good, but it'd be a semver-breaking change... not sure if there's a way to add it in a semver-compatible way? if not, maybe we can put it under a feature-guard?

@tomtau, I am currently tinkering with the changes I previously suggested in this MR. The main stumbling block was that these changes broke semver. Could you please advise whether the solution below is more suitable and whether it's semver-compatible?

Previously, semver was broken because I added a new `parse_attempts` field to the public `ErrorVariant::ParsingError` enum variant. If people were matching it with `ErrorVariant::ParsingError { positives, negatives }`, their code would break after the update because of the new field. What if I add a PRIVATE `parse_attempts: Option<ParseAttempts<R>>` field to the `Error` struct and add a new public method `pub fn get_parse_attempts(&self) -> Option<ParseAttempts<R>>` to it? It seems that no user code would be broken by this patch. The only thing I'd have to fix is internal tests (e.g. in meta/src).
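To make the proposal concrete, here is a sketch using simplified stand-ins for the types involved. The real pest definitions have more fields and variants, the contents of `ParseAttempts` here are invented, and the `with_parse_attempts` constructor helper is hypothetical. The point is only that the enum variant keeps its two fields while the extra data lives behind a private field and a getter, assuming downstream code cannot build `Error` with a struct literal.

```rust
use std::fmt::Debug;

/// Stand-in for the new type; its fields are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct ParseAttempts<R> {
    pub rules: Vec<R>,
    pub expected_tokens: Vec<String>,
}

/// Simplified stand-in: the variant keeps exactly its existing fields.
#[derive(Debug, Clone, PartialEq)]
pub enum ErrorVariant<R> {
    ParsingError { positives: Vec<R>, negatives: Vec<R> },
    CustomError { message: String },
}

/// Simplified stand-in for the error struct.
#[derive(Debug, Clone, PartialEq)]
pub struct Error<R> {
    pub variant: ErrorVariant<R>,
    /// Private, so it never appears in downstream patterns.
    parse_attempts: Option<ParseAttempts<R>>,
}

impl<R: Clone> Error<R> {
    pub fn new(variant: ErrorVariant<R>) -> Self {
        Error { variant, parse_attempts: None }
    }

    /// Hypothetical helper for attaching the attempts.
    pub fn with_parse_attempts(mut self, attempts: ParseAttempts<R>) -> Self {
        self.parse_attempts = Some(attempts);
        self
    }

    /// The getter proposed in the comment above.
    pub fn get_parse_attempts(&self) -> Option<ParseAttempts<R>> {
        self.parse_attempts.clone()
    }
}

/// Downstream-style code: this destructuring keeps compiling because
/// `ParsingError` did not gain a field.
pub fn describe<R: Debug>(err: &Error<R>) -> String {
    match &err.variant {
        ErrorVariant::ParsingError { positives, negatives } => {
            format!("expected {:?}, unexpected {:?}", positives, negatives)
        }
        ErrorVariant::CustomError { message } => message.clone(),
    }
}
```

Had `parse_attempts` been added to `ParsingError` itself, the pattern in `describe` would fail to compile with a missing-field error unless it used `..`.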

I've taken into account your comment about not making mod parser_state public and tested the changes proposed above; everything works well.

Mar 04 '24 10:03 EmirVildanov

What if I add a PRIVATE `parse_attempts: Option<ParseAttempts<R>>` field to the `Error` struct and add a new public method `pub fn get_parse_attempts(&self) -> Option<ParseAttempts<R>>` to it? It seems that no user code would be broken by this patch. The only thing I'd have to fix is internal tests (e.g. in meta/src).

yes, I guess that could work

Mar 04 '24 11:03 tomtau

What if I add a PRIVATE `parse_attempts: Option<ParseAttempts<R>>` field to the `Error` struct and add a new public method `pub fn get_parse_attempts(&self) -> Option<ParseAttempts<R>>` to it? It seems that no user code would be broken by this patch. The only thing I'd have to fix is internal tests (e.g. in meta/src).

yes, I guess that could work

Thanks for the reply! I'll update this PR with the latest changes in the coming days.

Mar 04 '24 11:03 EmirVildanov
