automata
automata copied to clipboard
Feature Request: NFA <-> Regular Expression
Hello,
Thank you for your work and very nice library.
If ever you get the time, would it be possible to implement the conversion NFA <-> Regular expression ?
The direction Regex -> NFA is easy, the other one more annoying. A good reference is "Introduction to the Theory of Computation" by Sisper, chapter 1, definition 1.64 for the concept of "generalised nondeterministic finite automaton" which is useful in the conversion NFA -> Regex.
Thank you very much again, Tristan
@tcosmo This is a great idea. @YtvwlD, do you think this is something you'd be up to tackle?
I had the idea last year, but then decided that it probably would be too much work. :) I don't have any time this week but might look at this next week.
@tcosmo You mentioned that going from RE to NFA is easy. Can you describe how? I know one can do this using Thompson's construction but going from Python's RE module to an actual NFA is proving difficult.
Hi @jonthemango , the python RE module is, as far as theory is concerned, a bit a pain in the a**. Indeed it is useful for very practical cases but it doesnt care that much about the theory and doest dwell well with it.
I recommand using this package instead: https://github.com/cyphar/redone/tree/master/redone
T
Does anyone here still have an interest in this feature? My only concern is that even if it were feasible to implement, there would be countless edge cases that could cause the algorithm to fail. In addition, I suspect any final implementation might be too arbitrary or rigid.
I've already skimmed through this tutorial, and it seems the process is pretty involved. It's probably not something I would dive into head-on unless I had a bit more experience with these kinds of conversions.
I definitely still have an interest!
Very few automata library are bothered with this feature which is so important from a theoretical perspective.
I would be able to implement this feature.
T
On Sun 8 Mar 2020 at 01:30, Caleb Evans [email protected] wrote:
Does anyone here still have an interest in this feature? My only concern is that even if it were feasible to implement, there would be countless edge cases that could cause the algorithm to fail. In addition, I suspect any final implementation might be too arbitrary or rigid.
I've already skimmed through this tutorial https://www.youtube.com/watch?v=OKFrju0HB7k, and it seems the process is pretty involved. It's probably not something I would dive into head-on unless I had a bit more experience with these kinds of conversions.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/caleb531/automata/issues/13?email_source=notifications&email_token=AB2DNAGGISAXR67DVRW5ZVDRGLYMRA5CNFSM4IGFU6Y2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEOEJ2VQ#issuecomment-596155734, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB2DNAHXZK5DCZOATEW3KD3RGLYMRANCNFSM4IGFU6YQ .
@tcosmo Excellent! Well, if you mean you would be comfortable writing the Python code to implement this, then you are certainly welcome to. Just submit me a pull request when you are finished, with the necessary test cases as well to verify that the functionality works as expected in all scenarios.
Now that I think of it, I really need to write a CONTRIBUTING.md guide for this project. 😅
@tcosmo Oh, and as for the API design, I would prefer the method be called to_regex, under the NFA class. You are welcome to construct the functionality using multiple functions, as long as the name of each private function begins with an underscore (e.g. _compute_internal_stuff).
I ended up implementing regex -> NFA conversion as part of a project I'm working on -- is this feature request still open? @caleb531
@eohomegrownapps Yes, it's still open! Please fork the repository and submit me a Pull Request if you'd like to take a stab at working in your logic.
Please see my API notes below, as well as read through the CONTRIBUTING.md to make sure tests, etc. are included with your PR:
Oh, and as for the API design, I would prefer the method be called to_regex, under the NFA class. You are welcome to construct the functionality using multiple functions, as long as the name of each private function begins with an underscore (e.g. _compute_internal_stuff).
So I have implemented NFA/DFA to regular expression conversion indirectly by first converting NFA/DFA to generalized NFA, as @tcosmo mentioned. It so happened that I read the same book this summer as a project. However, the expressions generated every time are not unique but, yes, are equivalent to each other (I tested several samples here This is posing a problem in testing it, we need yet another module which verifies if two regular expressions are equivalent.
@abhinavsinha-adrino What do you mean when you say the regular expressions not unique every time? Does the conversion process not produce the same regular expression given the same DFA/NFA?
@caleb531 Not unique means the strings are not equal, but logically expressions are equivalent. The conversion algorithm consists of recursively removing states from the GNFA one by one, the choice of that state in each step results in different strings for the regular expression. At each step, I chose the state which is least connected in the machine, but again multiple such states may exist with equal connectivity.
@abhinavsinha-adrino I see. Well, if the algorithm for choosing among states with equal connectivity is a deterministic algorithm (i.e. always returns the same output given the same input), then I don't see why the resulting regex strings would be different.
Are you choosing from those equivalent states at random?
@caleb531 The thing is we are storing those states in sets, so choosing among them is random if we use min function, I can make it deterministic according to their name yes. But again for testing, we don't know what form of regular expression it is gonna output. Manually, by hand, we may be writing a simpler regular expression that may not match the output of code. One way would be to follow the same algorithm by hand to produce answers and then test the machine.
One example of the different strings for same regex produced by my code-
(b(ba*b)*a|a)*b(ba*b)* and a*b(aa*b|ba*b)*

@abhinavsinha-adrino Oh right, I forgot that this entire library is set-based. 🤦🏻♂️
I finally understand what you're saying—thank you for the explanation. That is,, our version of the expected output may not be string-equal to the actual output, even if they are regex-equivalent.
I think the solution to that (as I think you are implying) would be to compare the machine's output with our handwritten output through an online equivalence-checker. If they are equivalent, then use the machine's actual output as our expected output (because we just verified that they're equivalent, at least for now).
One downside to that is that it makes our tests more brittle, as any change in the implementation may require us to "pre-compute" our expected outputs all over again. But I think that is a trade-off I'm willing to make for the time being, at least until we incorporate a regex-equivalence engine into the project.
👍I will write test codes in coming days. I'll try to work on regex equivalence part too and see if it works out.
Just an update for everyone: I have merged @abhinavsinha-adrino's generous contributions in #46, and will be publishing v6 of this library relatively soon. However, if you don't want to wait, you can pull the latest codebase off of my new develop branch:
https://github.com/caleb531/automata/tree/develop
Great news, everyone! Automata v6 is finally out with an API for working with regular expressions. Please see the README for details:
https://pypi.org/project/automata-lib/
With that, I will be closing this issue now.