Implement NFA features (e.g. input reading) for GNFA

Open caleb531 opened this issue 3 years ago • 9 comments

@abhinavsinha-adrino I'm starting to wonder if I misunderstood the purpose of GNFAs when I suggested that we subclass GNFA from NFA. Let me explain the context:

I was trying to write a test for GNFA just now to verify that GNFA.read_input_stepwise() operates as expected. However, I was encountering one error after another in trying to accomplish this. I then realized that, because GNFAs can have non-state values (i.e. regex strings and None) in the transition map, maybe it doesn't make sense to use methods like read_input_stepwise on a GNFA.

This then made me wonder if it was a mistake to subclass GNFA from NFA, since it seems like there could be a whole bunch of methods (like read_input_stepwise) that either would fail or just wouldn't make sense to have available on GNFA.

What do you think? Given what you know about Automata theory and the purpose of GNFAs, would it make sense to have functional methods like GNFA.read_input_stepwise? Would it make more sense semantically to subclass GNFA from FA instead of NFA?

caleb531 avatar Aug 29 '22 19:08 caleb531

@caleb531 From what I know, the only purpose of a GNFA is to aid in the recursive conversion of an NFA to a regex. Adding functional methods like this would make the GNFA class consistent with the other classes in this package; however, I feel it's not worth it because, again, GNFA was not made for that purpose. As for whether it makes sense: yes, in theory a GNFA can read inputs.

The way automaton.py in automata.base is written, it is necessary to define read_input_stepwise if we subclass GNFA from FA. (Subclassing it from NFA merely avoids the NotImplementedError; it doesn't really solve the problem.) Theoretically, though, GNFA should inherit from the NFA class only.

As for actually coding the function for reading inputs, the best way would be to convert the GNFA back to an NFA. (Any other way of accomplishing this task would just be some variant of this method, because essentially it is NFAs that read inputs for a regex.) To convert a GNFA to an NFA, convert each regex to an NFA, then add epsilon transitions from the final states of that NFA to the to_state of the GNFA, and join the from_state of the GNFA to the start state of that NFA. (Here to_state and from_state are any pair of GNFA states connected by a regex transition.)
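
Roughly, the construction I have in mind looks something like the following. This is only a sketch, not working library code: it assumes a simplified representation of the GNFA as a plain dict of transitions ({from_state: {to_state: regex_or_None}}), the gnfa_to_nfa name is made up, and the state renaming is just one way of keeping the regex fragments from colliding.

from automata.fa.nfa import NFA

def gnfa_to_nfa(gnfa_transitions, gnfa_initial_state, gnfa_final_state):
    # Collect every GNFA state up front so each one has a transition entry
    states = {gnfa_initial_state, gnfa_final_state} | set(gnfa_transitions)
    for edges in gnfa_transitions.values():
        states |= set(edges)
    input_symbols = set()
    transitions = {state: {} for state in states}

    for from_state, edges in gnfa_transitions.items():
        for to_state, regex in edges.items():
            if regex is None:
                continue  # None means there is no transition between this pair of states
            # Expand the regex on this edge into its own NFA fragment
            fragment = NFA.from_regex(regex)
            # Rename the fragment's states so different fragments cannot collide
            rename = {s: f'({from_state},{to_state},{s})' for s in fragment.states}
            for s in fragment.states:
                states.add(rename[s])
                transitions[rename[s]] = {
                    symbol: {rename[t] for t in targets}
                    for symbol, targets in fragment.transitions.get(s, {}).items()
                }
            input_symbols |= set(fragment.input_symbols)
            # Glue the fragment in with epsilon ('') transitions:
            # from_state -> fragment start, fragment final states -> to_state
            transitions[from_state].setdefault('', set()).add(rename[fragment.initial_state])
            for final_state in fragment.final_states:
                transitions[rename[final_state]].setdefault('', set()).add(to_state)

    return NFA(states=states, input_symbols=input_symbols,
               transitions=transitions, initial_state=gnfa_initial_state,
               final_states={gnfa_final_state})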

abhinavsinha-adrino avatar Aug 29 '22 21:08 abhinavsinha-adrino

@abhinavsinha-adrino I think I understand. Personally, I would prefer the primitives in this library to be consistent with the theory as much as possible, and if a GNFA can theoretically read input, then we should allow/implement that.

In order to implement this properly for GNFA, I think we would only need to make the following changes:

  • Move the _lambda_closure_dict creation to a dedicated method (like _precompute_lambda_closures) so it can be called within GNFA.__init__
  • Overwrite _check_for_input_rejection in GNFA to assume no more than one final state (rather than multiple final states as is the case with the regular NFA class)
  • Overwrite _get_next_current_states in GNFA to use the stdlib's re module for evaluating if the given regex transition matches the given transition string
    • I recognize that Python-native regexes allow for much more syntactically than automata regexes, but it seems like every automata regex is also a valid Python regex, so there should be sufficient interoperability (see the small example below)
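
For example (just to illustrate the interoperability point, not implementation code):

import re

# A simple automata-style regex also parses as a Python regex
print(re.fullmatch('a(b|c)*', 'abcbb') is not None)  # True
print(re.fullmatch('a(b|c)*', 'ad') is not None)     # False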

I know you talked about converting a GNFA back to an NFA to read input, but that sounds computationally expensive to perform on every input read, and if there are more efficient means of reading input into a GNFA, that would be ideal. But please let me know your thoughts!

caleb531 avatar Aug 29 '22 22:08 caleb531

I won't support use of the re package. In this automata package we have our own automata machinery, and every function performs its task from the very basic theory. If we used the re package to help in reading inputs, it would become theoretically inconsistent for GNFA. It is not good to use the re module when we have our own regex module.

If you want the behavior of the re package, i.e. to see whether a regex matches, then just convert that particular regex into an NFA and read input with it. (It would be better if we added a regex-match function to the regex module.) This stays consistent with the theoretical framework of the package so far.

As for the GNFA-to-NFA conversion, I didn't suggest converting the GNFA to an NFA on every input. We can keep an equivalent NFA saved inside the GNFA class and use it for reading inputs.

abhinavsinha-adrino avatar Aug 30 '22 11:08 abhinavsinha-adrino

@abhinavsinha-adrino Thank you for your candor and thoughtfulness in challenging my proposed approach. It shows you really care about the direction and quality of the library, and for that I am grateful. With that, you have convinced me that incorporating the stdlib re module into this project would be improper.

Question regarding your alternative approach: how could we cache an underlying NFA inside the GNFA class if the regex-matching logic is part of our automata.base.regex module? But maybe we don't need to cache an NFA per se; we could instead use something like functools.lru_cache to memoize our regex-match function (since, if I understand you correctly, the input parameters are all strings). My main concern is minimizing redundant overhead on repeated GNFA input reads, anyway.
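
To sketch what I mean (the helper names below are hypothetical, not existing functions in this library):

import functools

from automata.fa.nfa import NFA

@functools.lru_cache(maxsize=None)
def _nfa_for_regex(regex):
    # Hypothetical helper: build the NFA for a given regex once and reuse it
    return NFA.from_regex(regex)

def _regex_matches(regex, input_str):
    # Hypothetical helper: test a match by running the cached NFA on the input
    return _nfa_for_regex(regex).accepts_input(input_str)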

caleb531 avatar Aug 30 '22 17:08 caleb531

@abhinavsinha-adrino Just wanted to check in: do you have any thoughts on my caching mechanism to efficiently manage an underlying NFA within each GNFA? If the solution sounds good to you, is this something you would be willing to implement?

caleb531 avatar Sep 06 '22 21:09 caleb531

@caleb531 Hi, functools.lru_cache sounds good to me for this purpose; however, I can't comment on the whole method until I try it out and see whether it works properly.

Effort-wise, I think the fastest way we can implement this is to use GNFA.to_regex to convert the GNFA to a regex, then convert that regex to an NFA (using NFA.from_regex), and then use this NFA for reading input. I am not very familiar with caching methods, but I read about functools.lru_cache and it should work. I suggest this approach mainly because of the low effort needed to implement it. It will of course be a bit slow on the first input read (after that we can cache the NFA).
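
Something along these lines (just the idea, not tested; the example GNFA here is constructed arbitrarily):

from automata.fa.gnfa import GNFA
from automata.fa.nfa import NFA

gnfa = GNFA.from_nfa(NFA.from_regex('a(b|c)*'))  # any GNFA instance
# Convert back to an NFA once, then delegate input reading to it
equivalent_nfa = NFA.from_regex(gnfa.to_regex())
for current_states in equivalent_nfa.read_input_stepwise('abc'):
    print(current_states)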

abhinavsinha-adrino avatar Sep 07 '22 04:09 abhinavsinha-adrino

@abhinavsinha-adrino Yeah, functools.lru_cache is just a decorator on top of any function definition, and I don't think we need to define a maxsize (or we can set it to something arbitrarily high, like 128 or 256).

import functools

@functools.lru_cache(maxsize=256)
def foo(x):
    # Results for up to 256 distinct values of x are cached and reused
    return ...

Anyway, this all sounds great to me. How soon do you think you could get to this? (no rush here)

caleb531 avatar Sep 07 '22 20:09 caleb531

@caleb531 Ah, thanks! The caching seems like the easy part.

As for the work itself, I'm not sure, but it won't be anytime soon; I remain very occupied with my semester work. I'll surely work on this whenever I get time.

abhinavsinha-adrino avatar Sep 07 '22 20:09 abhinavsinha-adrino

@abhinavsinha-adrino That's fine! School should be a priority.

I'll see if I can take a stab at it, and perhaps submit a PR for you to review.

caleb531 avatar Sep 07 '22 22:09 caleb531

@abhinavsinha-adrino So I was able to implement a GNFA.read_input_stepwise method (branch here), which works by converting the GNFA to an NFA and calling read_input_stepwise on that NFA. But something doesn't feel right. Allow me to explain:

The ~~elephant~~ underlying NFA in the room

When I call read_input_stepwise() on my GNFA, I receive a generator that yields states that do not belong to the GNFA instance itself, but rather states from the underlying NFA. This feels strange, because every other library primitive with input-reading capability yields states that belong to the instance being read, not to an instance of some other type. It's almost like saying that a GNFA should only be able to lay GNFA eggs, not NFA eggs lol.
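
To make the problem concrete, usage on my branch looks roughly like this (simplified example):

from automata.fa.gnfa import GNFA
from automata.fa.nfa import NFA

gnfa = GNFA.from_nfa(NFA.from_regex('ab*'))
for current_states in gnfa.read_input_stepwise('abb'):
    # These sets contain states of the underlying NFA, not states of the GNFA itself
    print(current_states)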

I think if GNFA were to have a read_input_stepwise() method, it should be implemented in a way that does not require an underlying NFA. But since that could require effectively duplicating / re-implementing much of the lambda-closure (and related) logic from the NFA class, I am sorely tempted to have read_input_stepwise raise a NotImplementedError. Besides, GNFA was never Liskov-substitutable with NFA to begin with, due to the difference between final_states (a set) and final_state (a single, non-set value), among other things.

You said yourself that GNFA is primarily used for converting a plain NFA to a regex. So maybe we just keep the scope of GNFA limited to that purpose, and recommend to developers that they explicitly convert a GNFA to an NFA if they wish to read input. I suppose this makes the GNFA class more of a utility class than a first-class primitive, but if you need the latter, then you should arguably be using an NFA anyway.

Proposal

For GNFA.read_input_stepwise and the other publicly-exposed NFA operation methods (like concatenate), we raise a NotImplementedError with an appropriate message. This approach is also backwards-compatible, because those NFA methods don't currently work on GNFA anyway.
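
Roughly, I'm imagining something like this (only a sketch; the exact messages are illustrative):

from automata.fa.nfa import NFA

class GNFA(NFA):
    # Sketch of the proposal: NFA operations that don't make sense for a GNFA
    # raise NotImplementedError instead of silently misbehaving
    def read_input_stepwise(self, input_str):
        raise NotImplementedError(
            'GNFA does not support reading input; convert it to an NFA first')

    def concatenate(self, other):
        raise NotImplementedError(
            'GNFA does not support NFA operations like concatenation')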

Anyway, that's what I've been thinking about as I've been conceptualizing these GNFA enhancements. If/when you have time, I'd love to hear your thoughts.

caleb531 avatar Sep 26 '22 19:09 caleb531

@caleb531 It's absolutely true that GNFAs are only used for conversion to a regex, so yes, I'm of the same opinion that we should not implement input-reading methods.

And again, there is no other feasible way to implement such a method without using an NFA. Let me explain: we cannot jump from one state to another in a GNFA on a single input symbol, because each transition takes a regex, and each regex corresponds to a whole set of states (basically an NFA). So whether we do it as a whole or piecewise, we cannot implement input-reading methods without NFAs.

I very much support your proposal; it keeps things simple yet complete.

abhinavsinha-adrino avatar Sep 26 '22 19:09 abhinavsinha-adrino

Great, glad to hear you feel the same. With that, I have just merged 4c95310e6929e2f9c63696b6d49dbcfc4cdd3aa8 and dcad68566e961f324b65a7eaecdf30347cb2aba7 into the develop branch, so I think it's fine to close this issue now.

Thanks again for the feedback and for all your contributions to this project, @abhinavsinha-adrino!

caleb531 avatar Sep 26 '22 22:09 caleb531