choco-solver icon indicating copy to clipboard operation
choco-solver copied to clipboard

Very large FiniteAutomaton

Open kweeg opened this issue 8 years ago • 7 comments
trafficstars

Hi, I'm currently upgrading from Choco 2 to Choco 4. I used to manipulate very large FiniteAutomaton (aka DFA in Choco2), with more than 100.000 states. It used to work on Choco2. It doesn't anymore.

Exception in thread "Thread-0" org.chocosolver.solver.exception.SolverException: Unknown value "127436". Note that only integers in [0,65535] are allowed by FiniteAutomaton.

Should I come back to Choco 2 ?

Thanks for your help.

kweeg avatar Sep 12 '17 16:09 kweeg

Hi,

I think this comes from the fact that, in Choco4, DFA can handle integer greater than 9 as transition value if there are surrounded by '<' and '>'.

FiniteAutomaton auto = new FiniteAutomaton("(0|<10>|<20>)*(0|<10>)");

But if I'm not wrong, that was already the case in choco2. Could you provide us a minimal working example that reproduces the bug ?

cprudhom avatar Sep 13 '17 07:09 cprudhom

Sure.

TestFA.java.zip

FiniteAutomaton auto = new FiniteAutomaton("(0|<10>|<20>)*(0|<127436>)");

I think it's only a matter of naming/declaring the states. If you name a state with a int > 65535, it bugs.

In Choco2, I didn't have to name the states when I designed the DFA. dfa = new DFA(t, fs, l); Just a list of transitions (t), a list of final states (fs) and the length of path (l).

Perhaps should I try to do the same ?

kweeg avatar Sep 13 '17 07:09 kweeg

I just tested : in Choco4, I can't declare transitions without having declared the states before. Otherwise, I get a java.lang.IndexOutOfBoundsException

kweeg avatar Sep 13 '17 08:09 kweeg

Indeed, since choco4 encodes a DFA with the help of character not integer, the range should be included in [0,65535]. That's because the current version of FiniteAutomaton (which appears in the last versions of choco2) relies on dk.brics.automaton.RegExp which can only deal with character.

About declaring transitions before state, that's because each state has a unique ID which should be declare before a transition.

Which version of choco2 do you use ?

cprudhom avatar Sep 13 '17 08:09 cprudhom

It was Choco-solver-2.1.5.

kweeg avatar Sep 13 '17 08:09 kweeg

OK, so the major difference comes from the dependence to dk.brics.automaton since choco 3. There is nothing I can do.

Can you consider a mapping to smaller values or you are using all values from 0 to 127436 ?

cprudhom avatar Sep 13 '17 09:09 cprudhom

I'm using DFA to encode a large lexicon (actually the French dictionary). I'll try to make it more compact. Thanks for your feedback.

kweeg avatar Sep 13 '17 09:09 kweeg