choco-solver
choco-solver copied to clipboard
Very large FiniteAutomaton
Hi, I'm currently upgrading from Choco 2 to Choco 4. I used to manipulate very large FiniteAutomaton (aka DFA in Choco2), with more than 100.000 states. It used to work on Choco2. It doesn't anymore.
Exception in thread "Thread-0" org.chocosolver.solver.exception.SolverException: Unknown value "127436". Note that only integers in [0,65535] are allowed by FiniteAutomaton.
Should I come back to Choco 2 ?
Thanks for your help.
Hi,
I think this comes from the fact that, in Choco4, DFA can handle integer greater than 9 as transition value if there are surrounded by '<' and '>'.
FiniteAutomaton auto = new FiniteAutomaton("(0|<10>|<20>)*(0|<10>)");
But if I'm not wrong, that was already the case in choco2. Could you provide us a minimal working example that reproduces the bug ?
Sure.
FiniteAutomaton auto = new FiniteAutomaton("(0|<10>|<20>)*(0|<127436>)");
I think it's only a matter of naming/declaring the states. If you name a state with a int > 65535, it bugs.
In Choco2, I didn't have to name the states when I designed the DFA.
dfa = new DFA(t, fs, l);
Just a list of transitions (t), a list of final states (fs) and the length of path (l).
Perhaps should I try to do the same ?
I just tested : in Choco4, I can't declare transitions without having declared the states before. Otherwise, I get a java.lang.IndexOutOfBoundsException
Indeed, since choco4 encodes a DFA with the help of character not integer, the range should be included in [0,65535].
That's because the current version of FiniteAutomaton (which appears in the last versions of choco2) relies on dk.brics.automaton.RegExp which can only deal with character.
About declaring transitions before state, that's because each state has a unique ID which should be declare before a transition.
Which version of choco2 do you use ?
It was Choco-solver-2.1.5.
OK, so the major difference comes from the dependence to dk.brics.automaton since choco 3.
There is nothing I can do.
Can you consider a mapping to smaller values or you are using all values from 0 to 127436 ?
I'm using DFA to encode a large lexicon (actually the French dictionary). I'll try to make it more compact. Thanks for your feedback.