choco-solver
choco-solver copied to clipboard
Regular constraint improvments
Alexandre Papadopoulos suggests a few modifications to PropRegular in order to improve its overall behavior.
- When the automaton is based on large alphabet, the graph initialization could be improved by switching the two for-loops, l.217-224
- Sometimes, one would like to define a large alphabet, but the FiniteAutomaton forbids to use some specific characters (which are '{', '}', '(', ')', '"', '<', '>', '[', ']'). Not only, there is no exception when one of those are declared in the alphabet, but also it may provide wrong solutions.
I have some other suggestions for improving regular constraints.
- The regular constraint should implement makeOpposite since it is commonly known that we can complement a regular expression very efficiently so people might reasonably expect the negation to be just as efficient.
- FiniteAutomaton class should provide a way to create the regular expression without the call to "StringUtils.toCharExp". The current behaviour makes the constraint generally unusable for regular expressions for strings that contain digit characters. My preference is a new constructor like this.
public FiniteAutomaton(RegExp regexp) {
this();
this.representedBy = regexp.toAutomaton();
for (State s : representedBy.getStates()) {
for (Transition t : s.getTransitions()) {
for (char c = t.getMin(); c <= t.getMax(); c++) {
alphabet.add(getIntFromChar(c));
}
}
}
syncStates();
}
@JLiangWaterloo I'm not sure getting the complement is enough to define the opposite of the constraint.
Indeed, FiniteAutomaton provides a method (complement()) which returns the complement of the the current FiniteAutomaton based on the same alphabet. But what if the domain of one variable is larger than the alphabet?
I guess the complement will miss some solutions.
FiniteAutomaton auto = new FiniteAutomaton("0(1|2)3");
FiniteAutomaton cauto = new FiniteAutomaton(auto.complement());
//...
IntVar[] vars = VF.enumeratedArray("X", 3, 0, 4, solver);
solver.post(IntConstraintFactory.regular(vars, cauto));
will accept solutions like 000 but won't accept solutions like 4444, though.
Obviously, an easy way to patch my example is to add the letter 4 into the alphabet before getting the complement.
But I'm not sure this should be done in the makeOpposite.
Okay now I see, I didn't realize there is an internal alphabet.
@JLiangWaterloo Could you give me a bunch of code FiniteAutomaton which contains digits and is not easy to declare with available constructors?
Because, for instance, if you want to declare a regular expression where the first variable must take the value 11 and the second one the value 2, something like
FiniteAutomaton auto = new FiniteAutomaton("{11}2");
should be ok.
May be the main problem is that delimiters are not clearly defined in the javadoc...
@alxpgr Could you tell me if the following patch helps you to deal with specific characters?
In FiniteAutomaton, l. 54 & 55:
protected static TIntIntHashMap charFromIntMap = new TIntIntHashMap(10, 0.5f, -1, -1);
protected static TIntIntHashMap intFromCharMap = new TIntIntHashMap(10, 0.5f, -1, -1);
Forget what I wrote earlier, I think what I said is probably misleading. This is what I would like to do.
I've been using Choco to solve feature models, and now I would like to add string datatypes to the models. Another student has implemented string support in his Z3-SMT implementation and it looks like something we would like to embrace for future models.
One functionality I would like is to support regular expression constraints in the models. For example "String x in regex(abc123(4*))". After I solve, I can "x" and it would print something like "abc12344444".
As of right now, the regular constraint does some processing of the regular expression that makes this infeasible. Here's a minimum working example.
public static void main(String[] args) {
Solver solver = new Solver();
IntVar[] ivs = VF.enumeratedArray("iv", 10, 0, 1000, solver);
solver.post(ICF.regular(ivs, new FiniteAutomaton("abc123(4*)")));
if (solver.findSolution()) {
char[] string = new char[ivs.length];
int i;
for (i = 0; i < ivs.length; i++) {
if (ivs[i].getValue() == 0) {
break;
}
string[i] = (char) ivs[i].getValue();
}
System.out.println("Found string: " + Arrays.toString(ivs));
System.out.println("Found string: " + new String(string, 0, i));
}
}
This will print:
Found string: [iv[0] = 90, iv[1] = 91, iv[2] = 92, iv[3] = 1, iv[4] = 2, iv[5] = 3, iv[6] = 4, iv[7] = 4, iv[8] = 4, iv[9] = 4]
Found string: Z[\
What I would like it print:
Found string: [iv[0] = 97, iv[1] = 98, iv[2] = 99, iv[3] = 61, iv[4] = 62, iv[5] = 63, iv[6] = 64, iv[7] = 64, iv[8] = 64, iv[9] = 64]
Found string: abc1234444
Well, the main problem comes from the mapping between a character and the integer used to represent it in the internal data structure of FiniteAutomaton?
Some characters are skipped (such as < or {) and it seems that it corrupts the data structure.
I'll figure this out.
@JLiangWaterloo Well, the main reason the problem you describe gives wrong solutions is that you define an automaton based on integers and characters.
Choco3 does not manage character variables, though.
The javadoc is certainly not clear enough on that point: one can only define regexp using the following alphabet: 0123456789<>.
Some other characters are available, required by the library choco depends on dk.brics.automaton, such as: &?*+{},[].#@".
We are going to update the documentation soon.
This seems like an unecessary restriction. The IAutomaton is just transitioning on integers and there's a natural way of interpreting characters as integers. If I need such behaviour, should I define my own IAutomaton class?