choco-solver Regular constraint improvments

trafficstars

Alexandre Papadopoulos suggests a few modifications to PropRegular in order to improve its overall behavior.

When the automaton is based on large alphabet, the graph initialization could be improved by switching the two for-loops, l.217-224
Sometimes, one would like to define a large alphabet, but the FiniteAutomaton forbids to use some specific characters (which are '{', '}', '(', ')', '"', '<', '>', '[', ']'). Not only, there is no exception when one of those are declared in the alphabet, but also it may provide wrong solutions.

Jan 21 '14 16:01 cprudhom

I have some other suggestions for improving regular constraints.

The regular constraint should implement makeOpposite since it is commonly known that we can complement a regular expression very efficiently so people might reasonably expect the negation to be just as efficient.
FiniteAutomaton class should provide a way to create the regular expression without the call to "StringUtils.toCharExp". The current behaviour makes the constraint generally unusable for regular expressions for strings that contain digit characters. My preference is a new constructor like this.

public FiniteAutomaton(RegExp regexp) {
    this();
    this.representedBy = regexp.toAutomaton();
    for (State s : representedBy.getStates()) {
        for (Transition t : s.getTransitions()) {
            for (char c = t.getMin(); c <= t.getMax(); c++) {
                alphabet.add(getIntFromChar(c));
            }
        }
    }
    syncStates();
}

Jan 21 '14 18:01 JLiangWaterloo

@JLiangWaterloo I'm not sure getting the complement is enough to define the opposite of the constraint. Indeed, FiniteAutomaton provides a method (complement()) which returns the complement of the the current FiniteAutomaton based on the same alphabet. But what if the domain of one variable is larger than the alphabet? I guess the complement will miss some solutions.

FiniteAutomaton auto = new FiniteAutomaton("0(1|2)3");
FiniteAutomaton cauto = new FiniteAutomaton(auto.complement());
//...
IntVar[] vars = VF.enumeratedArray("X", 3, 0, 4, solver);
solver.post(IntConstraintFactory.regular(vars, cauto));

will accept solutions like 000 but won't accept solutions like 4444, though.

Jan 22 '14 08:01 cprudhom

Obviously, an easy way to patch my example is to add the letter 4 into the alphabet before getting the complement. But I'm not sure this should be done in the makeOpposite.

Jan 22 '14 09:01 cprudhom

Okay now I see, I didn't realize there is an internal alphabet.

Jan 22 '14 14:01 JLiangWaterloo

@JLiangWaterloo Could you give me a bunch of code FiniteAutomaton which contains digits and is not easy to declare with available constructors?

Jan 27 '14 15:01 cprudhom

Because, for instance, if you want to declare a regular expression where the first variable must take the value 11 and the second one the value 2, something like

FiniteAutomaton auto = new FiniteAutomaton("{11}2");

should be ok.

May be the main problem is that delimiters are not clearly defined in the javadoc...

Jan 27 '14 15:01 cprudhom

@alxpgr Could you tell me if the following patch helps you to deal with specific characters? In FiniteAutomaton, l. 54 & 55:

protected static TIntIntHashMap charFromIntMap = new TIntIntHashMap(10, 0.5f, -1, -1);
protected static TIntIntHashMap intFromCharMap = new TIntIntHashMap(10, 0.5f, -1, -1);

Jan 27 '14 15:01 cprudhom

Forget what I wrote earlier, I think what I said is probably misleading. This is what I would like to do.

I've been using Choco to solve feature models, and now I would like to add string datatypes to the models. Another student has implemented string support in his Z3-SMT implementation and it looks like something we would like to embrace for future models.

One functionality I would like is to support regular expression constraints in the models. For example "String x in regex(abc123(4*))". After I solve, I can "x" and it would print something like "abc12344444".

As of right now, the regular constraint does some processing of the regular expression that makes this infeasible. Here's a minimum working example.

public static void main(String[] args) {
    Solver solver = new Solver();
    IntVar[] ivs = VF.enumeratedArray("iv", 10, 0, 1000, solver);
    solver.post(ICF.regular(ivs, new FiniteAutomaton("abc123(4*)")));

    if (solver.findSolution()) {
        char[] string = new char[ivs.length];
        int i;
        for (i = 0; i < ivs.length; i++) {
            if (ivs[i].getValue() == 0) {
                break;
            }
            string[i] = (char) ivs[i].getValue();
        }
        System.out.println("Found string: " + Arrays.toString(ivs));
        System.out.println("Found string: " + new String(string, 0, i));
    }
}

This will print:

Found string: [iv[0] = 90, iv[1] = 91, iv[2] = 92, iv[3] = 1, iv[4] = 2, iv[5] = 3, iv[6] = 4, iv[7] = 4, iv[8] = 4, iv[9] = 4]
Found string: Z[\

What I would like it print:

Found string: [iv[0] = 97, iv[1] = 98, iv[2] = 99, iv[3] = 61, iv[4] = 62, iv[5] = 63, iv[6] = 64, iv[7] = 64, iv[8] = 64, iv[9] = 64]
Found string: abc1234444

Jan 27 '14 16:01 JLiangWaterloo

Well, the main problem comes from the mapping between a character and the integer used to represent it in the internal data structure of FiniteAutomaton? Some characters are skipped (such as < or {) and it seems that it corrupts the data structure. I'll figure this out.

Jan 28 '14 13:01 cprudhom

@JLiangWaterloo Well, the main reason the problem you describe gives wrong solutions is that you define an automaton based on integers and characters. Choco3 does not manage character variables, though. The javadoc is certainly not clear enough on that point: one can only define regexp using the following alphabet: 0123456789<>. Some other characters are available, required by the library choco depends on dk.brics.automaton, such as: &?*+{},[].#@".

We are going to update the documentation soon.

Feb 05 '14 10:02 cprudhom

This seems like an unecessary restriction. The IAutomaton is just transitioning on integers and there's a natural way of interpreting characters as integers. If I need such behaviour, should I define my own IAutomaton class?

Feb 10 '14 17:02 JLiangWaterloo

choco-solver choco-solver copied to clipboard

Regular constraint improvments

choco-solver
choco-solver copied to clipboard