Delimiter not taken into account in multi-character tokens
Hi, I can't get dafsa working with multi-character tokens.
Here is a simple test list and how I build the automaton:
test = ["a b c", "a ab ac", "a ab ab c"]
dseq = DAFSA(test, delimiter=" ")
I would expect the spaces to be processed as delimiters, but they are treated as tokens:
print(dseq)
DAFSA with 10 nodes and 11 edges (3 inserted sequences)
+-- #0: 0(#1/3:/3) [('a', 1)]
+-- #1: n(#2/3:< >/3) [(' ', 2)]
+-- #2: n(#3/3:/2|#7/3:/1) [('a', 3), ('b', 7)]
+-- #3: n(#4/2:/2) [('b', 4)]
+-- #4: n(#5/2:< >/2) [(' ', 5)]
+-- #5: n(#6/2:/2) [('a', 6)]
+-- #6: n(#7/2:/1|#9/2:/1) [('b', 7), ('c', 9)]
+-- #7: n(#8/2:< >/2) [(' ', 8)]
+-- #8: n(#9/2: /2) [('c', 9)]
+-- #9: F() []
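For comparison, here is a minimal sketch of the behavior I was hoping for, done by hand: each string is split on the delimiter first, so that "ab" becomes a single token. The helper names `tokenize` and `build_dafsa` are my own and not part of dafsa, and it is only an assumption on my part that DAFSA accepts sequences that are already lists of tokens; I have not confirmed this.

```python
def tokenize(sequences, delimiter=" "):
    """Split each string on the delimiter, giving one list of tokens per sequence."""
    return [seq.split(delimiter) for seq in sequences]


def build_dafsa(token_lists):
    """Build a DAFSA from pre-split sequences.

    Assumption (unverified): DAFSA accepts lists of tokens as sequences.
    """
    from dafsa import DAFSA  # imported here so tokenize() works without dafsa

    return DAFSA(token_lists)


test = ["a b c", "a ab ac", "a ab ab c"]
tokens = tokenize(test)
print(tokens)
```

If the assumption holds, `print(build_dafsa(tokens))` should show edges labeled 'a', 'ab', 'ac', 'b' and 'c', with no edge for the space.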
I get the same issue when the spaces are changed to underscores and delimiter="_" is passed. I probably did something stupidly wrong... My system is Windows 11 with Python 3.9.16 and dafsa 1.0 installed. Thanks!