
Delimiter not taken into account in multi-character tokens

Open FrancoisPellegrino opened this issue 10 months ago • 0 comments

Hi, I can't get dafsa working with multi-character tokens.

With a simple test list defined as:

from dafsa import DAFSA

test = ["a b c", "a ab ac", "a ab ab c"]
dseq = DAFSA(test, delimiter=" ")

The expected behavior would be for spaces to be processed as delimiters, but they are treated as tokens:

print(dseq)

DAFSA with 10 nodes and 11 edges (3 inserted sequences)

+-- #0: 0(#1/3:<a>/3) [('a', 1)]
+-- #1: n(#2/3:< >/3) [(' ', 2)]
+-- #2: n(#3/3:<a>/2|#7/3:<b>/1) [('a', 3), ('b', 7)]
+-- #3: n(#4/2:<b>/2) [('b', 4)]
+-- #4: n(#5/2:< >/2) [(' ', 5)]
+-- #5: n(#6/2:<a>/2) [('a', 6)]
+-- #6: n(#7/2:<b>/1|#9/2:<c>/1) [('b', 7), ('c', 9)]
+-- #7: n(#8/2:< >/2) [(' ', 8)]
+-- #8: n(#9/2:<c>/2) [('c', 9)]
+-- #9: F() []
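
For comparison, here is a minimal sketch of the token-level transitions I expected, written in plain Python without dafsa. The helper names `tokenize` and `token_edges` are hypothetical and not part of the library. The sketch only builds the transitions of a simple prefix tree, not a minimized DAFSA, so it shows which labels should appear rather than the exact node layout.

```python
def tokenize(sequences, delimiter=" "):
    """Split each string into a tuple of tokens on the delimiter."""
    return [tuple(seq.split(delimiter)) for seq in sequences]


def token_edges(token_seqs):
    """Collect the distinct (prefix, token) transitions of a plain prefix tree.

    Hypothetical helper for illustration only; no minimization is done.
    """
    edges = set()
    for tokens in token_seqs:
        for i, token in enumerate(tokens):
            # The prefix identifies the source node, the token labels the edge.
            edges.add((tokens[:i], token))
    return edges


test = ["a b c", "a ab ac", "a ab ab c"]
tokens = tokenize(test)
edges = token_edges(tokens)
labels = sorted({label for _, label in edges})

print(tokens)
print(labels)
```

With this tokenization the edge labels are whole tokens such as "ab" and "ac", and the space never appears as a label, which is what I expected from `delimiter=" "`.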

I get the same issue when the spaces are changed to underscores and delimiter="_" is passed. I probably did something stupidly wrong... My system is Windows 11 with Python 3.9.16 and dafsa 1.0 installed. Thanks!

FrancoisPellegrino, Feb 27 '25 11:02