
Two ways of doing composition give slightly different resulting FSAs

freewym opened this issue 4 years ago · 26 comments

I found that my previous way of emulating composition yields a slightly different result than the recently introduced `k2.compose()`, and it looks like the result from `k2.compose()` is correct. But I am unable to figure out why:

I have H_inv and L, where H_inv.aux_labels is a ragged tensor and L.aux_labels is a regular tensor. I want to get HL_inv.

one way:

if hasattr(L, "aux_labels"):
    L.temp_labels = L.aux_labels
    del L.aux_labels
HL = k2.invert(k2.connect(k2.intersect(L, H_inv)))
if hasattr(HL, "temp_labels"):
    HL.aux_labels = HL.temp_labels
    del HL.temp_labels
HL_inv = k2.arc_sort(k2.invert(HL))

the other way:

HL_inv = k2.arc_sort(k2.connect(k2.compose(L.invert(), H_inv)))

The two resulting FSAs have the same topology, the same scores, and the same aux_labels. The only difference is the labels on a few arcs: some labels are 0 in the second FSA but not in the first. I guess the way L.aux_labels or HL.aux_labels was handled in the first version is incorrect, but I don't know why, as I am not familiar with the internal code.

freewym avatar Dec 25 '20 09:12 freewym

I want to get HL_inv

With k2.compose(L.invert(), H_inv), are you computing L_inv H_inv, not H L_inv?

csukuangfj avatar Dec 25 '20 10:12 csukuangfj

I see. You are computing (HL)_inv

csukuangfj avatar Dec 25 '20 10:12 csukuangfj

Not sure if I understand your problem correctly, but k2.invert will remove epsilon labels when outputting them as dest.aux_labels. Since you invert twice in the first version, those epsilon labels are removed (they were already dropped in the first invert operation):

https://github.com/k2-fsa/k2/blob/2141ace06598dc0cbe0f26eeb8e850c95d495209/k2/csrc/fsa_algo.h#L490-L492

BTW, it seems I forgot to add the documentation for the GPU version and the Python wrapper; will do that in the next PR.

qindazhu avatar Dec 25 '20 11:12 qindazhu

Note as well that L.invert() just swaps the labels and aux_labels (as they are both regular tensors), so it does not remove anything.

qindazhu avatar Dec 25 '20 11:12 qindazhu
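As a toy illustration of that swapping behavior (a pure-Python sketch, not k2's implementation; note that when aux_labels is ragged, k2.invert does more than this, as discussed above):

```python
# When both labels and aux_labels are plain tensors, invert is a pure swap:
# no epsilons are removed and no arcs are added or deleted.
def invert_simple(arcs):
    """arcs: list of (src_state, dst_state, label, aux_label, score)."""
    return [(s, d, aux, lab, w) for (s, d, lab, aux, w) in arcs]

arcs = [(0, 1, 3, 7, -0.5), (1, 2, 0, 4, 0.0)]
inv = invert_simple(arcs)
assert inv == [(0, 1, 7, 3, -0.5), (1, 2, 4, 0, 0.0)]
assert invert_simple(inv) == arcs  # inverting twice is the identity here
```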

Mm, more details would be nice.

danpovey avatar Dec 25 '20 12:12 danpovey

I compared the two FSAs with k2.is_rand_equivalent() and it returned False.

H_inv is obtained by

hmm_vec = k2.create_fsa_vec(hmms)
hmm_vec.labels = torch.where(hmm_vec.labels >= 0, hmm_vec.labels + 1, hmm_vec.labels)  # temporarily +1
H = k2.connect(k2.remove_epsilons_iterative_tropical(k2.closure(k2.union(hmm_vec))))
H.labels = torch.where(H.labels > 0, H.labels - 1, H.labels)  # restore labels
H_inv = k2.arc_sort(k2.invert(H))

where hmms is a list of hmm topos. L is created from a simple lexicon

FREETEXT freetext
HiXiaowen hixiaowen
NihaoWenwen nihaowenwen
<sil> SIL

with optional silence added.
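The temporary +1/-1 label shift in the H-building snippet above protects any real label 0, which epsilon removal would otherwise treat as epsilon. A minimal pure-Python sketch of the idea (plain lists instead of tensors):

```python
# Shift all non-negative labels up by one before remove-epsilons, so that
# no real label collides with epsilon (label 0); shift back afterwards.
labels = [0, 2, 5, -1]          # -1 marks the final arc's label in k2
shifted = [l + 1 if l >= 0 else l for l in labels]   # temporarily +1
assert shifted == [1, 3, 6, -1]                      # no label is 0 now
restored = [l - 1 if l > 0 else l for l in shifted]  # restore labels
assert restored == labels
```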

The two FSAs both have ~220 arcs and ~120 states, and the diff is:

74,78c74,78
< 28 47 1 0
< 29 49 1 0
< 30 51 1 0
< 31 52 1 0
< 32 53 1 0
---
> 28 47 0 0
> 29 49 0 0
> 30 51 0 0
> 31 52 0 0
> 32 53 0 0
125,129c125,129
< 54 55 1 -0.693147
< 54 57 1 -0.693147
< 54 59 1 -0.693147
< 54 60 1 -0.693147
< 54 61 1 -0.693147
---
> 54 55 0 -0.693147
> 54 57 0 -0.693147
> 54 59 0 -0.693147
> 54 60 0 -0.693147
> 54 61 0 -0.693147

where only the labels on those arcs differ.

freewym avatar Dec 25 '20 22:12 freewym

Not sure if I understand your problem correctly, but k2.invert will remove epsilon labels when outputting them as dest.aux_labels. Since you invert twice in the first version, those epsilon labels are removed (they were already dropped in the first invert operation):

https://github.com/k2-fsa/k2/blob/2141ace06598dc0cbe0f26eeb8e850c95d495209/k2/csrc/fsa_algo.h#L490-L492

BTW, it seems I forgot to add the documentation for the GPU version and the Python wrapper; will do that in the next PR.

The diff is on word ids (corresponding to L.aux_labels), which I think are not involved in the first k2.invert() operation of the first version.

freewym avatar Dec 25 '20 22:12 freewym

Oh, I see. So in the first version

if hasattr(L, "aux_labels"):
    L.temp_labels = L.aux_labels
    del L.aux_labels
HL = k2.invert(k2.connect(k2.intersect(L, H_inv)))
if hasattr(HL, "temp_labels"):
    HL.aux_labels = HL.temp_labels
    del HL.temp_labels
HL_inv = k2.arc_sort(k2.invert(HL))

it seems you assume the order of word ids in HL.temp_labels is the same as in L.aux_labels? Here you did HL = k2.invert(k2.connect(k2.intersect(L, H_inv))); I'm not sure the word ids are still in the same order (technically speaking, they may not even be the same set, for common usages of intersect and connect). You may need to combine the arc_map info from intersect (for L), connect, and invert, and use index_attr to index the correct HL.aux_labels.
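The arc_map-based propagation described above can be sketched in pure Python (the function name `index_attr` and the values below are illustrative assumptions, not k2's exact API):

```python
# Each output arc i of an operation comes from input arc arc_map[i], or
# from nowhere if arc_map[i] == -1, so a tensor attribute of the output
# FSA is obtained by gathering the source attribute with that map.
def index_attr(values, arc_map, filler=0):
    return [values[i] if i != -1 else filler for i in arc_map]

word_ids = [10, 20, 30]          # e.g. L.aux_labels, one value per arc of L
arc_map = [2, 0, -1, 1]          # as produced by intersect/connect/...
assert index_attr(word_ids, arc_map) == [30, 10, 0, 20]
```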

qindazhu avatar Dec 26 '20 02:12 qindazhu

As your L here has a very simple format

FREETEXT freetext
HiXiaowen hixiaowen
NihaoWenwen nihaowenwen
<sil> SIL

I think there may be an easier way to do this: use the symbol tables for L.labels and L.aux_labels, and map the ids (phone-id -> word-id) in the result HL to get HL.aux_labels.
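A hedged sketch of this symbol-table mapping (the id values below are made up for illustration; it works here only because each word in this lexicon has a unique single-token pronunciation):

```python
# Map phone ids seen in HL straight to word ids with a small dict built
# from the two symbol tables, e.g. id('freetext') -> id('FREETEXT').
phone2word = {1: 4, 2: 5, 3: 6}   # illustrative ids, not real tables
hl_labels = [1, 0, 3, 2]          # 0 = epsilon, left unmapped
aux = [phone2word.get(l, 0) for l in hl_labels]
assert aux == [4, 0, 6, 5]
```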

qindazhu avatar Dec 26 '20 02:12 qindazhu

I was assuming that, during intersect and connect, all named tensor attributes (e.g. temp_labels, aux_labels) in a src Fsa would be mapped correctly to the dest Fsa, so that HL.temp_labels would be correctly derived from L.aux_labels.

freewym avatar Dec 26 '20 02:12 freewym

As your L here has a very simple format

FREETEXT freetext
HiXiaowen hixiaowen
NihaoWenwen nihaowenwen
<sil> SIL

I think there may be an easier way to do this: use the symbol tables for L.labels and L.aux_labels, and map the ids (phone-id -> word-id) in the result HL to get HL.aux_labels.

I can just use compose, as it is doing things correctly. I am just curious about what's wrong with the first version.

freewym avatar Dec 26 '20 02:12 freewym

mm, see my reply here https://github.com/k2-fsa/k2/issues/550#issuecomment-751310481

qindazhu avatar Dec 26 '20 02:12 qindazhu

mm, see my reply here #550 (comment)

Shouldn't temp_labels be changed according to the arc_map during intersect and connect, just like aux_labels? I mean, I suppose https://github.com/k2-fsa/k2/blob/master/k2/python/k2/fsa_algo.py#L125 is already handling this.

freewym avatar Dec 26 '20 02:12 freewym

Yeah, but currently invert does not do this (I have not added it yet, sorry). So can you try the following to see if it's correct:

HL = k2.connect(k2.intersect(L, H_inv))
if hasattr(HL, "temp_labels"):
    HL.labels = HL.temp_labels
    del HL.temp_labels
HL = k2.invert(HL)
HL_inv = k2.arc_sort(k2.invert(HL))

But we should definitely handle those attributes in invert; will do later.

qindazhu avatar Dec 26 '20 03:12 qindazhu

HL = k2.invert(HL)

I see. Yes, it works

freewym avatar Dec 26 '20 03:12 freewym

BTW, I realized remove_epsilons_iterative_tropical works in the tropical semiring, so after applying it to H, for example, the sum of exp(scores) over all outgoing arcs of a given state will no longer be 1.0, and the relative transition distribution over labels may also change, which may cause potential problems in some cases (?)

freewym avatar Dec 26 '20 03:12 freewym

Yiming: I doubt we rely on the stochasticity. In particular I'm pretty sure we don't rely in an important way on any difference that could be resolved by pushing. If it's something that changes the total weight of the FSA, that may affect things a bit but likely won't materially affect the results other than diagnostics.

danpovey avatar Dec 26 '20 04:12 danpovey

Haowen: I haven't followed the details here, but if invert can't handle other attributes it should crash.

Actually I am not convinced that invert() is the best interface, or at least, I think maybe there should be an underlying operation that it calls that is also accessible to users. It seems to me that the essential quality of what we want is to expand the named attributes from ragged tensors to flat tensors; inversion (i.e. swapping labels and aux_labels) could then be done afterward as a separate operation. This is more in the spirit of not making aux_labels too special.

Notice that Expand() only takes the shape. If there are multiple named attributes with ragged shape, it should be straightforward to find a ragged shape that "covers" them, i.e. the num-elements for each arc is the max of the individual ones. E.g.

/* Given a list of shapes with 2 axes and the same Dim0(), return the
   smallest shape that 'covers' all of them, i.e. the size of the i'th
   sub-list of the answer is the maximum of the sizes of the i'th
   sub-lists of srcs.
     @param [in] num_srcs  Number of source shapes; must have 2 axes and
                           Dim0() all equal
     @param [in] srcs      Array of input shapes; inputs are *(srcs[0]),
                           *(srcs[1]), ...
     @return  Returns a shape with the same Dim0() as all the srcs and
              sub-list sizes equal to the maximum of those of the sources.
*/
RaggedShape CoveringShape(int32_t num_srcs, RaggedShape **srcs);

and we might need a function that tells us how the values of a ragged tensor align with those of a covering shape, e.g.

/* Returns an arc_map that says, for each element of covering, the
   corresponding element of src, or -1 if there was no such element.
     @param [in] src       Shape that was likely an input to
                           CoveringShape...; must have 2 axes
     @param [in] covering  Shape with 2 axes, covering.Dim0() == src.Dim0(),
                           and sub-list sizes not less than the
                           corresponding sub-list sizes of src
     @return  Returns an array with Dim() == covering.NumElements(),
              containing, for each element of covering, either the
              corresponding element of src or -1 if this was not
              applicable. E.g. if src == [ [ x x ] [ x ] ] and
              covering == [ [ x x x ] [ x ] ], it would return
              [ 0 1 -1 2 ].
*/
Array1<int32_t> CoveringShapeForwardMap(RaggedShape &src,
                                        RaggedShape &covering);

So you'd do, in python, expand(fsa), and it would treat all attributes the same. Note: the final -1's might not end up in the right place, but this would be easy to fix by post-processing the linearized elements, where needed.
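A pure-Python sketch of the two proposed functions, using lists of lists in place of RaggedShape (an illustration of the intended semantics, not the C++ implementation):

```python
def covering_shape(srcs):
    # Sub-list sizes of the answer = elementwise max over the sources.
    n = len(srcs[0])
    return [max(len(s[i]) for s in srcs) for i in range(n)]

def covering_shape_forward_map(src, covering_sizes):
    # For each element of the covering, the linear index into src, or -1.
    out, k = [], 0
    for row, size in zip(src, covering_sizes):
        for j in range(size):
            if j < len(row):
                out.append(k)
                k += 1
            else:
                out.append(-1)
    return out

src = [["x", "x"], ["x"]]
cov = covering_shape([src, [["x", "x", "x"], ["x"]]])
assert cov == [3, 1]
# Reproduces the example in the comment above: [ 0 1 -1 2 ]
assert covering_shape_forward_map(src, cov) == [0, 1, -1, 2]
```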


danpovey avatar Dec 26 '20 04:12 danpovey

Yiming: I doubt we rely on the stochasticity. In particular I'm pretty sure we don't rely in an important way on any difference that could be resolved by pushing. If it's something that changes the total weight of the FSA, that may affect things a bit but likely won't materially affect the results other than diagnostics.

OK. just for double-check:

I have an Fsa H, which represents the closure of the union of "[1*]2" and "[3*]4[5*]6[7*]8[9*]10" on input (image "hmm": https://user-images.githubusercontent.com/3506322/103113086-fea72300-4626-11eb-852f-4fc4ea6b40c9.png)

After applying k2.remove_epsilons_iterative_tropical(), it returns (image "H": https://user-images.githubusercontent.com/3506322/103113090-049d0400-4627-11eb-95c2-8a3391e1b58e.png)

where, for example, from state 6, the ratio of total scores between outgoing arcs with label 9 and outgoing arcs with label 10 has changed. I think the two FSTs are not equivalent even after making the second one stochastic. Do you mean this will not noticeably affect the results?

freewym avatar Dec 26 '20 05:12 freewym

Actually, state 7 in your second image is a "dead state": it cannot affect the output because no arcs leave it. We could possibly add "connect" as an optional argument to the remove-epsilons code, since users will often want to connect afterward. The total weight would only be affected by remove-epsilons if there were multiple ways of getting via epsilons from some state to some other state.
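The last point can be checked numerically with toy scores (pure Python): the log-semiring total over two parallel epsilon paths exceeds what the tropical semiring keeps, so the total weight can shift.

```python
import math

# Two epsilon paths connect the same pair of states a -> b.
eps_path_scores = [-0.7, -1.2]
# Log semiring: sum the path probabilities; tropical: keep only the best.
log_total = math.log(sum(math.exp(s) for s in eps_path_scores))
tropical_total = max(eps_path_scores)
assert tropical_total < log_total  # tropical removal drops some mass
```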


danpovey avatar Dec 26 '20 06:12 danpovey


OK, will do this next week (may come back to you if there's any question), thanks!

qindazhu avatar Dec 26 '20 07:12 qindazhu

@danpovey in LF-MMI, I did

num_graph.scores = num_graph.scores.new_zeros(num_graph.scores.size())
num_graph = k2.connect(k2.intersect(num_graph, den_graph, treat_epsilons_specially=False))

where I make sure num_graph and den_graph both have no label 0, but the tot_scores of num_graph are still greater than the tot_scores of den_graph when intersected with the same dense_fsa_vec. Do you have any clue about this?

freewym avatar Dec 27 '20 03:12 freewym

Perhaps the den graph was determinized and the num graph was not? It could happen if the num graph had different paths with the same symbol sequence.
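A quick numeric sketch of that explanation (assumed score, pure Python): if the num graph has two distinct paths with the same symbol sequence, that sequence's acoustic score is counted twice in the log semiring, inflating the total.

```python
import math

path_score = -3.0  # same symbol sequence scored once per duplicate path
duplicated = math.log(2 * math.exp(path_score))  # = path_score + log(2)
assert duplicated > path_score  # num tot_score exceeds the single-path one
```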


danpovey avatar Dec 27 '20 03:12 danpovey

Hi

Sorry if I post this question here, but I don't see any other way of asking questions about k2, not yet anyway. This will probably be a 1-minute answer for all of you. Basically, I don't get FST composition in k2. If we have L and G, we intersect Linv with G, according to the intersection algorithm explained in a toy example here: https://github.com/k2-fsa/k2/blob/master/docs/source/python_tutorials/fsa_algo/fsa_algo.rst#id37. Well, I don't get it: L has token ids on the input and words on the output, and G is a word LM. How do I match the arcs of Linv with G if the output labels are now token ids? The explanatory example is not helping me because it's an acceptor.

armusc avatar Dec 16 '21 15:12 armusc

how do I match the arcs of Linv with G if the output labels are now token ids?

Well, there is only one label on an arc. All other labels are just attributes of an arc, existing only in Python.

If you have some knowledge of OpenFST, then please don't get confused by ilabel and olabel.

The following shows the definition of Arc in k2: https://github.com/k2-fsa/k2/blob/aab2dd77d1fa8e31b153c9ff95e10295811a8b12/k2/csrc/fsa.h#L31-L35

You see that an Arc has only one label. When doing intersection, we only match the label, as you can see there are no places to put other labels in C++.

When intersecting Linv with G, the operation uses the labels of Linv and G, which are both word IDs. The aux_labels, i.e., the token IDs, from Linv are propagated to the resulting FSA, i.e., the intersection result.
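A toy sketch of this matching rule (pure Python, not k2's intersection code): only the single label is compared, and any extra attributes, such as aux_labels, just ride along to the result.

```python
# Arcs are (label, attrs) pairs; attrs are never consulted when matching.
def intersect_arcs(a_arcs, b_arcs):
    out = []
    for lab_a, attrs_a in a_arcs:
        for lab_b, attrs_b in b_arcs:
            if lab_a == lab_b:              # the only thing compared
                out.append((lab_a, attrs_a + attrs_b))
    return out

linv = [(7, ["tok:3"])]      # label = word id, attr = token id (made up)
g = [(7, []), (8, [])]       # word-level LM arcs carry no extra attrs
assert intersect_arcs(linv, g) == [(7, ["tok:3"])]
```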

csukuangfj avatar Dec 16 '21 15:12 csukuangfj

thank you!

armusc avatar Dec 16 '21 16:12 armusc