Two ways of doing composition give slightly different resulting FSAs
I found that my previous way of emulating composition yields a slightly different result than using the recently introduced `k2.compose()`, and it looks like the result from `k2.compose()` is correct. But I am unable to figure out why.

I have H_inv and L, where H_inv.aux_labels is a ragged tensor and L.aux_labels is a normal tensor. I want to get HL_inv.
one way:

```python
if hasattr(L, "aux_labels"):
    L.temp_labels = L.aux_labels
    del L.aux_labels
HL = k2.invert(k2.connect(k2.intersect(L, H_inv)))
if hasattr(HL, "temp_labels"):
    HL.aux_labels = HL.temp_labels
    del HL.temp_labels
HL_inv = k2.arc_sort(k2.invert(HL))
```
the other way:

```python
HL_inv = k2.arc_sort(k2.connect(k2.compose(L.invert(), H_inv)))
```
The two resulting FSAs have the same topology, the same scores, and the same aux_labels. The only difference is the labels on a few arcs: some labels are 0 in the second FSA but not in the first one. I guess the way L.aux_labels or HL.aux_labels was handled in the first version is incorrect, but I don't know why, as I am not familiar with the internal code.
> I want to get HL_inv
>
> k2.compose(L.invert(), H_inv)

Are you computing L_inv H_inv, not H L_inv?

I see. You are computing (HL)_inv.
Not sure if I understand your problem correctly, but k2.invert will remove those epsilon labels when outputting them as dest.aux_labels. As you invert twice here (in the first version), the epsilon labels will be removed (as they were removed in the first invert operation):

https://github.com/k2-fsa/k2/blob/2141ace06598dc0cbe0f26eeb8e850c95d495209/k2/csrc/fsa_algo.h#L490-L492

BTW, it seems I forgot to add the documentation to the GPU version and Python; will do in the next PR together.
Note as well that L.invert() just swaps the labels and aux_labels (as they are both tensors), so it does not remove anything.
Mm, more details would be nice.
I compared the two FSAs with k2.is_rand_equivalent() and it is False.
H_inv is obtained by

```python
hmm_vec = k2.create_fsa_vec(hmms)
hmm_vec.labels = torch.where(hmm_vec.labels >= 0, hmm_vec.labels + 1, hmm_vec.labels)  # temporarily +1
H = k2.connect(k2.remove_epsilons_iterative_tropical(k2.closure(k2.union(hmm_vec))))
H.labels = torch.where(H.labels > 0, H.labels - 1, H.labels)  # restore labels
H_inv = k2.arc_sort(k2.invert(H))
```
where hmms is a list of HMM topos.
L is created from a simple lexicon

```
FREETEXT freetext
HiXiaowen hixiaowen
NihaoWenwen nihaowenwen
<sil> SIL
```

with optional silence added.
The two FSAs both have ~220 arcs and ~120 states, and the diff is:

```
74,78c74,78
< 28 47 1 0
< 29 49 1 0
< 30 51 1 0
< 31 52 1 0
< 32 53 1 0
---
> 28 47 0 0
> 29 49 0 0
> 30 51 0 0
> 31 52 0 0
> 32 53 0 0
125,129c125,129
< 54 55 1 -0.693147
< 54 57 1 -0.693147
< 54 59 1 -0.693147
< 54 60 1 -0.693147
< 54 61 1 -0.693147
---
> 54 55 0 -0.693147
> 54 57 0 -0.693147
> 54 59 0 -0.693147
> 54 60 0 -0.693147
> 54 61 0 -0.693147
```

where only the labels on those arcs differ.
The diff is on the word IDs (corresponding to L.aux_labels), which I think are not involved in the first k2.invert() operation of the first version.
Oh, I see. So in the first version

```python
if hasattr(L, "aux_labels"):
    L.temp_labels = L.aux_labels
    del L.aux_labels
HL = k2.invert(k2.connect(k2.intersect(L, H_inv)))
if hasattr(HL, "temp_labels"):
    HL.aux_labels = HL.temp_labels
    del HL.temp_labels
HL_inv = k2.arc_sort(k2.invert(HL))
```

it seems you assume the order of word IDs in HL.temp_labels is the same as in L.aux_labels? Here you did HL = k2.invert(k2.connect(k2.intersect(L, H_inv))); I'm not sure the word IDs are still in the same order (technically speaking, they may not even be the same set, for common usages of intersect and connect). You may need to combine the arc_map info from intersect (for L), connect, and invert, and use index_attr to index the correct HL.aux_labels.
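To illustrate the arc_map bookkeeping described above, here is a plain-Python sketch (not the k2 API; `chain_arc_maps` and `index_attr` here are hypothetical helpers, and the toy maps are made up) of how arc_maps from successive operations could be composed to re-index an attribute of the original FSA:

```python
# Sketch (not k2's implementation): each FSA operation is assumed to return
# arc_map[i] = index of the source arc that produced dest arc i, or -1 if
# the dest arc has no source arc. Chaining the maps lets us gather an
# attribute of the original FSA for each arc of the final FSA.

def chain_arc_maps(outer, inner):
    """Compose two arc_maps: result[i] = inner[outer[i]], propagating -1."""
    return [inner[j] if j != -1 else -1 for j in outer]

def index_attr(attr, arc_map, default=0):
    """Gather attribute values through an arc_map, filling -1 with default."""
    return [attr[j] if j != -1 else default for j in arc_map]

# Toy data: 4 arcs in L carrying word-id attributes.
aux_labels = [10, 20, 30, 40]
intersect_map = [0, 2, 3, 1]   # arcs of the intersection -> arcs of L
connect_map = [0, 1, 3]        # arcs after connect -> arcs of intersection

combined = chain_arc_maps(connect_map, intersect_map)
print(index_attr(aux_labels, combined))   # [10, 30, 20]
```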
As your L here has a very simple format

```
FREETEXT freetext
HiXiaowen hixiaowen
NihaoWenwen nihaowenwen
<sil> SIL
```

I think there may be an easier way to do this: use the symbol tables for L.labels and L.aux_labels, and map the IDs (phone-id -> word-id) in the resulting HL to get HL.aux_labels.
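A minimal sketch of this symbol-table idea, using plain Python dicts with made-up integer IDs rather than k2's SymbolTable (the specific ID values and the toy HL labels are assumptions for illustration):

```python
# Sketch: because each word in this L maps to exactly one token, the
# phone-id -> word-id mapping is a simple dict lookup.
phones = {"freetext": 1, "hixiaowen": 2, "nihaowenwen": 3, "SIL": 4}
words = {"FREETEXT": 100, "HiXiaowen": 200, "NihaoWenwen": 300, "<sil>": 400}
lexicon = {"FREETEXT": "freetext", "HiXiaowen": "hixiaowen",
           "NihaoWenwen": "nihaowenwen", "<sil>": "SIL"}

# phone-id -> word-id (0, i.e. epsilon, stays 0)
phone2word = {phones[p]: words[w] for w, p in lexicon.items()}
phone2word[0] = 0

hl_labels = [0, 1, 4, 2]               # toy phone-id labels on HL's arcs
aux = [phone2word[i] for i in hl_labels]
print(aux)   # [0, 100, 400, 200]
```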
I was assuming that, during intersect and connect, all named tensor attributes (e.g. temp_labels, aux_labels) in a src Fsa would be mapped correctly to the dest Fsa, so that HL.temp_labels would carry the values from L.aux_labels.
I can just use compose, as it is doing things correctly. I am just curious what's wrong with the first version.
mm, see my reply here https://github.com/k2-fsa/k2/issues/550#issuecomment-751310481
Shouldn't temp_labels be changed according to the arc_map during intersect and connect, just like aux_labels? I mean, I suppose https://github.com/k2-fsa/k2/blob/master/k2/python/k2/fsa_algo.py#L125 is already handling this.
Yeah, but currently invert does not do this (I have not added it yet, sorry). So can you try the following to see if it's correct:

```python
HL = k2.connect(k2.intersect(L, H_inv))
if hasattr(HL, "temp_labels"):
    HL.labels = HL.temp_labels
    del HL.temp_labels
HL = k2.invert(HL)
HL_inv = k2.arc_sort(k2.invert(HL))
```

But definitely we should handle those attributes in invert; will do later.
> HL = k2.invert(HL)

I see. Yes, it works.
BTW, I realized remove_epsilons_iterative_tropical is in the tropical semiring, so after applying it to H, for example, sum(exp(scores)) over all outgoing arcs of a given state will no longer be 1.0, and the relative transition distribution over labels may also change, which may cause potential problems in some cases (?)
Yiming: I doubt we rely on the stochasticity. In particular I'm pretty sure we don't rely in an important way on any difference that could be resolved by pushing. If it's something that changes the total weight of the FSA, that may affect things a bit but likely won't materially affect the results other than diagnostics.
Haowen: I haven't followed the details here, but if invert can't handle other attributes it should crash.
Actually I am not convinced that invert() is the best interface, or at least, I think maybe there should be an underlying operation that it calls that is also accessible to users. It seems to me that the essential quality of what we want, is to expand the named attributes from ragged tensors to flat tensors, and inversion (i.e. swapping labels and aux_labels) could be done afterward as a separate operation; this is more in the spirit of not making aux_labels too special.
Notice that Expand() only takes the shape. If there are multiple named attributes with ragged shape, it should be straightforward to find a ragged shape that "covers" them, i.e. the num-elements for each arc is the max of the individual ones. E.g.
```cpp
/*
  Given a list of shapes with 2 axes and the same Dim0(), return the
  smallest shape that 'covers' all of them, i.e. the size of the i'th
  sub-list of the answer is the maximum of the sizes of the i'th
  sub-lists of `srcs`.

     @param [in] num_srcs  Number of source shapes; must have 2 axes and
                           Dim0() all equal
     @param [in] srcs      Array of input shapes; inputs are *(srcs[0]),
                           *(srcs[1]), ...
     @return  Returns a shape with the same Dim0() as all the srcs, and
              sub-list sizes equal to the maximum of those of the sources.
*/
RaggedShape CoveringShape(int32_t num_srcs, RaggedShape **srcs);
```
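The covering-shape semantics can be sketched in plain Python, representing each shape by its list of per-arc sub-list sizes (an illustration of the intended behavior only, not the C++ implementation):

```python
# Sketch: the covering shape of several ragged shapes with the same Dim0()
# takes, for each sub-list position, the maximum size across the inputs.
def covering_sizes(*shapes):
    assert len({len(s) for s in shapes}) == 1  # all must share Dim0()
    return [max(sizes) for sizes in zip(*shapes)]

print(covering_sizes([2, 1, 0], [1, 3, 1]))   # [2, 3, 1]
```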
and we might need a function that tells us how the values of a ragged tensor align with those of a covering shape, e.g.
```cpp
/*
  Returns an arc_map that says, for each element of `covering`, the
  corresponding element of `src`, or -1 if there was no such element.

     @param [in] src       Shape that was likely an input to
                           CoveringShape(); must have 2 axes
     @param [in] covering  Shape with 2 axes, with
                           covering.Dim0() == src.Dim0() and sub-list
                           sizes not less than the corresponding
                           sub-list sizes of `src`
     @return  Returns an array with Dim() == covering.NumElements(),
              containing, for each element of `covering`, either the
              corresponding element of `src` or -1 if this was not
              applicable.  E.g. if src == [ [ x x ] [ x ] ] and
              covering == [ [ x x x ] [ x ] ], would return [ 0 1 -1 2 ].
*/
Array1<int32_t> CoveringShapeForwardMap(RaggedShape &src,
                                        RaggedShape &covering);
```
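The same semantics in a plain-Python sketch, reproducing the docstring's example (shapes again represented by their sub-list sizes; the helper name is hypothetical):

```python
# Sketch: for each element of the covering shape, emit the index of the
# corresponding element of src, or -1 where src's sub-list is shorter.
def covering_forward_map(src_sizes, covering_sizes):
    out, src_elem = [], 0
    for s, c in zip(src_sizes, covering_sizes):
        out.extend(range(src_elem, src_elem + s))  # elements present in src
        out.extend([-1] * (c - s))                 # padding in covering
        src_elem += s
    return out

# Docstring's example: src == [[x x] [x]], covering == [[x x x] [x]]
print(covering_forward_map([2, 1], [3, 1]))   # [0, 1, -1, 2]
```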
So you'd do, in python, expand(fsa), and it would treat all attributes the same. Note: the final -1's might not end up in the right place, but this would be easy to fix by post-processing the linearized elements, where needed.
OK, just to double-check:

I have an Fsa H which represents the closure of the union of "[1*]2" and "[3*]4[5*]6[7*]8[9*]10" on the input side:

[image: hmm] https://user-images.githubusercontent.com/3506322/103113086-fea72300-4626-11eb-852f-4fc4ea6b40c9.png

After applying k2.remove_epsilons_iterative_tropical(), it returns:

[image: H] https://user-images.githubusercontent.com/3506322/103113090-049d0400-4627-11eb-95c2-8a3391e1b58e.png

where, for example, from state 6 the ratio of total scores between outgoing arcs with label 9 and outgoing arcs with label 10 has changed. I think the two FSTs are not equivalent even after making the second one stochastic. Do you mean it will not noticeably affect the results?
Actually state 7 in your bottom figure is a "dead state": it cannot affect the output because no arcs leave it. We could possibly add "connect" as an optional argument to the remove-epsilons code, since users will often want to connect afterward. The total weight would only be affected by remove-epsilons if there were multiple ways of getting via epsilons from some state to some other state.
OK, will do this next week (may come back to you if there's any question), thanks!
@danpovey in LF-MMI, I did

```python
num_graph.scores = num_graph.scores.new_zeros(num_graph.scores.size())
num_graph = k2.connect(k2.intersect(num_graph, den_graph, treat_epsilons_specially=False))
```

where I make sure num_graph and den_graph both have no label 0, but still the tot_scores of num_graph are greater than the tot_scores of den_graph when intersected with the same dense_fsa_vec. Do you have any clue about this?
Perhaps the den graph was determinized and the num graph was not? It could happen if the num graph had different paths with the same symbol sequence.
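A quick sketch of the arithmetic behind this explanation: in the log semiring, distinct paths sharing a symbol sequence each contribute probability mass, so an undeterminized numerator graph can total more than a determinized denominator (toy numbers, plain Python, not k2's total-score computation):

```python
# Sketch: two duplicate paths in the num graph both absorb the acoustic
# log-score of the shared symbol sequence; the determinized den graph has
# only one such path, so logsumexp over num paths exceeds the den total.
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

acoustic = -1.5                      # toy log-score of the symbol sequence
num_paths = [acoustic, acoustic]     # two duplicate paths in the num graph
den_paths = [acoustic]               # determinization keeps one

print(logsumexp(num_paths) > logsumexp(den_paths))   # True
```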
Hi, sorry if I post this question here, but I don't see any other way of asking questions about k2 (not yet, anyway). This will be a 1-minute answer for all of you, I guess; basically, I don't get FST composition in k2. If we have L and G, we intersect Linv with G, according to the intersection algorithm explained in a toy example here: https://github.com/k2-fsa/k2/blob/master/docs/source/python_tutorials/fsa_algo/fsa_algo.rst#id37. Well, I don't get it: L has token IDs on the input and words on the output, and G is a word LM. How do I match the arcs of Linv with G if the output labels are now token IDs? The explanatory example is not helping me because it's an acceptor.
> how do I match the arcs of Linv with G if the output labels are now token IDs?
Well, there is only one label on an arc. All other labels are just attributes of an arc, existing only in Python. If you have some knowledge of OpenFST, then please don't get confused by ilabel and olabel. The following shows the definition of Arc in k2:

https://github.com/k2-fsa/k2/blob/aab2dd77d1fa8e31b153c9ff95e10295811a8b12/k2/csrc/fsa.h#L31-L35

You see that an Arc has only one label. When doing intersection, we only match the label; as you can see, there are no places to put other labels in C++.
When intersecting Linv with G, it uses the labels of Linv and G, which are both word IDs. The aux_labels (i.e., the token IDs) from Linv are propagated to the resulting FSA, i.e., the intersection result.
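A toy sketch of the label-matching step just described (plain Python, not k2's implementation; the full state synchronization of real intersection is omitted here, and only the single-label matching with score addition is illustrated):

```python
# Sketch: intersection matches the single `label` field on each arc; other
# attributes such as aux_labels just ride along on the matched arcs.
# Arcs are (src, dst, label, score) tuples.
def match_arcs(a_arcs, b_arcs):
    out = []
    for sa, da, la, wa in a_arcs:
        for sb, db, lb, wb in b_arcs:
            if la == lb:  # only `label` is matched; scores are added
                out.append(((sa, sb), (da, db), la, wa + wb))
    return out

a = [(0, 1, 5, 0.0), (0, 1, 6, 0.0)]   # e.g. arcs of Linv (word-id labels)
b = [(0, 1, 5, -0.5)]                  # e.g. arcs of G
print(match_arcs(a, b))   # [((0, 0), (1, 1), 5, -0.5)]
```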
thank you!