<PAD> is not used as the padding token when tagger decoder is used
Describe the bug
When a generator decoder is used, the predicted output contains `<PAD>` tokens, which can be easily removed. But when a tagger decoder is used, a character from the input string is used as the pad token, and it is really difficult to differentiate between the actual data and the padded tokens.
Due to https://github.com/ludwig-ai/ludwig/issues/1130 I cannot use the latest version (0.3.3) and have to use version 0.2.2.8, which works without the error that 0.3.3 gives.
Below is my model_definition.yaml file:
```yaml
training:
    epochs: 50
    early_stop: 10
    batch_size: 128
input_features:
    -
        name: column2
        type: text
        level: word
        encoder: rnn
        cell_type: lstm
        num_layers: 4
        reduce_output: null
        preprocessing:
            word_tokenizer: space
            padding_symbol: <PAD>
output_features:
    -
        name: column1
        type: text
        level: word
        decoder: tagger
        cell_type: lstm
        loss:
            type: sampled_softmax_cross_entropy
```
A small subset of my training dataset:
```
column1,column2
k k klk k hjkj hg k kg h k jlk k kj kg hk k k k k k k k klk kjh jkj hg ghk kj kh khgh hg,N S SHL S LHHL LL H SL H H LHL S SL HL HH S S S S S S S SHL SLL HHL LL SHH SL HL HLLH SL
hk lk kh klk l lmlk lml mn m klm mn m ml lj kl klk kjhj jh h h h klm l lm l l l l lk lmkl lkk hjh h h klk kj k klm mlkj kl ml lk lk m lk jkjh jh k k hkh hg hk lm kj gh hg hjk jh,NH HL SL HHL H SHLL HHL HH L LHH SH L SL SL HH LHL SLLH SL S S S HHH L SH L S S S SL HHLH SLS LHL S S HHL SL H SHH SLLL HH HL SL HL H LL LHLL HL H S LHL SL HH HH LL LH SL HHH LL
kj klkjkjh ghg hj j j jh jk hj ghjh hg g fg g g g hjhg hjh gf gh hkjklkjh hjhg hg,NL HHLLHLL LHL HH S S SL HH LH LHHL SL S LH S S S HHLL HHL LL HH SHLHHLLL SHLL HL
g j k l k h k k g g g k k kj g hk k kj h kj h g g,N H H H L L H S L S S H S SL L HH S SL L HL L L S
hkj k k k k kkk kh kl kmlkjk kj h hl l l lk kmlm jlkk j hl l l lk k k kmlm k k k kmlk k k k k kml k kkk hjkjhj jh jkl ljl lmlkj kjhg hg kk hk h kkkh jkl lmlk lkj,NHL H S S S SSS SL HH LHLLLH SL L SH S S SL SHLH LHLS L LH S S SL S S SHLH L S S SHLL S S S S SHL L SSS LHHLLH SL HHH SLH SHLLL HLLL HL HS LH L HSSL HHH SHLL HLL
e e e e de dcded edb cb dc de dc bcdc cb c ac c c bcdcb d dededc bc dcbc ba,N S S S LH LLHHL HLL HL HL HH LL LHHL SL H LH S S LHHLL H SHLHLL LH HLLH LL
g j k l l l k h l k j g h g f gh h k k jk h h g g,N H H H S S L L H L L L H L L HH S H S LH L S L S
f fedf d dfe fg g g ggf ed df ef d dhj h hgf g f efg fe de dd c ed d cd d df ghgfg e,N SLLH L SHL HH S S SSL LL SH LH L SHH L SLL H L LHH LL LH LS L HL S LH S SH HHLLH L
d fgh g g g g gh g g g ge g fgh fe dfc de fefed dc f ghg g gh g g gh g ghkjh jkjh g ghjhgh hg h gf g h kl kjkl lk h gf hkj klk kjh jh hg hg gh g g g fgh fe dfc de fefed g ghgf gh g ghkjh jkjh ghjhgh hg,N HHH L S S S SH L S S SL H LHH LL LHL HH HLHLL SL H HHL S SH L S SH L SHHLL HHLL L SHHLLH SL H LL H H HH LLHH SL L LL HHL HHL SLL HL SL HL SH L S S LHH LL LHL HH HLHLL H SHLL HH L SHHLL HHLL LHHLLH SL
```
My train command:
```
ludwig train --experiment_name tagger_model --data_csv training_file.csv --model_definition_file model_definition.yaml --output_directory results
```
and my test command:
```
ludwig test --data_csv small_test.csv --model_path results\\tagger_model_run\\model --output_directory results\\prediction\\tagger_model_run
```
The predicted result from the above is:
```
k,g,h,kl,l,k,l,m,l,k,l,l,j,l,j,k,hg,hg,fg,hk,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h
g,j,k,lm,l,mn,m,l,k,lm,l,l,kl,kj,hg,hg,f,h,kh,j,h,g,g,g,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h
f,f,f,e,g,h,f,fe,dc,c,f,f,f,f,e,g,h,h,hg,gf,f,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h
ghk,h,gf,gh,g,g,k,k,jh,k,l,lm,lm,kj,lm,h,jkl,k,j,kj,h,gf,f,g,g,g,gh,g,fe,def,fe,f,fg,gf,fe,de,d,d,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h,h
```
As can be seen, the character `h` is used as the pad token here.
But if I use the generator decoder with the model below:
```yaml
training:
    epochs: 50
    early_stop: 30
    batch_size: 128
input_features:
    -
        name: column2
        type: text
        level: word
        encoder: rnn
        cell_type: lstm
        num_layers: 4
        reduce_output: null
        preprocessing:
            word_tokenizer: space
            padding_symbol: <PAD>
output_features:
    -
        name: column1
        type: text
        level: word
        decoder: generator
        attention: bahdanau
        cell_type: lstm
        loss:
            type: sampled_softmax_cross_entropy
```
then the predicted output uses the correct `<PAD>` token, as can be seen from the generated output:
```
k,g,h,kl,l,k,l,m,l,k,l,l,h,l,j,k,hg,hg,fg,hk,h,h,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>
g,j,k,lm,l,mn,m,k,lm,m,l,l,l,j,l,j,k,hg,f,hk,kj,h,g,g,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>
f,f,f,e,g,h,f,fe,dc,c,f,f,f,f,e,g,h,h,hg,gf,f,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>
ghk,h,gf,gh,g,g,h,klm,l,l,k,lm,l,kj,hk,k,k,k,l,k,lm,l,kj,klk,h,h,h,h,k,l,k,jk,lk,h,gf,gh,h,g,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>
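```
These `<PAD>` tokens are easy to filter out after the fact. A minimal sketch, assuming the predictions are written one comma-separated sequence per row as shown above; the file path is hypothetical and depends on the output directory layout:

```python
import csv

# Drop the <PAD> tokens from each predicted sequence.
# The path below is hypothetical; point it at the predictions CSV
# that `ludwig test` writes into the output directory.
with open('results/prediction/tagger_model_run/column1_predictions.csv') as f:
    for row in csv.reader(f):
        print(' '.join(token for token in row if token != '<PAD>'))
```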
Environment (please complete the following information):
- OS: Windows 10
- Python version: 3.6.8
- Ludwig version: 0.2.2.8
@farazk86 Regarding the tagger decoder: you don't have to use `<PAD>` to determine the sequence length. When a prediction is returned, the length of the sequence is returned as the second element in the tuple. For the tagger, use the length tensor to determine the output sequence.
Hopefully, this answered your question.
Thanks Jim, but it looks like I can only see the length of the sequence using the Ludwig Python API. I'm using the console to run Ludwig; would this be possible from the command line, or do I have to use the API?
Thanks
Right now it requires use of the API.
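For reference, a minimal sketch of the API route, assuming the Ludwig 0.2.x Python API; the column names below follow Ludwig's `<feature>_<output>` convention but are assumptions, so inspect the returned dataframe's columns to confirm:

```python
from ludwig.api import LudwigModel

# Load the trained tagger model and predict via the Python API.
model = LudwigModel.load('results/tagger_model_run/model')
predictions = model.predict(data_csv='small_test.csv')

# Per the comment above, predictions carry a per-row sequence length;
# use it to trim the padded tail of each predicted token sequence.
# Column names are assumed (e.g. column1_predictions, column1_lengths).
for tokens, length in zip(predictions['column1_predictions'],
                          predictions['column1_lengths']):
    print(tokens[:int(length)])

model.close()
```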
@farazk86, checking to see if this was resolved with @jimthompson5802's suggestion?