neuralmonkey
Chaining of dataset series preprocessors can fail.
When chaining multiple dataset series preprocessor steps, e.g.:
    preprocessors=[("source", "source_wp", <wp_prep>), ("source_wp", "source_wp_other", <other_prep>)]
dataset.load can fail because the preprocessors list is not processed in any guaranteed order, so a step may run before the series it depends on has been created.
The part of the code that should be fixed is here: https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/dataset.py#L325
Look at the comment two lines above the one you are linking to. The correct approach is to use the pipeline processor. Some kind of tree-based processing could be added here, but that did not work in the old dataset either.
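To illustrate the idea of running several steps as one preprocessor instead of chaining series, here is a minimal sketch in plain Python. It is not Neural Monkey's actual pipeline processor, whose API is not shown in this thread; `compose_steps`, `lowercase`, and `split_hyphens` are hypothetical names used only for illustration.

```python
from functools import reduce
from typing import Callable, List

# A step maps one tokenized sentence to another (an assumption for this sketch).
Step = Callable[[List[str]], List[str]]


def compose_steps(*steps: Step) -> Step:
    """Return a single step that applies `steps` from left to right."""
    def pipeline(sentence: List[str]) -> List[str]:
        return reduce(lambda acc, step: step(acc), steps, sentence)
    return pipeline


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def split_hyphens(tokens: List[str]) -> List[str]:
    return [part for t in tokens for part in t.split("-")]


# One combined preprocessor, so only one derived series is needed.
combined = compose_steps(lowercase, split_hyphens)
processed = combined(["State-of-the-Art", "NMT"])
```

With a single combined step, the order of application is fixed by the argument order, so no dependency between derived series has to be resolved.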
Here is a suggestion:

    def _add_preprocessed_series(iterators, s_name, prep_sl):
        preprocessor, source = prep_sl[s_name]
        # Already built, e.g. as a dependency of an earlier series.
        if s_name in iterators:
            return
        # Build the source first if it is itself a preprocessed series.
        if source in prep_sl:
            _add_preprocessed_series(iterators, source, prep_sl)
        if source not in iterators:
            raise ValueError(
                "Source series {} for series-level preprocessor nonexistent: "
                "Preprocessed series '{}', source series '{}'".format(
                    source, s_name, source))
        iterators[s_name] = _make_sl_iterator(source, preprocessor)

    [...]

    for s_name in prep_sl:
        _add_preprocessed_series(iterators, s_name, prep_sl)
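
One thing the recursion above does not guard against is a cycle (two series listed as each other's source), which would recurse until the stack overflows. The following standalone sketch shows the same dependency-first idea with a cycle check. It works on plain lists rather than the dataset's iterators, and `resolve_series` and its data layout are assumptions made for this example, not Neural Monkey code.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical series-level preprocessor: whole series in, whole series out.
Preprocessor = Callable[[List[str]], List[str]]


def resolve_series(
    base: Dict[str, List[str]],
    preprocessors: Iterable[Tuple[str, str, Preprocessor]],
) -> Dict[str, List[str]]:
    """Build every preprocessed series, always building its source first."""
    # Tuples are (source, target, function), as in the issue's example.
    prep_sl = {target: (source, func) for source, target, func in preprocessors}
    series = dict(base)
    in_progress: Set[str] = set()

    def build(name: str) -> None:
        if name in series:
            return
        if name not in prep_sl:
            raise ValueError(
                "Series '{}' does not exist and no preprocessor "
                "produces it".format(name))
        if name in in_progress:
            raise ValueError(
                "Cyclic preprocessor dependency involving '{}'".format(name))
        in_progress.add(name)
        source, func = prep_sl[name]
        build(source)  # dependency first, regardless of list order
        series[name] = func(series[source])
        in_progress.discard(name)

    for target in prep_sl:
        build(target)
    return series


# The dependent step is deliberately listed before the step it depends on.
preps = [
    ("source_wp", "source_wp_other", lambda s: [x.upper() for x in s]),
    ("source", "source_wp", lambda s: [x.replace(" ", "_") for x in s]),
]
result = resolve_series({"source": ["hello world"]}, preps)
```

Here `result["source_wp_other"]` is built correctly even though its step comes first in the list, because `build` resolves `source_wp` before applying the second function.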