neuralmonkey

Chaining of dataset series preprocessors can fail.

Open · varisd opened this issue 6 years ago · 2 comments

When chaining multiple dataset series preprocessor steps, e.g.:

   preprocessors=[("source", "source_wp", <wp_prep>), ("source_wp", "source_wp_other", <other_prep>)]

dataset.load can fail, because the preprocessors list is not processed in any guaranteed order: the second step reads "source_wp", which only exists after the first step has run.

See https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/dataset.py#L325 for the part of the code that should be fixed.

varisd · Oct 18 '18 14:10
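A minimal reproduction of the ordering problem, as a sketch under assumptions: the series-level preprocessors are kept in a dict `prep_sl` mapping each target series to a `(preprocessor, source)` pair (the shape used in the suggestion further down), and they are applied in a single pass over that dict. `load_naive`, `upper` and `reverse` are hypothetical stand-ins, not neuralmonkey code.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Prep = Callable[[Iterable[str]], Iterator[str]]
IterGen = Callable[[], Iterator[str]]


def load_naive(data: Dict[str, list],
               prep_sl: Dict[str, Tuple[Prep, str]]) -> Dict[str, IterGen]:
    """Hypothetical loader: one pass over prep_sl in dict order."""
    iterators: Dict[str, IterGen] = {
        name: (lambda lines=lines: iter(lines)) for name, lines in data.items()}
    for s_name, (preprocessor, source) in prep_sl.items():
        # Fails whenever `source` is produced by an entry that comes later.
        if source not in iterators:
            raise ValueError("Source series '{}' of preprocessed series '{}' "
                             "does not exist (yet)".format(source, s_name))
        # Lazily apply the preprocessor to the source series.
        iterators[s_name] = (lambda p=preprocessor, src=source:
                             p(iterators[src]()))
    return iterators


def upper(series: Iterable[str]) -> Iterator[str]:  # stand-in for <wp_prep>
    return (line.upper() for line in series)


def reverse(series: Iterable[str]) -> Iterator[str]:  # stand-in for <other_prep>
    return (line[::-1] for line in series)


data = {"source": ["ab", "cd"]}
good_order = {"source_wp": (upper, "source"),
              "source_wp_other": (reverse, "source_wp")}
bad_order = {"source_wp_other": (reverse, "source_wp"),
             "source_wp": (upper, "source")}

print(list(load_naive(data, good_order)["source_wp_other"]()))  # ['BA', 'DC']
try:
    load_naive(data, bad_order)
except ValueError as err:
    print(err)
```

Both dicts describe the same chain; only the order of the entries differs, and that order is exactly the dependency a single pass ignores.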

Look at the comment two lines above the line you are linking to. The right way is to use the pipeline processor. Some kind of tree-based processing could be added here, but that did not work in the old dataset either.

jindrahelcl · Oct 18 '18 15:10
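The reply above points to a pipeline processor instead of chained series. The sketch below illustrates that idea with plain function composition; `pipeline`, `upper` and `reverse` are hypothetical helpers for illustration only, not neuralmonkey's actual pipeline preprocessor or its signature. The two chained steps collapse into one entry whose source is the raw series, so there is no ordering left to get wrong.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

Prep = Callable[[Iterable[str]], Iterator[str]]


def pipeline(*preprocessors: Prep) -> Prep:
    """Compose preprocessors left to right: pipeline(f, g)(x) == g(f(x))."""
    def run(series: Iterable[str]) -> Iterator[str]:
        # Feed the output of each step into the next one.
        return iter(reduce(lambda acc, prep: prep(acc), preprocessors, series))
    return run


def upper(series: Iterable[str]) -> Iterator[str]:  # stand-in for <wp_prep>
    return (line.upper() for line in series)


def reverse(series: Iterable[str]) -> Iterator[str]:  # stand-in for <other_prep>
    return (line[::-1] for line in series)


# One entry instead of two chained ones, in the (source, target, prep)
# shape from the issue; "source_wp" is never created as its own series.
preprocessors = [("source", "source_wp_other", pipeline(upper, reverse))]

print(list(preprocessors[0][2](["ab", "cd"])))  # ['BA', 'DC']
```

The trade-off: the intermediate series ("source_wp" in the issue) is no longer available on its own, so this only fits when nothing else needs it.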

Here is a suggestion:

   def _add_preprocessed_series(iterators, s_name, prep_sl):
       # Nothing to do if this series has already been created.
       if s_name in iterators:
           return
       preprocessor, source = prep_sl[s_name]
       # If the source is itself a preprocessed series, create it first.
       # (Cyclic definitions are not detected; they recurse until
       # Python raises RecursionError.)
       if source in prep_sl:
           _add_preprocessed_series(iterators, source, prep_sl)
       if source not in iterators:
           raise ValueError(
               "Source series '{}' of series-level preprocessed series "
               "'{}' does not exist".format(source, s_name))
       iterators[s_name] = _make_sl_iterator(source, preprocessor)
[...]
   for s_name in prep_sl:
       _add_preprocessed_series(iterators, s_name, prep_sl)

varisd · Oct 18 '18 15:10
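A self-contained, runnable version of the suggestion above, with cycle detection added. This is a sketch, not the neuralmonkey implementation: `_make_sl_iterator` here is a hypothetical stand-in for the helper in `dataset.py` (it takes `iterators` explicitly), `add_all` is a hypothetical driver for the final loop, and the `in_progress` set is an addition not present in the original suggestion. It turns a cyclic definition such as a -> b -> a into a clear error instead of unbounded recursion.

```python
from typing import Callable, Dict, Iterable, Iterator, Set, Tuple

Prep = Callable[[Iterable[str]], Iterator[str]]
IterGen = Callable[[], Iterator[str]]


def _make_sl_iterator(iterators: Dict[str, IterGen], source: str,
                      preprocessor: Prep) -> IterGen:
    # Hypothetical stand-in: lazily apply the preprocessor to the source.
    return lambda: preprocessor(iterators[source]())


def _add_preprocessed_series(iterators: Dict[str, IterGen], s_name: str,
                             prep_sl: Dict[str, Tuple[Prep, str]],
                             in_progress: Set[str]) -> None:
    if s_name in iterators:
        return
    # Revisiting a series that is still being built means a cycle.
    if s_name in in_progress:
        raise ValueError("Cyclic series-level preprocessors at '{}'".format(s_name))
    in_progress.add(s_name)
    preprocessor, source = prep_sl[s_name]
    # Build the source first if another preprocessor produces it.
    if source in prep_sl:
        _add_preprocessed_series(iterators, source, prep_sl, in_progress)
    if source not in iterators:
        raise ValueError("Source series '{}' of preprocessed series '{}' "
                         "does not exist".format(source, s_name))
    iterators[s_name] = _make_sl_iterator(iterators, source, preprocessor)
    in_progress.discard(s_name)


def add_all(iterators: Dict[str, IterGen],
            prep_sl: Dict[str, Tuple[Prep, str]]) -> Dict[str, IterGen]:
    in_progress: Set[str] = set()
    for s_name in prep_sl:
        _add_preprocessed_series(iterators, s_name, prep_sl, in_progress)
    return iterators


def upper(series: Iterable[str]) -> Iterator[str]:  # stand-in for <wp_prep>
    return (line.upper() for line in series)


def reverse(series: Iterable[str]) -> Iterator[str]:  # stand-in for <other_prep>
    return (line[::-1] for line in series)


iterators: Dict[str, IterGen] = {"source": lambda: iter(["ab", "cd"])}
# The dependent series is listed first on purpose; it still works.
prep_sl = {"source_wp_other": (reverse, "source_wp"),
           "source_wp": (upper, "source")}
add_all(iterators, prep_sl)
print(list(iterators["source_wp_other"]()))  # ['BA', 'DC']
```

The recursion is a depth-first topological ordering done on the fly; an explicit topological sort of `prep_sl` before the loop would be an equivalent alternative.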