Frank Seide

38 comments by Frank Seide

Do we know the corpus size in advance? Then we could say, e.g., translating more than 10,000 sentences is likely back-translation, and then enable a warning.

I am leaning against. Isn't setting a max length for back-translation something you forget once and never again?

In my mind, if one says `--max-length`, it should do something; but if it is not set, it should default to `SIZE_MAX`, i.e., do nothing. And then Marcin will forget...
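To make the "unset means no-op" idea concrete, here is a minimal sketch. The function `filterByMaxLength` and its signature are hypothetical and are not taken from Marian's code; it only illustrates that a default of `SIZE_MAX` can never exclude a sentence, so no separate "is it set?" flag is needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Sentence = std::vector<std::string>;  // one token per element

// Hypothetical helper (assumption, not Marian's API): keeps only sentences
// whose token count does not exceed maxLength. The default of SIZE_MAX makes
// the filter a no-op, because no vector can hold more than SIZE_MAX elements.
std::vector<Sentence> filterByMaxLength(const std::vector<Sentence>& sentences,
                                        std::size_t maxLength = SIZE_MAX) {
  std::vector<Sentence> kept;
  for (const auto& s : sentences) {
    if (s.size() <= maxLength)
      kept.push_back(s);
  }
  return kept;
}
```

Called without a second argument, the function returns every sentence unchanged; passing an explicit limit is the only way to drop anything.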

Please add an fflush() and check on that, and don't check on fclose().
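A rough sketch of what that could look like, using only the standard C stdio calls. The helper name `writeAndFlush` is made up for illustration and does not come from the code under review; the point is that the error signal comes from `fwrite` and `fflush`, while the return value of `fclose` is not used for it.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (assumption, not the reviewed code): writes text to
// path and reports success based on fwrite() and fflush(), not on fclose().
bool writeAndFlush(const std::string& path, const std::string& text) {
  std::FILE* f = std::fopen(path.c_str(), "wb");
  if (!f)
    return false;  // could not open the file at all

  // fwrite() returns the number of items written; a short count is an error.
  bool ok = std::fwrite(text.data(), 1, text.size(), f) == text.size();

  // Flush explicitly and check the result, so that errors from buffered
  // writes are detected here rather than at close time.
  ok = (std::fflush(f) == 0) && ok;

  std::fclose(f);  // return value deliberately not used as the error signal
  return ok;
}
```

Note that `fflush` only pushes data from the stdio buffer to the operating system; it does not guarantee the data has reached the disk.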

> For the command-line option, we could use a custom format like we do with `--devices` and, for instance, use a semicolon to separate test sets. @snukky, that custom format...

@kpu, all valid Linux pathnames should be allowed without Marian-custom escaping hoops. (BTW, this also includes regular files named "stdin" and "stdout"...) @emjotde, if you want to pass a YAML...

For context, Marcin's suggestion is based on me whining. During development, it sometimes happens that parameters written to the model file are not consistent with parameters used. E.g. initial training...

Sadly, multi-node training for large models typically only provides efficiency benefits if the nodes are connected via Infiniband with RDMA enabled. Without RDMA, we also cannot get any gain. The...