Roman Grundkiewicz

Results 69 comments of Roman Grundkiewicz

For the command-line option, we could use a custom format like we do with `--devices` and, for instance, use a semicolon to separate test sets.
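A minimal sketch of what parsing such a custom format could look like, written in Python for illustration only. The function name `parse_valid_sets` is hypothetical, and the choice to separate files within one set by whitespace or commas is an assumption; the comment above only proposes the semicolon between test sets.

```python
import re

def parse_valid_sets(spec: str) -> list[list[str]]:
    """Split 'a.src a.tgt; b.src b.tgt' into one list of files per test set.

    Assumption: ';' separates test sets, whitespace or ',' separates the
    files inside a set. Empty groups (e.g. a trailing ';') are skipped.
    """
    sets = []
    for group in spec.split(";"):
        files = [f for f in re.split(r"[,\s]+", group.strip()) if f]
        if files:
            sets.append(files)
    return sets

if __name__ == "__main__":
    print(parse_valid_sets("set1.src set1.tgt; set2.src set2.tgt"))
```

The trade-off is that file paths containing a semicolon could no longer be passed without some escaping rule.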

@frankseide OK, no problem. I just don't think something like `--valid-sets "[[set1.src, set1.tgt], [set2.src, set2.tgt]]"` is convenient, but unfortunately I don't have a better idea.

I vote for removing those options and requiring a script instead. A note to myself: remove the options, update the regression tests, update the information on the website, and update the examples.

With the Boost-based CLI (or, more precisely, before the loading of config files was rewritten), it was possible to pass any option you wanted via a config file. That was useful during development...

Maybe this one: https://gist.github.com/ax3l/9489132? It seems CUDA 12 doesn't support GCC versions higher than 12.2.1.

In your .yaml file, try replacing lines like this one:

```
train-sets: /mnt/d/align_data/src-tgt-ms-score-12m-bias/shards-marian/corpus.{zh,en}
```

with proper YAML lists, for example:

```
train-sets:
  - /mnt/d/align_data/src-tgt-ms-score-12m-bias/shards-marian/corpus.zh
  - /mnt/d/align_data/src-tgt-ms-score-12m-bias/shards-marian/corpus.en
```
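Brace patterns such as `corpus.{zh,en}` are expanded by the shell, not by a YAML parser, which reads the value as one literal string. If there are many such entries, a small helper can generate the list form. This is only a sketch: `expand_braces` and `to_yaml_list` are hypothetical names, and the helper handles only simple comma-separated alternatives, not the full shell syntax.

```python
import re

def expand_braces(path: str) -> list[str]:
    """Expand simple {a,b} alternatives, similar to shell brace expansion."""
    match = re.search(r"\{([^{}]*)\}", path)
    if match is None:
        return [path]
    head, tail = path[:match.start()], path[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        # Recurse so that paths with several brace groups are handled too.
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded

def to_yaml_list(key: str, paths: list[str]) -> str:
    """Render a key and its paths as a block-style YAML list."""
    return "\n".join([f"{key}:"] + [f"  - {p}" for p in paths])

if __name__ == "__main__":
    print(to_yaml_list("train-sets", expand_braces("corpus.{zh,en}")))
```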

This comment: https://github.com/marian-nmt/marian-dev/blob/da6e30bfe3f12a05a74fda2737f31043afc94c18/src/embedder/embedder.h#L62..L63 suggests that the vocab is duplicated for the user. Have you tried `$MARIAN/marian embed -t data.ja paraphrase.ja -v vocab.ja.spm -m model.npz --compute-similarity`?

Hi, I recommend trying the most recent version, which we develop at https://github.com/AppraiseDev/Appraise, and checking whether the issue persists.