
OpenNMT-py v3 support

Open • argosopentech opened this issue 2 years ago • 2 comments

https://forum.opennmt.net/t/opennmt-py-v3-0-is-out/5077

The vanilla Transformer uses sinusoidal positional encoding (position_encoding=true). We recommend using "maximum relative positions" encoding instead (max_relative_positions=20, position_encoding=false), which again has a small overhead.
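To show what the two options mean, here is a minimal pure-Python sketch of the general techniques, not OpenNMT-py's implementation. Sinusoidal encoding gives every absolute position a fixed vector. Relative-position encoding (in the style of Shaw et al., 2018) gives each query/key pair an index based on their distance j - i, clipped to ±max_relative_positions, so all pairs farther apart than that share one learned embedding. The function names `sinusoidal_encoding` and `relative_position_indices` are hypothetical.

```python
import math
from typing import List


def sinusoidal_encoding(max_len: int, d_model: int) -> List[List[float]]:
    """Absolute sinusoidal positional encoding (the position_encoding=true idea).

    Even dimensions use sin and odd dimensions use cos; the wavelength grows
    geometrically with the dimension index.
    """
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def relative_position_indices(length: int, max_relative_positions: int) -> List[List[int]]:
    """Clipped relative distances j - i, shifted into [0, 2k] for an embedding lookup.

    Hypothetical helper: any pair of tokens farther apart than
    k = max_relative_positions gets the same index, so a learned table
    only needs 2k + 1 rows regardless of sentence length.
    """
    k = max_relative_positions
    return [
        [min(max(j - i, -k), k) + k for j in range(length)]
        for i in range(length)
    ]


if __name__ == "__main__":
    pe = sinusoidal_encoding(max_len=4, d_model=8)
    print([round(x, 3) for x in pe[1]])
    idx = relative_position_indices(length=30, max_relative_positions=20)
    print(idx[0][0], idx[0][29], idx[29][0])  # 20 40 0
```

Because the relative indices are clipped, the number of learned embeddings stays fixed at 2k + 1, which is why this scheme adds only a small overhead over the fixed sinusoidal table.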

We kept "fusedadam" (old legacy code), which provides the best speed (compared to PyTorch AMP Adam fp16 and apex levels O1/O2). We tested the new Adam(fused=True) released with PyTorch 1.13, but it is much slower.

Always use the largest batch size your GPU RAM can hold, and set the update interval (accum_count) according to the "true batch size" you want. For instance, if your GPU can fit 8192 tokens per batch and you use accum_count=12, your true batch size will be 8192 × 12 = 98304 tokens.
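The arithmetic above can be sketched as two small helpers. Both function names are hypothetical. The `num_gpus` factor is an assumption about data-parallel training, where each GPU processes its own batch before gradients are combined; with a single GPU it reduces to the example in the text.

```python
import math


def true_batch_size(batch_size_tokens: int, accum_count: int, num_gpus: int = 1) -> int:
    """Tokens contributing to one optimizer update.

    Per-step batch x gradient-accumulation steps x GPUs
    (the num_gpus factor assumes plain data-parallel training).
    """
    return batch_size_tokens * accum_count * num_gpus


def accum_count_for(target_tokens: int, batch_size_tokens: int, num_gpus: int = 1) -> int:
    """Smallest accumulation count whose true batch reaches at least target_tokens."""
    return math.ceil(target_tokens / (batch_size_tokens * num_gpus))


if __name__ == "__main__":
    print(true_batch_size(8192, 12))     # 98304
    print(accum_count_for(98304, 8192))  # 12
```

Rounding up in `accum_count_for` means the true batch never falls short of the target when the GPU batch does not divide it evenly.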

Adjust the bucket size to your CPU RAM. Most of the time, a bucket of 200K to 500K examples will be suitable. The larger your bucket size, the less padding you will have, since examples are sorted by length within a bucket and batches are drawn from that bucket.
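The effect of bucket size on padding can be illustrated with a toy simulation. This is a simplified sketch under stated assumptions, not OpenNMT-py's data loader: batches here hold a fixed number of examples rather than tokens, the sentence lengths are synthetic random numbers, and `make_batches` and `padding_tokens` are hypothetical names.

```python
import random
from typing import List


def make_batches(lengths: List[int], bucket_size: int, batch_size: int) -> List[List[int]]:
    """Read bucket_size examples at a time, sort them by length, then cut into batches."""
    batches = []
    for start in range(0, len(lengths), bucket_size):
        bucket = sorted(lengths[start:start + bucket_size])
        for b in range(0, len(bucket), batch_size):
            batches.append(bucket[b:b + batch_size])
    return batches


def padding_tokens(batches: List[List[int]]) -> int:
    """Pad tokens needed when every example is padded to its batch's longest example."""
    return sum(max(batch) * len(batch) - sum(batch) for batch in batches)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic sentence lengths, for illustration only.
    lengths = [random.randint(5, 100) for _ in range(10_000)]
    for bucket_size in (32, 1_000, 10_000):
        pad = padding_tokens(make_batches(lengths, bucket_size, batch_size=32))
        print(bucket_size, pad)
```

A bucket equal to the batch size (32 here) is the same as no bucketing, since sorting inside a single batch does not change its padding. Larger buckets let similar lengths land in the same batch, at the cost of holding more examples in CPU RAM.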

argosopentech, Nov 06 '22 02:11

https://github.com/OpenNMT/OpenNMT-py/issues/2242

argosopentech, Nov 06 '22 02:11

https://github.com/OpenNMT/OpenNMT-py/issues/2244

PJ-Finlay, Nov 06 '22 12:11