gmryu

Results: 66 comments by gmryu

@theamato The problem is not in "train"; it happens in "preprocess". You need to check how you prepared your data.

I believe there is no such thing as "sharding the model in one node and train with multiple nodes with parallel data distribution." Rather, I would say sharding the model over all nodes...

I believe the current standard fairseq does not provide such a feature, at least not from the command line. As for the implementation, if your data is not that huge (size is...

@martianmartina I think your solution is not bad. (Though I do not understand what you mean by incorporating it into the target sentences.) At first glance, I would add a new argument for...

@martianmartina I do not know if you still need my help. Sorry, I am pretty poor at understanding your implementation. `prev_output_tokens` holds the same tokens as `target` (shifted right by one position, with EOS moved to the front), while `prev_output_tokens` are passed to...
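To make the relationship between `target` and `prev_output_tokens` concrete, here is a minimal pure-Python sketch of the teacher-forcing shift: each target's trailing EOS is moved to the front, so the decoder input at position `i` is the token that precedes `target[i]`. It assumes fairseq's default dictionary indices (`<pad>` = 1, `</s>` = 2) and right padding; the helper names `make_prev_output_tokens` and `pad_targets` are hypothetical and are not fairseq functions.

```python
from typing import List

PAD = 1  # assumption: fairseq's default Dictionary index for <pad>
EOS = 2  # assumption: fairseq's default Dictionary index for </s>


def pad_targets(targets: List[List[int]], pad: int = PAD) -> List[List[int]]:
    """Right-pad every target to the length of the longest one in the batch."""
    max_len = max(len(t) for t in targets)
    return [t + [pad] * (max_len - len(t)) for t in targets]


def make_prev_output_tokens(
    targets: List[List[int]], eos: int = EOS, pad: int = PAD
) -> List[List[int]]:
    """Hypothetical helper: build decoder inputs by moving each trailing EOS to the front."""
    max_len = max(len(t) for t in targets)
    batch = []
    for t in targets:
        assert t and t[-1] == eos, "each target is expected to end with EOS"
        shifted = [eos] + t[:-1]  # same tokens, shifted right by one position
        batch.append(shifted + [pad] * (max_len - len(shifted)))
    return batch


targets = [[10, 11, 12, EOS], [20, 21, EOS]]
print(pad_targets(targets))              # [[10, 11, 12, 2], [20, 21, 2, 1]]
print(make_prev_output_tokens(targets))  # [[2, 10, 11, 12], [2, 20, 21, 1]]
```

With this layout, the decoder sees `prev_output_tokens[:i+1]` when it is trained to predict `target[i]`, which is why the two tensors contain the same words but are not identical.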

@martianmartina Okay, I ran into the same problem with `IndexedCachedDataset`, and I chose to ignore it and use a list instead. It is very brave and cool of you to use those...
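As a rough illustration of the "use a list instead" workaround, the sketch below keeps every tokenized sample in a plain Python list, which is only sensible when the whole dataset fits in memory. The class name `ListDataset` and its `sizes` / `num_tokens` members are hypothetical, loosely modeled on the kind of interface fairseq datasets expose rather than copied from any fairseq class.

```python
from typing import List, Sequence


class ListDataset:
    """Hypothetical in-memory dataset: every tokenized sample lives in a plain list."""

    def __init__(self, samples: Sequence[Sequence[int]]):
        # Copy each sample into its own list so later edits to the input do not leak in.
        self.samples: List[List[int]] = [list(s) for s in samples]
        # Per-sample lengths, useful for length-based batching.
        self.sizes: List[int] = [len(s) for s in self.samples]

    def __getitem__(self, index: int) -> List[int]:
        return self.samples[index]

    def __len__(self) -> int:
        return len(self.samples)

    def num_tokens(self, index: int) -> int:
        return self.sizes[index]


ds = ListDataset([[4, 5, 2], [6, 2]])
print(len(ds), ds[1], ds.sizes)  # 2 [6, 2] [3, 2]
```

The trade-off is memory for simplicity: nothing is memory-mapped or cached lazily, so random access is trivial, but the whole corpus has to be loaded up front.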