Leverage multiple GPUs for dependency parse model?
I am working on a project where I would like to run a large corpus of text through a Stanza pipeline using the processors 'tokenize,lemma,pos,depparse'. I am trying to leverage multiple GPUs across the models in the pipeline (particularly the dependency parser, as it seems to be the most computationally expensive). From briefly looking over the various models in the source code, the use_gpu flag basically triggers a call to model.cuda(), which uses the default device (cuda:0) but does not leverage multiple GPUs.
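To illustrate what I mean, a minimal sketch (the example text is made up; as far as I can tell there is no per-processor device argument):

```python
import stanza

# use_gpu=True appears to move every processor's model onto the default
# CUDA device (cuda:0); I don't see a way to spread processors across GPUs.
nlp = stanza.Pipeline('en', processors='tokenize,lemma,pos,depparse',
                      use_gpu=True)
doc = nlp('The dependency parser dominates the runtime on long documents.')
```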
So I suppose my question is: does anyone see a way to leverage multiple GPUs in the pipeline in a somewhat non-hackish way? I have a general understanding of nn.DistributedDataParallel, but I have never used it in such a complex pipeline before.
Does anyone have any ideas?
Thanks in advance.
Hi @masonedmison, supporting multiple GPUs and maximizing processing speed is a future direction for us, but unfortunately this improvement is unlikely to be released in the short term. If you are going to process a large amount of text, the best and simplest way is to parallelize at the data level: split the original text file on your own, run multiple processes on different GPUs, and then manually concatenate all the results.
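A minimal sketch of that data-level approach, assuming two GPUs, a hypothetical corpus.txt with one text per line, and made-up output file names. Each worker process is pinned to its own GPU via CUDA_VISIBLE_DEVICES before CUDA is initialized, so each pipeline sees its assigned GPU as cuda:0:

```python
import os
import multiprocessing as mp

def process_chunk(gpu_id, texts):
    # Pin this worker to one GPU before anything touches CUDA;
    # inside the process that GPU then shows up as cuda:0.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    import stanza  # import after setting the env var, to be safe

    nlp = stanza.Pipeline('en', processors='tokenize,lemma,pos,depparse',
                          use_gpu=True)
    # Write results per worker to avoid shipping large objects between processes
    with open(f'out_{gpu_id}.tsv', 'w') as out:
        for text in texts:
            doc = nlp(text)
            for sent in doc.sentences:
                for word in sent.words:
                    out.write(f'{word.text}\t{word.lemma}\t{word.upos}\t'
                              f'{word.head}\t{word.deprel}\n')
                out.write('\n')

if __name__ == '__main__':
    with open('corpus.txt') as f:  # hypothetical input: one text per line
        texts = [line.strip() for line in f if line.strip()]

    n_gpus = 2  # assumption: two GPUs available
    chunks = [texts[i::n_gpus] for i in range(n_gpus)]

    # 'spawn' so each worker initializes CUDA independently
    ctx = mp.get_context('spawn')
    with ctx.Pool(n_gpus) as pool:
        pool.starmap(process_chunk, enumerate(chunks))
    # Afterwards, manually concatenate out_0.tsv, out_1.tsv, ...
```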
Could you please offer a code sample for handling a large amount of text in Stanza? For example, a large sentence list with over 400,000 sentences? I find it very difficult to call the neural pipeline on a large list of documents. Thanks.
Does this section help?
https://stanfordnlp.github.io/stanza/pipeline.html#processing-multiple-documents
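For reference, the pattern on that page wraps each raw text in a stanza.Document so the pipeline can batch across documents; roughly:

```python
import stanza

nlp = stanza.Pipeline('en', processors='tokenize,lemma,pos,depparse')

texts = ['This is a test document.', 'I wrote another document just for fun.']
in_docs = [stanza.Document([], text=t) for t in texts]  # wrap raw texts
out_docs = nlp(in_docs)  # returns a list of annotated Documents
print(out_docs[0].sentences[0].words[0].lemma)
```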
"difficult" doesn't tell us what the problem is...