stanza
How to show progress bar in pipeline? [QUESTION]
Hi, I have been using Stanza's `bulk_process` to tokenize and sentence-split (ssplit) a rather large text corpus stored in a dataframe. My question is: how can I show a progress bar while the pipeline is running?
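```python
import stanza
import pandas as pd

dummy_df = pd.read_parquet("../Data/Data_Frame/1987.parquet")
dummy = list(dummy_df.head(1000).TEXT)  # raw text of the first 1,000 articles

# The tokenize processor performs both tokenization and sentence splitting
nlp = stanza.Pipeline(lang='en', processors='tokenize')
docs = nlp.bulk_process(dummy)  # list of texts in, list of Documents out
...
```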
Sorry, but that functionality currently does not exist (at least for the tokenize annotator).
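Since `bulk_process` reports no progress on its own, one workaround is to split the input into chunks, call the pipeline once per chunk, and report progress between calls. The sketch below is a generic helper, not part of Stanza: `process_with_progress` and its `on_progress` callback are hypothetical names, and `process_batch` is assumed to be any callable that maps a list of texts to a list of results (such as `nlp.bulk_process`). Progress advances only once per chunk, so smaller chunks give a smoother indicator at the cost of smaller batches per pipeline call.

```python
import sys
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_with_progress(
    process_batch: Callable[[List[T]], List[R]],
    items: Sequence[T],
    chunk_size: int = 100,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> List[R]:
    """Run `process_batch` over `items` in chunks, reporting progress after each chunk.

    If `on_progress` is None, a simple "processed done/total" line is written to stderr.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = len(items)
    results: List[R] = []
    for start in range(0, total, chunk_size):
        batch = list(items[start:start + chunk_size])
        results.extend(process_batch(batch))
        done = min(start + chunk_size, total)
        if on_progress is not None:
            on_progress(done, total)
        else:
            # Overwrite the same stderr line to act as a minimal progress indicator.
            print(f"\rprocessed {done}/{total}", end="", file=sys.stderr, flush=True)
    if on_progress is None and total:
        print(file=sys.stderr)  # end the progress line
    return results


if __name__ == "__main__":
    # Stand-in for a real pipeline: "tokenize" by whitespace.
    def fake_tokenize(batch: List[str]) -> List[List[str]]:
        return [text.split() for text in batch]

    texts = [f"document number {i}" for i in range(250)]
    docs = process_with_progress(fake_tokenize, texts, chunk_size=100)
    print(len(docs))
```

With the pipeline from the question, the call would be along the lines of `docs = process_with_progress(nlp.bulk_process, dummy, chunk_size=100)`. If `tqdm` is installed, the `on_progress` callback could update a `tqdm` bar instead of printing to stderr.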
Thank you for your timely reply. Another question came up while I was test-running the tokenizing pipeline: GPU utilization is currently rather low (5% to 14% on my RTX 3070). How can I increase GPU usage to make the whole process faster?
The data is the NYT Annotated Corpus, and there are 100,000 articles in the dataframe dummy_df. I tried running the pipeline on the whole dataframe before going to sleep, only to find that the program had crashed for an unknown reason.
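The thread does not establish why the full run crashed, and the sketch below does not fix that. It only limits how much work a crash costs: the input is processed chunk by chunk, each chunk's results are written to their own JSON file, and a rerun skips chunks whose file already exists. `process_resumable`, the `chunk_NNNNN.json` naming, and the requirement that results be JSON-serializable are assumptions of this sketch, not Stanza features.

```python
import json
import os
from typing import Callable, List, Sequence


def process_resumable(
    process_batch: Callable[[List[str]], list],
    texts: Sequence[str],
    out_dir: str,
    chunk_size: int = 1000,
) -> int:
    """Process `texts` chunk by chunk, saving each chunk's results under `out_dir`.

    Chunks whose output file already exists are skipped, so rerunning after a
    crash resumes at the first unfinished chunk. Returns the number of chunks
    processed in this run.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    os.makedirs(out_dir, exist_ok=True)
    processed = 0
    for index, start in enumerate(range(0, len(texts), chunk_size)):
        path = os.path.join(out_dir, f"chunk_{index:05d}.json")
        if os.path.exists(path):
            continue  # finished in an earlier run
        results = process_batch(list(texts[start:start + chunk_size]))
        tmp_path = path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(results, f)
        # Rename only after the write completes, so a crash mid-write never
        # leaves a partial file under the final name.
        os.replace(tmp_path, path)
        processed += 1
    return processed
```

With Stanza, `process_batch` could wrap the pipeline and convert each `Document` to plain Python data, e.g. `lambda batch: [doc.to_dict() for doc in nlp.bulk_process(batch)]`, assuming a Stanza version whose `Document` provides `to_dict()`. Smaller chunks lose less work per crash but produce more files.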
Certain annotators do quite a bit of their processing on the CPU. Fixing that to get better GPU utilization would be a bit of a project. It is on our list of things to do, though.
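The point about CPU-side work can be made concrete with simple arithmetic. Assuming the CPU work and the GPU work for each batch run one after the other with no overlap, the GPU sits idle during the CPU portion, so its utilization cannot exceed gpu_time / (cpu_time + gpu_time), and even an infinitely fast GPU only removes the GPU share of the wall time. The function names and the timings in the example are hypothetical, not measurements of Stanza.

```python
def max_gpu_utilization(cpu_seconds: float, gpu_seconds: float) -> float:
    """Upper bound on the fraction of wall time the GPU is busy.

    Assumes the CPU and GPU stages of each batch run back to back (no overlap).
    """
    if cpu_seconds < 0 or gpu_seconds < 0:
        raise ValueError("times must be non-negative")
    total = cpu_seconds + gpu_seconds
    return gpu_seconds / total if total > 0 else 0.0


def speedup_from_faster_gpu(cpu_seconds: float, gpu_seconds: float, factor: float) -> float:
    """Overall speedup if only the GPU stage becomes `factor` times faster."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    before = cpu_seconds + gpu_seconds
    if before == 0:
        return 1.0  # nothing to speed up
    after = cpu_seconds + gpu_seconds / factor
    return before / after


if __name__ == "__main__":
    # Hypothetical per-batch timings: 9 s of CPU work, 1 s of GPU work.
    print(max_gpu_utilization(9.0, 1.0))  # 0.1, i.e. at most 10% GPU utilization
    print(speedup_from_faster_gpu(9.0, 1.0, 1e9))  # approaches 10 / 9, about 1.11x
```

Under these assumptions, raising GPU utilization requires shrinking or overlapping the CPU-side work; tuning GPU settings alone cannot lift the bound.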
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.