stanza issues

Low performance in many-cores systems

1

**Describe the bug** When Stanza is run from a Docker container on a server with more than ~20 cores, the performance of the pipeline falls dramatically. **To Reproduce** 1) Get a machine with...

StarTessar

bug

Check for and replace excessively long tokens with "UNK"; addresses issue 1137

10

## Description Added a check to find and replace excessively long tokens with "UNK", in order to avoid downstream GPU memory issues in `POS`. See issue #1137 ## Approach To avoid having...

khughitt
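
The idea described in this pull request can be illustrated with a minimal sketch. This is not the code from the PR: the function name `replace_long_tokens` and the cutoff `MAX_TOKEN_LEN` are hypothetical, and the PR's actual threshold and integration point in the pipeline are not shown in this listing.

```python
from typing import List

# Hypothetical cutoff, chosen only for illustration; the PR's real value is not shown here.
MAX_TOKEN_LEN = 50


def replace_long_tokens(
    tokens: List[str],
    max_len: int = MAX_TOKEN_LEN,
    placeholder: str = "UNK",
) -> List[str]:
    """Return a copy of `tokens` where any token longer than `max_len` is replaced."""
    return [placeholder if len(tok) > max_len else tok for tok in tokens]


# A sentence containing one pathologically long token, such as might come from malformed input.
sentence = ["A", "normal", "sentence", "x" * 5000, "."]
cleaned = replace_long_tokens(sentence)
print(cleaned)
```

Replacing the token rather than dropping it keeps the token count and positions stable, which may matter for anything downstream that aligns output with the original tokenization.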

Aws sagemaker tooling

Add a script to convert from AWS annotator reports to a report on how much work each annotator did

AngledLuffa

Kk trans

Transliterate Kazakh to a Latin alphabet

AngledLuffa

Handle unexpectedly large tokens prior to calling the pos, etc. processors?

6

**Problem / motivation** When processing large corpora of text, the likelihood of encountering unexpected and ill-formatted inputs becomes large. In my case, I was processing a collection of texts, and...

khughitt

enhancement

Mismatched token output using custom stanza tokenizer

3

**Describe the bug** I trained a custom stanza tokenizer and mwt on UD_English-GUM. When using the tokenizer & mwt for inference, the tokenizer changed the surface form of the word....

yilunzhu

bug

How to replicate results of stanza constituency parser on Penn Treebank data

4

Hi, I'm trying to reproduce the results mentioned [here](https://stanfordnlp.github.io/stanza/constituency.html#available-models) for the constituency parser on Penn Treebank data. I have access to the WSJ data and I downloaded the `wsj_bert.pt` model by calling...

MHDBST

question

More combined models?

59

I saw the great idea for combined models here: https://stanfordnlp.github.io/stanza/combined_models.html Is there a process to request more of these? Specifically I was thinking of Hebrew right now.

amir-zeldes

enhancement

Inaccurate Dependency Tagging for Subordinates (ccomp)

18

Greetings all, I am working on extracting subordinate clauses via Stanza (through spacy-stanza); however, dependency parsing seems to provide inaccurate results. Following the guide from https://universaldependencies.org [here](https://universaldependencies.org/u/dep/csubj.html), clausal subjects are...

fatihbozdag

bug

question

[QUESTION] Rebuilding an existing language from sources

9

This is my first question, so sorry if I make any mistakes, but I haven't found information about rebuilding an existing language from its sources. I plan on doing this...

student-nlp-project

enhancement

question
