John Bauer comments

Poor Results on Large Corpus

I'm not sure when or if I'll have time to do a deep dive into this, but I will point out that fastText should be better in general for agglutinative...
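The comment above is cut off. One commonly cited reason fastText suits agglutinative languages is that it builds each word vector partly from character n-grams (by default lengths 3 to 6, with `<` and `>` marking the word boundaries), so inflected forms of one stem share most of their pieces. The sketch below imitates that n-gram extraction; `char_ngrams` is a hypothetical helper, not part of fastText's API, and the Finnish words are only illustrative.

```python
def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list[str]:
    """Character n-grams in the style of fastText: wrap the word in '<' and '>'
    boundary markers, then take every substring with minn <= length <= maxn."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


# Two inflected forms of the Finnish stem "talo" (house) share the n-grams of
# "<talos", so a rare form can still borrow information from its frequent relatives.
a = set(char_ngrams("talossa"))    # "in the house"
b = set(char_ngrams("talostani"))  # "from my house"
print(sorted(a & b))
```

A word-level embedding without subwords would treat these two forms as unrelated vocabulary items, which is the weakness this representation is meant to address.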

Sentence splitting with gold tokenization

Do you have some examples for the problem of periods separated by spaces? The tokenizer is (currently) an LSTM over characters, so if it hasn't seen the data, it won't...

Sentence splitting with gold tokenization

In double-checking our tokenizer preparation code, I realized that only a limited number of datasets had the punctuation+space augmentation. I'll try to make that a bit more universal, or...

Sentence splitting with gold tokenization

I retrained the DE tokenizer with occasional whitespace before the sentence-final punctuation, and that fixed up this particular sentence splitting. I'll try to add that as a training mechanism...
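A minimal sketch of the kind of augmentation described here, assuming the training text comes with one boundary label per character. The function name and the label scheme (0 = no boundary, 1 = token end, 2 = sentence end) are hypothetical and are not Stanza's actual data preparation code. The point it illustrates is that an inserted space must also get its own label so text and labels stay aligned.

```python
import random

SENTENCE_FINAL = {".", "!", "?"}


def add_space_before_final_punct(text, labels, prob, rng):
    """Hypothetical augmentation: with probability `prob`, insert a space before a
    sentence-final punctuation mark that directly follows a letter or digit.
    `labels` has one entry per character; each inserted space gets label 0."""
    assert len(text) == len(labels)
    out_chars, out_labels = [], []
    for i, (ch, lab) in enumerate(zip(text, labels)):
        # isalnum() on the previous char avoids splitting "..." into ". . ."
        if ch in SENTENCE_FINAL and i > 0 and text[i - 1].isalnum() and rng.random() < prob:
            out_chars.append(" ")
            out_labels.append(0)
        out_chars.append(ch)
        out_labels.append(lab)
    return "".join(out_chars), out_labels


text = "Gut. Ja!"
labels = [0, 0, 1, 2, 0, 0, 1, 2]  # toy labels: 1 = token end, 2 = sentence end
new_text, new_labels = add_space_before_final_punct(text, labels, 1.0, random.Random(0))
print(new_text)  # Gut . Ja !
```

In practice `prob` would be small, so the model mostly sees normal text but occasionally learns that a detached period can still end a sentence.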

Sentence splitting with gold tokenization

This is now part of the 1.11.0 codebase, and the next time all models are retrained, they should pick up this improvement. DE in particular already has it.

FutureWarning: You are using `torch.load` with `weights_only=False`

Aware of it. There's a limitation where we are saving plenty of things other than weights in the current file. Config strings and numbers, mostly. Would those still work? On...
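One way to check which saved config values might trip up `weights_only=True` is to walk the config and flag anything that is not a plain scalar or a plain container of scalars. This is a hypothetical helper, and the allowlist in it is an approximation chosen for illustration: the set of types PyTorch's restricted unpickler actually accepts is defined by PyTorch, includes tensors, and can differ between versions.

```python
from collections import OrderedDict

# Assumption: an approximate allowlist, not PyTorch's exact one.
# Exact type checks are used so that IntEnum / str-based Enum members are flagged.
PLAIN_SCALARS = (str, int, float, bool, type(None))
PLAIN_SEQUENCES = (list, tuple)
PLAIN_MAPPINGS = (dict, OrderedDict)


def find_unsafe_values(obj, path="config"):
    """Return a description of every value that is not a plain scalar or a
    list/tuple/dict of plain values (hypothetical helper)."""
    if type(obj) in PLAIN_SCALARS:
        return []
    if type(obj) in PLAIN_SEQUENCES:
        bad = []
        for i, item in enumerate(obj):
            bad.extend(find_unsafe_values(item, f"{path}[{i}]"))
        return bad
    if type(obj) in PLAIN_MAPPINGS:
        bad = []
        for key, value in obj.items():
            if type(key) not in PLAIN_SCALARS:
                bad.append(f"{path} key {key!r}")
            bad.extend(find_unsafe_values(value, f"{path}[{key!r}]"))
        return bad
    return [f"{path} ({type(obj).__name__})"]


print(find_unsafe_values({"lang": "de", "hidden_dim": 128, "dropout": 0.33}))  # []
```

Config strings and numbers pass this check, while something like a set or an enum member would be reported with its path.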

FutureWarning: You are using `torch.load` with `weights_only=False`

Some of the models can be updated to use `weights_only=True` right away, but others require resaving with enums or other data structures removed. Will have to investigate some more.
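Resaving a model with enums removed could look roughly like the sketch below: before calling `torch.save`, replace each `Enum` member in the config with its name, and convert the name back when the model is constructed after `torch.load(..., weights_only=True)`. The `Mode` enum and both helpers are hypothetical stand-ins for whatever a given model actually stores; this is not Stanza's code.

```python
from enum import Enum


class Mode(Enum):  # hypothetical enum stored in a model config
    WORD = "word"
    CHAR = "char"


def strip_enums(obj):
    """Recursively replace Enum members with their names so the config
    only contains plain strings, numbers, lists, tuples, and dicts."""
    if isinstance(obj, Enum):
        return obj.name
    if isinstance(obj, dict):
        return {strip_enums(k): strip_enums(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(strip_enums(x) for x in obj)
    return obj


def restore_mode(name: str) -> Mode:
    """After loading, map the stored name back to the enum member."""
    return Mode[name]


config = {"mode": Mode.CHAR, "dim": 128, "layers": [Mode.WORD, 2]}
plain = strip_enums(config)
print(plain)  # {'mode': 'CHAR', 'dim': 128, 'layers': ['WORD', 2]}
```

Storing the member name rather than the value keeps the lookup on load a simple `Mode[name]`, and an old enum value can be renamed without touching saved files as long as the member name stays the same.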

FutureWarning: You are using `torch.load` with `weights_only=False`

I am finishing up some model training and will be able to make a new release with the updated models soon.

FutureWarning: You are using `torch.load` with `weights_only=False`

Got it, but that's the *main* branch. The updates were merged into the dev branch, which at that line has `torch.load(... weights_only=True)`: https://github.com/stanfordnlp/stanza/blob/5754ec0488636e90cdab26f43d44583d4efc99f0/stanza/models/common/pretrain.py#L60

FutureWarning: You are using `torch.load` with `weights_only=False`

This should now be pushed in v1.10.0.