Keith Hughitt comments

Results: 86 comments by Keith Hughitt

Handle unexpectedly large tokens prior to calling the pos, etc. processors?

Sure thing! It's pretty stock:

```
nlp = stanza.Pipeline(lang='en', processors='tokenize,pos')
```

1. `pos_batch_size` was what I originally tried varying with no luck.
2. I also am using the `lemma` processor...

Handle unexpectedly large tokens prior to calling the pos, etc. processors?

Sounds good! I'll take a stab at it. Just to be clear:

> but probably it's just easiest to cut off the length of tokens produced by the TokenizeProcessor ......

Handle unexpectedly large tokens prior to calling the pos, etc. processors?

I would probably go with the latter. It probably won't matter much, since the tokens most likely to be affected are not informative to begin with, but trimming could...

Check for and replace excessively long tokens with "UNK"; addresses issue 1137

Okay yeah, I thought that might be the case and meant to ask about it, but I seem to have left that out while rearranging my comments. For the offsets, I...

Check for and replace excessively long tokens with "UNK"; addresses issue 1137

Okay, good points. I'll scrap the ahead-of-time approach. So it's alright for the offsets in the generated tokens to define a longer range (in the original text) than what is...
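To make the offsets question concrete, here is a minimal sketch of the idea rather than the code from the PR. It assumes a hypothetical `Token` record, a hypothetical `replace_long_tokens` helper, and an arbitrary cutoff of 100 characters. None of these names or values come from Stanza, and the placeholder string `"UNK"` is taken from the PR title only.

```python
from dataclasses import dataclass, replace
from typing import List

MAX_TOKEN_LEN = 100  # hypothetical cutoff, chosen only for illustration
UNK = "UNK"          # placeholder text; the exact string is an assumption


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a tokenizer's output token."""
    text: str
    start_char: int  # offset of the first character in the original text
    end_char: int    # offset one past the last character in the original text


def replace_long_tokens(tokens: List[Token],
                        max_len: int = MAX_TOKEN_LEN,
                        unk: str = UNK) -> List[Token]:
    """Swap the text of overly long tokens for a placeholder.

    The offsets are left untouched, so they still span the full
    original token even though the stored text is now shorter.
    """
    return [replace(t, text=unk) if len(t.text) > max_len else t
            for t in tokens]
```

With this approach, slicing the original text by a replaced token's `start_char` and `end_char` still recovers the long string, while downstream processors only ever see the short placeholder.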

Check for and replace excessively long tokens with "UNK"; addresses issue 1137

I understand. I think you would know better than I would though, so we can hold off using such an approach until you / the other devs have had time...

Check for and replace excessively long tokens with "UNK"; addresses issue 1137

Yep! Had some other things to take care of, but I should have some time today/tomorrow. I'll update the PR / let you know if I run into any issues....

Check for and replace excessively long tokens with "UNK"; addresses issue 1137

Okay, I switched to the approach discussed above. Just let me know if there is anything I missed.

Check for and replace excessively long tokens with "UNK"; addresses issue 1137

Thanks for adding the last change, and for helping make sure I didn't completely mess things up with the PR, heh. I appreciate your work on Stanza. Cheers.

Segfault encountered with coop::pcor()

Hi @wrathematics, thanks for the quick response and suggestions! So I am actually attempting to compute correlation matrices for both `x` and `t(x)`, so memory may be a bigger issue...