Keith Hughitt

Results 86 comments of Keith Hughitt

Sure thing! It's pretty stock:

```
nlp = stanza.Pipeline(lang='en', processors='tokenize,pos')
```

1. `pos_batch_size` was what I originally tried varying with no luck.
2. I also am using the `lemma` processor...

Sounds good! I'll take a stab at it. Just to be clear:

> but probably it's just easiest to cut off the length of tokens produced by the TokenizeProcessor ......

I would probably go with the latter. It probably won't matter much, since the tokens most likely to be affected are not informative to begin with, but trimming could...

Okay yeah, I thought that might be the case and meant to ask about it, but I seem to have left that out while rearranging my comments. For the offsets, I...

Okay, good points. I'll scrap the ahead-of-time approach. So it's alright for the offsets in the generated tokens to define a longer range (in the original text) than what is...
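To make the question above concrete, here is a minimal sketch of the idea as described in the comment: the token text is cut to a maximum length while its offsets still span the full original range in the raw text. The `Token` class, `truncate_token` helper, and `max_len` parameter are hypothetical names for illustration only; this is not Stanza's implementation or API.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for a tokenizer output (not Stanza's Token class)."""
    text: str
    start_char: int  # inclusive offset into the raw text
    end_char: int    # exclusive offset into the raw text


def truncate_token(token: Token, max_len: int) -> Token:
    """Cut the token text to max_len characters, leaving the offsets untouched.

    After truncation the offsets describe a longer range in the raw text
    than the stored token text covers.
    """
    if len(token.text) <= max_len:
        return token
    return Token(token.text[:max_len], token.start_char, token.end_char)


raw = "see " + "a" * 20 + " now"
long_tok = Token(text="a" * 20, start_char=4, end_char=24)
short_tok = truncate_token(long_tok, max_len=8)

# The stored text is shortened, but the offsets still recover the original span.
original_span = raw[short_tok.start_char:short_tok.end_char]
```

With this arrangement, downstream processors only ever see the shortened text, while anything that maps tokens back to the raw input can still slice out the full original string.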

I understand. I think you would know better than I would though, so we can hold off using such an approach until you / the other devs have had time...

Yep! Had some other things to take care of, but I should have some time today/tomorrow. I'll update the PR / let you know if I run into any issues....

Okay, I switched to the approach discussed above. Just let me know if there is anything I missed.

Thanks for adding the last change, and for helping make sure I didn't completely mess things up with the PR, heh. I appreciate your work on Stanza. Cheers.

Hi @wrathematics, thanks for the quick response and suggestions! So I am actually attempting to compute correlation matrices for both `x` and `t(x)`, so memory may be a bigger issue...