kwrobel.eth

59 comments by kwrobel.eth

I think that extracting links with anchor text is a basic function, so it should be included.
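
The snippet does not say which library this request refers to; as a purely hypothetical illustration of the requested behavior, extracting links together with their anchor text could look like this with BeautifulSoup as a stand-in:

```python
from bs4 import BeautifulSoup

html = '<p><a href="https://example.com">Example</a> and <a href="/docs">Docs</a></p>'
soup = BeautifulSoup(html, "html.parser")

# Pair each href with its anchor text.
links = [(a["href"], a.get_text(strip=True)) for a in soup.find_all("a", href=True)]
print(links)  # [('https://example.com', 'Example'), ('/docs', 'Docs')]
```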

It looks like micro F1 is not calculated correctly.
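
For reference, micro F1 aggregates true positives, false positives, and false negatives across all classes before computing precision and recall; the code under discussion is not shown in the snippet, but a minimal sketch of the correct calculation is:

```python
def micro_f1(tp: int, fp: int, fn: int) -> float:
    """Micro F1 from counts summed over all classes."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. two classes with (tp, fp, fn) = (8, 2, 1) and (3, 1, 4):
print(micro_f1(tp=8 + 3, fp=2 + 1, fn=1 + 4))  # 22/30 ≈ 0.733
```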

I confirm that `return_offsets_mapping` with `is_split_into_words` is confusing. It would be beneficial (especially when using `max_length` and `stride`) if `offset_mapping` also included token indexes. Right now it is hard to map...

It is trickier when using `max_length` and `stride`. Here is the solution for mapping subtokens to tokens:

```python
token_index = -1
for offset_mappings, input_ids in zip(tokenized_tokens['offset_mapping'], tokenized_tokens['input_ids']):
    print(offset_mappings)
    tmp = []
    for (start, end), ...
```
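
The rest of that snippet is cut off in the listing. As a runnable alternative sketch, assuming a fast tokenizer and `bert-base-uncased` as a placeholder model: `BatchEncoding.word_ids()` returns, for each subtoken, the index of the pretokenized word it came from, and it also works with `max_length`/`stride` via `return_overflowing_tokens`, so it can replace the manual offset bookkeeping above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # fast tokenizer
words = ["Mapping", "subtokens", "back", "to", "pretokenized", "words", "is", "useful"]

encoded = tokenizer(
    words,
    is_split_into_words=True,
    max_length=8,
    stride=2,
    truncation=True,
    return_overflowing_tokens=True,
)

# word_ids(i) gives, for each subtoken in chunk i, the index of the
# original word it came from (None for special tokens like [CLS]/[SEP]).
for i in range(len(encoded["input_ids"])):
    print(encoded.word_ids(i))
```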

I know the difference, and it is IMHO not a misconception but different definitions of words/tokens/subtokens. Whitespace splitting was just an example to quickly create a pretokenized input. So for me...

Thank you. I can tokenize each "my token" separately, but one call to `tokenizer` should be faster, and I would have to implement `max_length` and `stride` myself - this...

Using transformers, FP16 on GPU usually does not change the scores, but inference is 3-4 times faster. I hope for FP16 benchmarks using turbotransformers.
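
A minimal sketch of what FP16 inference with transformers looks like, assuming a CUDA GPU and `bert-base-uncased` as a placeholder model:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model = model.half().to("cuda").eval()  # cast weights to FP16

inputs = tokenizer("An example sentence.", return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits)
```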

The same here (Ubuntu 18) with Skype and Microsoft Teams, but it works in Chrome (e.g. Google Hangouts or Microsoft Teams in the browser).

Yes, it would be more convenient.