kwrobel.eth

59 comments by kwrobel.eth

I think that extracting links with anchor text is a basic function, so it should be included.
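
The snippet does not say which library this request refers to; as a purely hypothetical illustration of the requested behavior, extracting links together with their anchor text could look like this with BeautifulSoup as a stand-in:

```python
from bs4 import BeautifulSoup

html = '<p><a href="https://example.com">Example</a> and <a href="/docs">Docs</a></p>'
soup = BeautifulSoup(html, "html.parser")

# Pair each href with its anchor text.
links = [(a["href"], a.get_text(strip=True)) for a in soup.find_all("a", href=True)]
print(links)  # [('https://example.com', 'Example'), ('/docs', 'Docs')]
```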

It looks like micro F1 is not calculated correctly.
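
For reference, micro F1 aggregates true positives, false positives, and false negatives across all classes before computing precision and recall; the code under discussion is not shown in the snippet, but a minimal sketch of the correct calculation is:

```python
def micro_f1(tp: int, fp: int, fn: int) -> float:
    """Micro F1 from counts summed over all classes."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. two classes with (tp, fp, fn) = (8, 2, 1) and (3, 1, 4):
print(micro_f1(tp=8 + 3, fp=2 + 1, fn=1 + 4))  # 22/30 ≈ 0.733
```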

I confirm that `return_offsets_mapping` with `is_split_into_words` is confusing. It would be beneficial (especially when using `max_length` and `stride`) if `offset_mapping` also included token indexes. Right now it is hard to map...

It is trickier when using `max_length` and `stride`. Here is the solution for mapping subtokens to tokens:

```python
token_index = -1
for offset_mappings, input_ids in zip(tokenized_tokens['offset_mapping'], tokenized_tokens['input_ids']):
    print(offset_mappings)
    tmp = []
    for (start, end), ...
```
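
The rest of that snippet is cut off in the listing. As a runnable alternative sketch, assuming a fast tokenizer and `bert-base-uncased` as a placeholder model: `BatchEncoding.word_ids()` returns, for each subtoken, the index of the pretokenized word it came from, and it also works with `max_length`/`stride` via `return_overflowing_tokens`, so it can replace the manual offset bookkeeping above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # fast tokenizer
words = ["Mapping", "subtokens", "back", "to", "pretokenized", "words", "is", "useful"]

encoded = tokenizer(
    words,
    is_split_into_words=True,
    max_length=8,
    stride=2,
    truncation=True,
    return_overflowing_tokens=True,
)

# word_ids(i) gives, for each subtoken in chunk i, the index of the
# original word it came from (None for special tokens like [CLS]/[SEP]).
for i in range(len(encoded["input_ids"])):
    print(encoded.word_ids(i))
```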

I know the difference, and it is IMHO not a misconception but different definitions of words/tokens/subtokens. Whitespace splitting was just an example to quickly create a pretokenized input. So for me...

Thank you. I can tokenize each "my token" separately, but one call to `tokenizer` should be faster, and I would have to implement `max_length` and `stride` myself - this...

Using transformers, FP16 on GPU usually does not change the scores, but inference is 3-4 times faster. I hope for FP16 benchmarks using turbotransformers.
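
A minimal sketch of what FP16 inference with transformers looks like, assuming a CUDA GPU and `bert-base-uncased` as a placeholder model:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model = model.half().to("cuda").eval()  # cast weights to FP16

inputs = tokenizer("An example sentence.", return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits)
```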

The same here (Ubuntu 18) with Skype and Microsoft Teams, but it works in Chrome (e.g. Google Hangouts or Microsoft Teams in the browser).

Yes, it would be more convenient.