Anthony MOI
Thank you for this PR @alexeyr! I'll do my best to have a look in the near future!
Awesome! There's no rush so take the time you need! I'll add a few pieces of information that may help with the decision:
- Every `Model` is different, and so...
Indeed, you need to specify the path during the build phase using `--baseHref="/mongo"`
I think the reason was that `add_special_tokens` delegates to `add_tokens` for actually adding the tokens to the relevant maps/structures, and we need the special tokens to be added there. But...
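The delegation described above can be sketched roughly like this. This is an illustrative Rust snippet, not the actual `tokenizers` source: the struct and method names (`Vocab`, `add_tokens`, `add_special_tokens`) mirror the ones mentioned in the comment, but the internals are a simplified assumption.

```rust
use std::collections::HashMap;

// Illustrative only: special tokens go through the same `add_tokens`
// path so they land in the shared token-to-id map, and are additionally
// recorded in a separate list of special tokens.
#[derive(Default)]
struct Vocab {
    token_to_id: HashMap<String, u32>,
    special: Vec<String>,
}

impl Vocab {
    // Adds tokens to the vocabulary map; returns how many were new.
    fn add_tokens(&mut self, tokens: &[&str]) -> usize {
        let mut added = 0;
        for &tok in tokens {
            if !self.token_to_id.contains_key(tok) {
                let id = self.token_to_id.len() as u32;
                self.token_to_id.insert(tok.to_string(), id);
                added += 1;
            }
        }
        added
    }

    // Delegates to `add_tokens` for the actual insertion, then marks
    // the tokens as special.
    fn add_special_tokens(&mut self, tokens: &[&str]) -> usize {
        let added = self.add_tokens(tokens);
        for &tok in tokens {
            if !self.special.iter().any(|s| s == tok) {
                self.special.push(tok.to_string());
            }
        }
        added
    }
}

fn main() {
    let mut v = Vocab::default();
    v.add_tokens(&["hello"]);
    // "hello" already exists, so only "[CLS]" is newly added.
    let added = v.add_special_tokens(&["[CLS]", "hello"]);
    assert_eq!(added, 1);
    assert_eq!(v.token_to_id.len(), 2);
    assert_eq!(v.special.len(), 2);
    println!("ok");
}
```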
Thank you for reporting this! See https://github.com/huggingface/tokenizers/issues/570 for the explanation about the differences. We should definitely remove the `end_of_word_suffix` option from the `WordPieceTrainer` as it makes absolutely no sense to...
Any specific reason for closing the issue? Did you manage to do what you wanted?
There's no easy way for now. This will be possible as soon as we have https://github.com/huggingface/tokenizers/issues/15
There is no easy way at the moment. For tokenizers that use a BPE, you can probably do it manually in some cases, but you will need to dig into...
Indeed, we do not integrate with any downstream solution at the moment and leave that up to you, as your use case might be completely different from others'. Do you have any...
Thank you! As I expected, this method works for any `T`, so the fact that `u32` has to be converted applies to a lot of different types. We...
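The point about a generic method forcing conversions on concrete types like `u32` can be illustrated with a minimal Rust sketch. The function name and trait bound here are hypothetical, chosen only to show the pattern: once a method is generic over `T`, every concrete integer type (including `u32`) goes through the same conversion path.

```rust
// Illustrative only: a generic function over any T convertible to u64.
// Because the bound is generic, u32 is converted exactly like u16 or
// u8 would be; no special case is needed for it.
fn sum_ids<T: Into<u64> + Copy>(ids: &[T]) -> u64 {
    ids.iter().map(|&id| id.into()).sum()
}

fn main() {
    let ids_u32: Vec<u32> = vec![1, 2, 3];
    let ids_u16: Vec<u16> = vec![4, 5];
    assert_eq!(sum_ids(&ids_u32), 6);
    assert_eq!(sum_ids(&ids_u16), 9);
    println!("ok");
}
```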