Nicolas Patry
Hi @kkavyashankar0009, sorry, but this is just an extract of your code and I can't reproduce the issue: it has many missing bits and many things totally unrelated to the...
Do you mind sharing what the issue was? It could help future readers.
@willfrey Thanks for the info. Currently we cannot include type annotations because the source also supports `signature(fn)` and `help(fn)` (in notebooks, REPLs), and those don't work properly with type annotations. Also...
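For readers unfamiliar with the two introspection tools mentioned here, this is a minimal sketch of what `signature(fn)` and `help(fn)` report for a plain Python function. The `encode` function and its parameters are hypothetical, made up for illustration; it is not the library's code and does not reproduce the annotation problem described above.

```python
import inspect
import pydoc


def encode(sequence, pair=None, add_special_tokens=True):
    """Toy function standing in for a library call (hypothetical)."""
    return [sequence, pair, add_special_tokens]


# inspect.signature is what notebooks and REPLs use to display call signatures.
sig = inspect.signature(encode)
print(sig)

# pydoc.render_doc builds the text that help(encode) would display;
# the plaintext renderer avoids terminal bold escapes.
help_text = pydoc.render_doc(encode, renderer=pydoc.plaintext)
print(help_text)
```

Both tools rely on the function exposing a signature that Python can introspect, which is why anything that changes the signature text can affect what users see in notebooks and REPLs.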
Entirely correct! I haven't pinpointed the issue yet, but it seems to just output the offsets of the last digit regardless of how many digits there are in the...
Sorry, but no: there's no fast way to know short of doing the full tokenization. Mileage may vary, and on specific tokenizers you could go faster than this lib because...
> I'm working on a task to compare function disassembly from binary files. The maximum token length of each function is set to 512, but for functions larger than 512, I...
Hi @zorikg, I will look at this in a bit more detail, but is there any reason you presplit your input here? It seems like `tokenizer(s)` should do exactly what...
Ok, I looked into it, and it seems you just need to actually pass `sequence_index` to your `char_to_token` function.

```python
for sequence_index, split in enumerate(s.split(" ")):
    for char_index, c in...
```
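To illustrate why the sequence index matters when the input is presplit, here is a toy sketch in pure Python. `toy_tokenize` and `toy_char_to_token` are hypothetical stand-ins written for this example, not the library's implementation; the assumptions are that character offsets are relative to each split and that the sequence index defaults to 0 when omitted.

```python
def toy_tokenize(words, piece_len=3):
    """Toy tokenizer (hypothetical): cut each word into pieces of at most
    piece_len characters. Returns (sequence_index, start, end) triples,
    with offsets relative to the word, not to the full string."""
    tokens = []
    for sequence_index, word in enumerate(words):
        for start in range(0, len(word), piece_len):
            end = min(start + piece_len, len(word))
            tokens.append((sequence_index, start, end))
    return tokens


def toy_char_to_token(tokens, char_index, sequence_index=0):
    """Return the index of the token covering char_index in the given sequence."""
    for token_index, (seq, start, end) in enumerate(tokens):
        if seq == sequence_index and start <= char_index < end:
            return token_index
    return None


s = "hello tokenizers"
words = s.split(" ")
tokens = toy_tokenize(words)

# Map every (sequence_index, char_index) pair to its token index.
mapping = {}
for sequence_index, split in enumerate(words):
    for char_index, c in enumerate(split):
        mapping[(sequence_index, char_index)] = toy_char_to_token(
            tokens, char_index, sequence_index
        )
```

In this toy setup, omitting the sequence index resolves every character against the first split, so character 0 of the second word would be mapped to the first token of the first word.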
Hi @beneyal, there's nothing planned at the moment, but contributions for one are very welcome! :) The main thing would be adding and adhering to the CODE_OF_CONDUCT...
Hi @msaroufim, I went ahead and created a milestone to group the various work that we need to get done at some point in the near future (as it will...