Tal Perry

Results 27 issues of Tal Perry

Hey awesome implementation. Thanks. In translate.py, when I want to translate a source sentence, I still need to provide a target. Is the target the same as the source? Thanks

Related to #504. UNK is a special token, so I'd expect it to have a 1 in the special tokens mask. But if I do ```python from tokenizers import Tokenizer from...


I was fooling around with a custom tokenizer and when I passed it text that wasn't in the vocabulary it simply ignored it :-( ```python from tokenizers.trainers import BpeTrainer from...


It would be cool if we could search for "Donald NOT Trump", because we want mentions of Donald Duck. - We'd need to build a query parser, which sucks. Maybe...

Not sure if this is possible or not but I think it is. Idea being that the user selects files, and we do the ingestion in a service worker, then...

Currently we only store the text field of the document, but a user would probably want all of the metadata that came with it. Also, in the future we'd probably...

These days we store a **by the book** [postings list](https://nlp.stanford.edu/IR-book/html/htmledition/a-first-take-at-building-an-inverted-index-1.html) in IndexedDB. A postings list is a map from tokens to the list of documents that contain them. The...
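To make the definition concrete, here is a minimal in-memory sketch of a postings list in Python. It is not the project's IndexedDB code: the whitespace `tokenize` helper, the `build_postings` name, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_postings(docs):
    """Map each token to the sorted list of document ids that contain it."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        # set() so a document appears at most once per token.
        for token in set(tokenize(docs[doc_id])):
            postings[token].append(doc_id)
    return dict(postings)


# Made-up sample documents, keyed by document id.
docs = {1: "Donald Duck", 2: "Donald Trump", 3: "duck season"}
postings = build_postings(docs)
```

Looking up a token is then a single dictionary access, for example `postings["donald"]`, and combining tokens comes down to intersecting or subtracting those id lists.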

At least for the life of V0, we're likely to put the user database in an invalid state and crash everything. Eventually we'll fix that, but while this is experimental...

Pretty basic, but users need a way to download their data. Would probably be best if we let them choose whether to download the entire dataset or just the things...

[Regular Expression Matching with a Trigram Index](https://swtch.com/~rsc/regexp/regexp4.html) describes how you can use a trigram index to prefilter possible candidates and then run the regex only on them. This is...
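A rough sketch of the prefilter idea in Python, under a big simplifying assumption: the linked article derives the trigram query from the regex itself, whereas here the caller passes a literal string that every match must contain. The function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def trigrams(text):
    # All overlapping 3-character substrings of the text.
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_trigram_index(docs):
    """Map each trigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def candidates_for(required_literal, docs, index):
    # Keep only documents that contain every trigram of the literal.
    # This can yield false positives, never false negatives.
    candidates = set(docs)
    for gram in trigrams(required_literal):
        candidates &= index.get(gram, set())
    return candidates


def search(pattern, required_literal, docs, index):
    # Run the (expensive) regex only on the prefiltered candidates.
    regex = re.compile(pattern)
    hits = candidates_for(required_literal, docs, index)
    return sorted(d for d in hits if regex.search(docs[d]))


# Made-up sample documents, keyed by document id.
docs = {1: "Google Code Search", 2: "code review", 3: "searching codes"}
index = build_trigram_index(docs)
```

With these documents, `candidates_for("code", docs, index)` keeps documents 2 and 3, and `search(r"\bcode\b", "code", docs, index)` then drops document 3 because the regex requires a word boundary after `code`.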