John Jansen
hey @emilebosch, once you accept the collab, I can assign this to you :-)
I've been wondering about word2vec myself, not sure if/when I would get to it though
See https://yoast.com/ultimate-guide-robots-txt/ and https://en.wikipedia.org/wiki/Robots_exclusion_standard. Separately, consider sitemaps: https://en.wikipedia.org/wiki/Sitemaps
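For reference, here is a minimal sketch of honoring robots.txt and picking up its `Sitemap:` lines. It assumes Python's standard-library `urllib.robotparser` (Python 3.8+ for `site_maps()`). The robots.txt content, the `mybot` user agent, and the `build_parser`/`allowed` helpers are hypothetical, and this isn't the project's own code:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed locally instead of fetched from a live site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without any network access (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser: RobotFileParser, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the parsed rules (hypothetical helper)."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(allowed(rp, "mybot", "https://example.com/admin/login"))  # False: matches Disallow: /admin/
    print(allowed(rp, "mybot", "https://example.com/blog/post"))    # True: falls through to Allow: /
    # Sitemap lines are independent of User-agent groups; site_maps() returns them as a list.
    print(rp.site_maps())
```

The stdlib parser applies the first matching rule in a group, so `Disallow: /admin/` is listed before the catch-all `Allow: /`. For a real crawler you would call `set_url(...)` and `read()` instead of parsing a literal string.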
@GrgDev it's not really the CGI dependency that's the issue; there are a couple of hurdles (from my sketchy memory): 1) the regexes (and there are a lot / and...
It's the metaprogramming I'm more worried about... it's a bit tricky to untangle (unless you have a clear head and are locked in a room in silence).
https://github.com/diasks2/pragmatic_tokenizer
beautiful!
You could argue that, but tokenization is most commonly used while doing NLP, so it’s kind of 50/50 in my opinion. On 3 Jun 2019, 4:47 PM +1200, Chris Larsen...