John Jansen

Results: 8 comments by John Jansen

Hey @emilebosch, once you accept the collab, I can assign this to you :-)

I've been wondering about word2vec myself, not sure if/when I would get to it though.

See https://yoast.com/ultimate-guide-robots-txt/ and https://en.wikipedia.org/wiki/Robots_exclusion_standard. Separately, consider sitemaps: https://en.wikipedia.org/wiki/Sitemaps

@GrgDev it's not really the CGI dep that is the issue. There are a couple of hurdles (from my sketchy memory): 1) the regexes (and there are a lot / and...

It's the metaprogramming I'm more worried about ... it's a bit tricky to untangle (unless you have a clear head and are locked in a room in silence).

https://github.com/diasks2/pragmatic_tokenizer

You could argue that, but tokenization is most commonly used while doing NLP, so it’s kind of 50/50 in my opinion. On 3 Jun 2019, 4:47 PM +1200, Chris Larsen...