Grant
Seeing extraction errors on certain websites that have titles. `File "/usr/local/lib/python2.7/site-packages/ContentAnalysis-0.1.1-py2.7.egg/ContentAnalysis/document.py", line 53, in parse ginfo = g.extract(url=self.link) File "/usr/local/lib/python2.7/site-packages/goose/__init__.py", line 56, in extract return self.crawl(cc) File "/usr/local/lib/python2.7/site-packages/goose/__init__.py", line 66,...
The README suggests using the [GPT-neox-20b tokenizer](https://huggingface.co/EleutherAI/gpt-neox-20b/blob/main/config.json). This tokenizer has its BOS and EOS tokens both mapped to token id 0. However, when I look at the model implementation in PaLM-rlhf-pytorch,...