ptt-web-crawler
Fix bug where title and author may be null
Bug: Crawling https://www.ptt.cc/bbs/Gossiping/M.1597453894.A.61C.html returns a null title.
Reason: Cloudflare obfuscates email-like text in the page.
Solution:
Decode the obfuscated text when the title or author is null (reference: https://stackoverflow.com/a/58111681).
Because `.string` returns an object that may be None when some other error occurs, this change uses `.text` instead.
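A minimal sketch of the decoding step, assuming the commonly described Cloudflare scheme in which the `data-cfemail` attribute holds a hex string whose first byte is an XOR key for the remaining bytes. The function name `decode_cfemail` is hypothetical and is not taken from this change.

```python
def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare-obfuscated hex string (assumed scheme).

    The first hex byte is treated as the XOR key; every following
    byte is XORed with that key to recover one character.
    """
    key = int(encoded[:2], 16)
    return "".join(
        chr(int(encoded[i:i + 2], 16) ^ key)
        for i in range(2, len(encoded), 2)
    )


# Hand-built example: key 0x01, so "a@b" is stored as 60 41 63.
print(decode_cfemail("01604163"))  # a@b
```

In the crawler, the hex string would come from the `data-cfemail` attribute of the element that replaced the title or author text.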
In addition, the article content has the same problem, which is not fixed yet.
For example, https://www.ptt.cc/bbs/Test/M.1597583163.A.209.html still yields `[email protected]` in its content.
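One possible approach for the content case, which is not part of this change, is to replace each obfuscated element in the raw HTML before parsing. The sketch below assumes Cloudflare wraps the text in an `<a>` or `<span>` element carrying a `data-cfemail` attribute; the names `decode_cfemail` and `restore_protected_text` are hypothetical.

```python
import re


def decode_cfemail(encoded: str) -> str:
    # Assumed scheme: first hex byte is the XOR key for the rest.
    key = int(encoded[:2], 16)
    return "".join(
        chr(int(encoded[i:i + 2], 16) ^ key)
        for i in range(2, len(encoded), 2)
    )


# Matches an <a> or <span> element that has a data-cfemail attribute.
_CF_PATTERN = re.compile(
    r'<(a|span)\b[^>]*\bdata-cfemail="([0-9a-fA-F]+)"[^>]*>.*?</\1>',
    re.DOTALL | re.IGNORECASE,
)


def restore_protected_text(html: str) -> str:
    """Replace each obfuscated element with its decoded text."""
    return _CF_PATTERN.sub(lambda m: decode_cfemail(m.group(2)), html)


sample = (
    '<p>Contact: <a href="/cdn-cgi/l/email-protection" '
    'class="__cf_email__" data-cfemail="01604163">'
    '[email&#160;protected]</a></p>'
)
print(restore_protected_text(sample))  # <p>Contact: a@b</p>
```

A regular expression is used here only to keep the sketch free of third-party dependencies; in the crawler itself, walking the parsed tree and replacing the matching elements would likely be more robust.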