
Fix bug where title and author may be null

Open gitqwerty777 opened this issue 4 years ago • 0 comments

Bug: When crawling https://www.ptt.cc/bbs/Gossiping/M.1597453894.A.61C.html, the title is null.
Reason: Cloudflare obfuscates any text that looks like an email address.
Solution: Decode the obfuscated text when the title or author is null (reference: https://stackoverflow.com/a/58111681). In addition, because .string returns an object that may be None when other errors occur, this change uses .text instead.
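As a rough illustration of the decoding step, here is a minimal sketch. It assumes the usual Cloudflare scheme, in which the `data-cfemail` attribute holds a hex string whose first byte is an XOR key for the remaining bytes. The function name `decode_cfemail` is hypothetical and is not taken from this repository.

```python
def decode_cfemail(encoded: str) -> str:
    """Decode a hex string taken from a data-cfemail attribute.

    Assumption: the first byte is the XOR key and every following
    byte is one byte of the original UTF-8 text XORed with that key.
    """
    data = bytes.fromhex(encoded)
    key = data[0]
    # XOR each remaining byte with the key, then decode as UTF-8.
    return bytes(b ^ key for b in data[1:]).decode("utf-8")
```

A caller would apply this only when the title or author could not be read directly, passing in the attribute value found in the protected element.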

In addition, the content has the same problem, which is not fixed yet.
For example, crawling https://www.ptt.cc/bbs/Test/M.1597583163.A.209.html returns "[email protected]" placeholders in the content.
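One possible way to handle the content case is to replace every protected element in the raw HTML before parsing it. The sketch below is an assumption rather than part of this change: it uses only the standard library, assumes the protected text is wrapped in an `<a>` or `<span>` element carrying a `data-cfemail` attribute, and the names `restore_emails` and `_decode` are hypothetical.

```python
import html
import re

# Matches an <a> or <span> element that carries a data-cfemail attribute,
# including its "[email protected]" placeholder body.
CF_ELEMENT = re.compile(
    r'<(a|span)\b[^>]*\bdata-cfemail="([0-9a-fA-F]+)"[^>]*>.*?</\1>',
    re.DOTALL,
)


def _decode(encoded: str) -> str:
    # Assumed scheme: first byte is the XOR key for the remaining bytes.
    data = bytes.fromhex(encoded)
    return bytes(b ^ data[0] for b in data[1:]).decode("utf-8", errors="replace")


def restore_emails(markup: str) -> str:
    """Replace each protected element with its decoded, HTML-escaped text."""
    return CF_ELEMENT.sub(lambda m: html.escape(_decode(m.group(2))), markup)
```

Running the page source through `restore_emails` before handing it to the parser would let the existing content extraction stay unchanged, at the cost of relying on a regular expression over HTML.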

gitqwerty777, Aug 16 '20 13:08