Ky-Anh Huynh

83 comments by Ky-Anh Huynh

@RobbiNespu I've just had a quick look. I believe Google has recently changed its front-end application, and yes, all the old AJAX support (the `?_escaped_fragment_` part of the URI) has just...

* Proposal (2009): https://developers.google.com/search/blog/2009/10/proposal-for-making-ajax-crawlable
* Deprecation (2015): https://developers.google.com/search/blog/2015/10/deprecating-our-ajax-crawling-scheme
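Under the 2009 scheme, a crawler rewrote a `#!` ("hashbang") URL into a `?_escaped_fragment_=` URL before fetching it. Below is a minimal Python sketch of that rewrite. The function name `to_escaped_fragment` is hypothetical, and the set of escaped bytes (control characters and space, `#`, `%`, `&`, `+`, and bytes from 0x7F upward) is my reading of the spec, not code taken from the crawler.

```python
def to_escaped_fragment(url: str) -> str:
    """Rewrite a '#!' URL into its '_escaped_fragment_' form (hypothetical helper)."""
    base, sep, fragment = url.partition("#!")
    if not sep:
        # Not a hashbang URL: nothing to rewrite.
        return url

    escaped = []
    for byte in fragment.encode("utf-8"):
        # Assumed escape set: 0x00-0x20, '#', '%', '&', '+', and 0x7F-0xFF.
        if byte <= 0x20 or byte in (0x23, 0x25, 0x26, 0x2B) or byte >= 0x7F:
            escaped.append(f"%{byte:02X}")
        else:
            escaped.append(chr(byte))

    # Add the fragment as a query parameter, respecting any existing query string.
    joiner = "&" if "?" in base else "?"
    return f"{base}{joiner}_escaped_fragment_={''.join(escaped)}"


print(to_escaped_fragment("https://groups.google.com/forum/#!forum/sfnet.harrastus.audio+video"))
# → https://groups.google.com/forum/?_escaped_fragment_=forum/sfnet.harrastus.audio%2Bvideo
```

The input URL here is an assumed old-style Google Groups link. Its output matches the `%2B`-escaped URL quoted in the last comment. Since the scheme was deprecated in 2015, such URLs may no longer return usable content.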

@tjluoma Right, that's fine. Let me try on my laptop to see whether I can reproduce your issue. Thanks.

@tjluoma I gave it a try: I have 759 messages in the `mbox` folder, and it's still counting.

```
$ pwd
/home/foo/projects/icy/google-group-crawler/bbedit/mbox
$ ls | wc -l
826
```

I'd...
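`ls | wc -l` counts every visible directory entry, subdirectories included. If you only want to count downloaded messages, a rough Python equivalent that counts visible regular files is sketched below. The helper name `count_messages` is hypothetical, and it assumes each message is stored as its own file in `mbox`.

```python
from pathlib import Path


def count_messages(mbox_dir: str) -> int:
    """Count visible regular files in a directory (hypothetical helper).

    Assumes one downloaded message per file. Hidden entries are skipped,
    just as plain `ls` skips them, and subdirectories are not counted.
    """
    return sum(
        1
        for entry in Path(mbox_dir).iterdir()
        if entry.is_file() and not entry.name.startswith(".")
    )
```

For example, `count_messages("mbox")` run from the group's directory would give the number of message files collected so far.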

Interesting. I will take a look. Thanks for reporting.

Google returns empty content when `_escaped_fragment_` is specified, e.g. https://groups.google.com/forum/?_escaped_fragment_=forum/3dprintertipstricksreviews. This seems to go against the standard. We need a different way to get data from Google. This is a real...

Google hides most email headers from the raw message. A raw message isn't actually raw ;) See also https://groups.google.com/forum/message/raw?msg=3dprintertipstricksreviews/LDFZVHeC8Uk/2D1YhGqGDQAJ

```
Date: Sun, 20 Mar 2016 06:28:20 -0700 (PDT)
From: Rich...
```
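One way to see which headers actually survive in such a "raw" message is to parse it with Python's standard `email` package and list the header names. The sample message below is made up for illustration, and only its `Date` line comes from the output above. A real check would feed in the text downloaded from the `message/raw` URL.

```python
from email import policy
from email.parser import Parser

# Hypothetical sample message; only the Date line is taken from the comment above.
RAW = """\
Date: Sun, 20 Mar 2016 06:28:20 -0700 (PDT)
Subject: Example subject
Content-Type: text/plain; charset=UTF-8

Body text.
"""


def header_names(raw: str) -> list[str]:
    """Return the header names present in a raw RFC 5322 message, in order."""
    message = Parser(policy=policy.default).parsestr(raw, headersonly=True)
    return list(message.keys())


print(header_names(RAW))
# → ['Date', 'Subject', 'Content-Type']
```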

It's impossible to use the traditional method to fetch data from this group. We need a higher-level tool such as `phantomjs`. Well, after days of trying the scrolling method, I've...

Hi @zipob, thanks a lot for reporting this. This is probably because the query wasn't encoded correctly before being sent to the Google server. I will try with some fix...
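Here is one illustration of how an unencoded query can go wrong. It is an assumption, not a confirmed diagnosis of this bug. In a form-encoded query string a literal `+` decodes to a space, so a value such as the `sfnet.harrastus.audio+video` group name from the last comment must be percent-encoded as `%2B`. The sketch uses only Python's `urllib.parse`, and `encode_value` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, quote


def encode_value(value: str) -> str:
    """Percent-encode a query-string value, escaping '/', '+', '&' and '=' too."""
    return quote(value, safe="")


group = "sfnet.harrastus.audio+video"

unencoded = parse_qs("group=" + group)
encoded = parse_qs("group=" + encode_value(group))

print(unencoded["group"])  # the '+' has been decoded as a space
print(encoded["group"])    # the original name survives the round trip
```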

Google has a specification here: https://developers.google.com/search/docs/ajax-crawling/docs/specification, but it doesn't seem to work. For example, https://groups.google.com/forum/?_escaped_fragment_=forum/sfnet.harrastus.audio%2Bvideo generates an `invalid group name` error.