
Download a network request like image or xhr/json

Open gregtzar opened this issue 3 years ago • 6 comments

Rod Version: v0.101.4

Hey there @ysmood. As context for this question, I have worked a lot with chromedp in the past and even contributed a bit, and recently discovered go-rod, which solves many of the issues (such as iframe access, file downloads, and error handling) that I was having to write complex helpers for in chromedp. I added one such helper to their examples repo for doing image downloads. I was just wondering if go-rod has an easy way to download the byte stream from a network request that has already returned successfully, such as an image, an xhr/json request, or even a PDF that loads in the window. The MustWaitDownload technique does not seem to work for this kind of thing. If it's not already supported, maybe I could submit a PR for a helper? Just let me know what you think the function signature / contract should be. I'm loving go-rod so far, thanks for building this great library!

gregtzar avatar Aug 10 '21 00:08 gregtzar

How about using https://go-rod.github.io/#/page-resources/README

ysmood avatar Aug 10 '21 03:08 ysmood

@ysmood The page.GetResource helper does not work because the resource I am looking for is not listed under the DevTools > Application > Frames list of resources. In fact, there is no group there at all for XHR requests. If I try anyway, I get this error: -32000 No resource with given URL found. However, the URL I am searching for is definitely listed among the XHR requests under the DevTools > Network tab.

gregtzar avatar Aug 11 '21 06:08 gregtzar

Have you tried hijack?

Also, I'm glad to accept good ideas and code. How about you make a PR to demo how it works? I will do my best to help. Maybe the method signature could be something like rod.Page.Fetch(url string) *http.Request?

ysmood avatar Aug 11 '21 06:08 ysmood

@ysmood Yeah, I was thinking of going either the hijack route or the network events route. With chromedp I went the network events route. I'm happy to help write something but just wanted to make sure I wasn't duplicating existing functionality. So I'll submit a PR.

I like that method signature and would expand it like this:

rod.Page.FetchByURL(regex string) (*http.Request, *http.Response, error)

Do you see that method using hijack or intercepting network requests under the hood? I noticed that hijack already uses cdp.Client.Request and cdp.Client.Response structs. Are there http.Request and http.Response structs hanging off the network events?
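To make the proposed contract concrete, here is a minimal, stdlib-only sketch of the lookup half of that signature. It is not rod code and uses no rod API: `recorder`, `exchange`, `record`, `readBody`, and `errNoMatch` are hypothetical names, and the in-memory log stands in for whatever hijack or network events would actually capture. It only shows how a regex could select an already completed request and hand back the pair plus an error.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
	"regexp"
)

// exchange pairs a completed request with its response (hypothetical type).
type exchange struct {
	req  *http.Request
	resp *http.Response
}

// recorder is a hypothetical in-memory log of completed exchanges. In a real
// implementation, entries would come from hijack or network events.
type recorder struct {
	exchanges []exchange
}

// errNoMatch is returned when no recorded URL matches the pattern.
var errNoMatch = errors.New("no recorded request matches the pattern")

// record stores one finished exchange with the given body.
func (r *recorder) record(method, url string, status int, body []byte) error {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return err
	}
	resp := &http.Response{
		StatusCode: status,
		Header:     make(http.Header),
		Body:       io.NopCloser(bytes.NewReader(body)),
		Request:    req,
	}
	r.exchanges = append(r.exchanges, exchange{req: req, resp: resp})
	return nil
}

// FetchByURL returns the first recorded exchange whose URL matches the regex.
func (r *recorder) FetchByURL(pattern string) (*http.Request, *http.Response, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, nil, fmt.Errorf("invalid pattern: %w", err)
	}
	for _, ex := range r.exchanges {
		if re.MatchString(ex.req.URL.String()) {
			return ex.req, ex.resp, nil
		}
	}
	return nil, nil, errNoMatch
}

// readBody drains and closes the response body.
func readBody(resp *http.Response) ([]byte, error) {
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func main() {
	rec := &recorder{}
	if err := rec.record("GET", "https://example.com/api/items.json", 200, []byte(`{"ok":true}`)); err != nil {
		panic(err)
	}
	req, resp, err := rec.FetchByURL(`\.json$`)
	if err != nil {
		panic(err)
	}
	body, err := readBody(resp)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL, resp.StatusCode, string(body))
}
```

Returning the request alongside the response lets the caller see which URL actually matched, which matters once the argument is a pattern rather than an exact URL.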

gregtzar avatar Aug 11 '21 07:08 gregtzar

> Are there http.Request and http.Response structs hanging off the network events?

No, we only use them internally, as you can see in the source code:

https://github.com/go-rod/rod/blob/4f045fd526c3f2ec143d5eafbb0b2961d09881b6/hijack.go#L155

I think it's pretty easy to understand how hijack works.

To implement what you need, you don't have to use cdp.Client directly.

ysmood avatar Aug 11 '21 10:08 ysmood

I'm working on getting the tests passing locally so I can submit a PR for this issue, but I'm currently blocked by https://github.com/go-rod/rod/issues/481

gregtzar avatar Aug 30 '21 22:08 gregtzar