[Pinterest] Downloaded pin image has low resolution
Brief
When I try to download an image from Pinterest, the returned result sometimes has low resolution. Example: https://www.pinterest.com/pin/70437489341156/
Technical analysis
Here's the relevant parser code: https://github.com/imputnet/cobalt/blob/4b9644ebdfbfe7bc6f7ec2d476692e3619cb59bd/api/src/processing/services/pinterest.js#L34-L38
The service takes the first picture with a matching extension found by the regex. However, for the example pin the first picture is not the best quality; see the matches:
[0] {src="https://i.pinimg.com/236x/7c/0a/1c/7c0a1c5f1c999a4a67f3c5b847da093c.jpg"}
[1] {src="https://i.pinimg.com/736x/7c/0a/1c/7c0a1c5f1c999a4a67f3c5b847da093c.jpg"}
[2] {src="https://i.pinimg.com/75x75_RS/9e/
Potential solution
Option 1 - Look up a better resolution
I'm not an expert in how Pinterest structures its data, but judging by the URLs it looks possible to take the image identifier from the first image (7c/0a/1c/7c0a1c5f1c999a4a67f3c5b847da093c.jpg) and look for another image with the same identifier but a higher resolution prefix ({vvv}x).
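A minimal sketch of Option 1, assuming every candidate URL has the shape https://i.pinimg.com/<size>/<aa>/<bb>/<cc>/<hash>.<ext> seen in the output above. The helper names parsePinimgUrl and pickBestVariant are hypothetical and not part of cobalt; only plain "<width>x" size segments are ranked.

```javascript
// Hypothetical helper: split a pinimg URL into its size segment and image id.
// Assumes the URL shape seen in the issue; returns null for anything else.
function parsePinimgUrl(url) {
    const m = /^https:\/\/i\.pinimg\.com\/([^/]+)\/((?:[0-9a-f]{2}\/){3}[0-9a-f]{32}\.\w+)$/.exec(url);
    if (!m) return null;
    const width = /^(\d+)x$/.exec(m[1]);
    // Size segments that are not "<width>x" (e.g. 75x75_RS) are ranked as width 0.
    return { url, id: m[2], width: width ? Number(width[1]) : 0 };
}

// Pick the widest variant that shares the image id of the first recognised URL.
function pickBestVariant(urls) {
    const parsed = urls.map(parsePinimgUrl).filter(Boolean);
    if (parsed.length === 0) return null;
    const mainId = parsed[0].id;
    return parsed
        .filter((p) => p.id === mainId)
        .reduce((best, p) => (p.width > best.width ? p : best)).url;
}
```

Called with the first two URLs from the output above, pickBestVariant would return the 736x one. This only chooses among variants already present in the page; it does not guess URLs for sizes that were never listed.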
Option 2 - Parse images from JSON
While investigating the page content I found that, besides the images provided as src=<something>, there is JSON-structured pin data. It carries much more information, such as the original image URL (which does not appear in any src=<> attribute):
<script data-relay-response="true" type="application/json">
{
<OMITTED>
"imageSpec_236x": {
"height": 295,
"width": 236,
"url": "https://i.pinimg.com/236x/7c/0a/1c/7c0a1c5f1c999a4a67f3c5b847da093c.jpg"
},
"imageSpec_orig": {
"url": "https://i.pinimg.com/originals/7c/0a/1c/7c0a1c5f1c999a4a67f3c5b847da093c.jpg"
},
<OMITTED>
Again, I'm not sure whether this data is available for every pin, but it looks like the more robust solution, and src parsing could be kept as a fallback.
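A rough sketch of Option 2 with the src result as a fallback. It assumes the script tag is written exactly as in the snippet above (same attributes, same order) and that an "imageSpec_orig" object with a "url" string sits somewhere inside the payload; since the surrounding JSON structure is omitted here, the code searches for the key recursively rather than following a fixed path. All function names are hypothetical.

```javascript
// Collect the parsed JSON payloads of <script data-relay-response="true"> tags.
// Assumption: the tag attributes appear exactly as in the snippet from the issue.
function extractRelayPayloads(html) {
    const re = /<script data-relay-response="true" type="application\/json">([\s\S]*?)<\/script>/g;
    const payloads = [];
    for (const m of html.matchAll(re)) {
        try {
            payloads.push(JSON.parse(m[1]));
        } catch {
            // Skip blocks that are not valid JSON.
        }
    }
    return payloads;
}

// Depth-first search for the first "imageSpec_orig" object that has a string url.
function findOriginalUrl(node) {
    if (node === null || typeof node !== "object") return null;
    const spec = node.imageSpec_orig;
    if (spec && typeof spec.url === "string") return spec.url;
    for (const value of Object.values(node)) {
        const found = findOriginalUrl(value);
        if (found) return found;
    }
    return null;
}

// Prefer the original image from the JSON data; otherwise keep the src-based result.
function resolveImageUrl(html, srcFallback) {
    for (const payload of extractRelayPayloads(html)) {
        const url = findOriginalUrl(payload);
        if (url) return url;
    }
    return srcFallback;
}
```

One caveat: if a page embeds data for several pins, the first "imageSpec_orig" found may not belong to the requested pin, so a real implementation would probably want to check the pin ID as well.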
reproduction steps
- Go to cobalt.tools
- Insert https://www.pinterest.com/pin/70437489341156/
- Hit download
Actual result: Image has low quality.
Expected result: Image has the same quality as on the Pinterest page.
screenshots
links
https://www.pinterest.com/pin/70437489341156/
platform information
additional context
+1, reproduced accidentally with https://pinterest.com/pin/333618284916219545
Downloaded image was 236x236, original image is 736x736
After further digging (testing on this), it seems that every pin page has a script tag with the id "__PWS_INITIAL_PROPS__" that contains a list of image sizes, including the original.
https://regex101.com/r/IAmYqE/1
// The pin ID is the first run of digits in the page URL.
const pinId = document.URL.match(/\d+/)[0];
const props = JSON.parse(document.getElementById("__PWS_INITIAL_PROPS__").innerText);
props.initialReduxState.pins[pinId].images
Note that, as far as I've tested, this only works when you're signed in; otherwise the "pins" object is empty.
After even more digging (testing on this), when you're not signed in you can use the following regex:
let p = /https:\/\/i\.pinimg\.com\/(\d{3}x)\/[0-9a-f/]{41}\.jpg/gm;
[...new Set(document.body.innerHTML.match(p))];
to match all the image URLs.
pitfalls
- This is time-sensitive, so it's best to run it as soon as the page has loaded; otherwise it can't differentiate between the endless-scroll content and the main content.
- Note that the first image in the list is always the main content; perhaps this could be used to filter the list.
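The filtering idea from the last pitfall could look roughly like this. It is only a sketch: it works on an HTML string instead of document.body, relies on the observation above that the first match is the main content, and uses a hypothetical function name (mainImageVariants). The regex is the signed-out one quoted above, with the dots in the host name escaped.

```javascript
// Signed-out regex from the comment above, with the host dots escaped.
const PIN_IMAGE = /https:\/\/i\.pinimg\.com\/(\d{3}x)\/[0-9a-f/]{41}\.jpg/g;

// Return the de-duplicated URLs that share the first match's image path.
// Assumption: the first matched URL belongs to the main pin.
function mainImageVariants(html) {
    const all = [...new Set(html.match(PIN_IMAGE) ?? [])];
    if (all.length === 0) return [];
    // Everything after the "<size>/" segment identifies the image.
    const pathOf = (url) => url.replace(/^https:\/\/i\.pinimg\.com\/\d{3}x\//, "");
    const mainPath = pathOf(all[0]);
    return all.filter((url) => pathOf(url) === mainPath);
}
```

Since the result only contains size variants of one image, endless-scroll images loaded later would be dropped as long as the main image still comes first in the markup.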