ragamints Bypass Instagram API Restrictions

Open sebastienbarre opened this issue 9 years ago • 0 comments

While it appears the Instagram API is now useless, I would still like to recover all my metadata (caption, tags, GPS location) and a backup of my photos.

It looks like it could be possible to bypass the API restrictions by... not using the API at all. All the info ragamints was retrieving through the API is buried in the user-facing Instagram web app itself, so this would require rewriting the tool as a web scraper. The GPS coordinates can be found by scraping the LOCATION_ID embedded in the link attached to the location name on the photo page, loading the corresponding https://www.instagram.com/explore/locations/LOCATION_ID/ page, and scraping the latitude and longitude from:

                <meta property="place:location:latitude" content="42.6598" />
                <meta property="place:location:longitude" content="-73.7813" />

Obviously one would have to be careful about it, as scraping is likely against the Instagram TOS. It would also have to run more slowly to remain "undetected". On the other hand, it would no longer require any authentication token.
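
To make the two scraping steps above concrete, here is a minimal sketch in Python (standard library only, no network access). It assumes the location link on the photo page contains /explore/locations/ followed by a numeric LOCATION_ID, which is a guess on my part; the function and class names are made up for illustration and are not part of ragamints.

```python
import re
from html.parser import HTMLParser
from typing import Optional, Tuple

LOCATION_URL = "https://www.instagram.com/explore/locations/{}/"


def location_url_from_href(href: str) -> Optional[str]:
    """Build the location page URL from a link that embeds the LOCATION_ID.

    Assumption: the ID is numeric and follows /explore/locations/ in the href.
    """
    match = re.search(r"/explore/locations/(\d+)", href)
    return LOCATION_URL.format(match.group(1)) if match else None


class PlaceMetaParser(HTMLParser):
    """Collect the place:location:latitude/longitude <meta> values."""

    WANTED = ("place:location:latitude", "place:location:longitude")

    def __init__(self) -> None:
        super().__init__()
        self.coords = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property")
        content = attr.get("content")
        if prop in self.WANTED and content is not None:
            # Store under "latitude" or "longitude".
            self.coords[prop.rsplit(":", 1)[1]] = float(content)


def parse_coordinates(html: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) if both meta tags are present, else None."""
    parser = PlaceMetaParser()
    parser.feed(html)
    if "latitude" in parser.coords and "longitude" in parser.coords:
        return parser.coords["latitude"], parser.coords["longitude"]
    return None
```

Feeding the two meta tags shown above to parse_coordinates should give back the latitude and longitude as a pair of floats. Fetching the pages themselves (and pacing the requests) is left out on purpose.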

It seems Instagram scrapers are popping up on GitHub and could be leveraged. Otherwise check:

Web Scraping Search on egghead.io
~~lapwinglabs/x-ray: The next web scraper~~ (no trivial Ajax support)
~~rchipka/node-osmosis: Web scraper for NodeJS~~ (no obvious Ajax support, lack of documentation)
~~casperjs/casperjs: Navigation scripting and testing utility for PhantomJS and SlimerJS~~
segmentio/nightmare: A high-level browser automation library, also examples, rosshinkley/nightmare-load-filter, kyungw00k/nightmare-webrequest-addon, phulas/x-ray-nightmare

UPDATE: segmentio/nightmare is doing the trick.

The tentative new interface could be:

Download a single pic:

ragamints download https://www.instagram.com/p/BJEFDFDhkzy

Download multiple pics:

ragamints download https://www.instagram.com/p/BJEFDFDhkzy https://www.instagram.com/p/BIqp19VBOK_

Download an interval of pics from the same user (e.g. all pics between two pics, inclusive):

ragamints download https://www.instagram.com/p/BJEFDFDhkz...https://www.instagram.com/p/BIqp19VBOK_

Download 10 most recent pics from user:

ragamints download https://www.instagram.com/sebastienbarre
ragamints download sebastienbarre

Download 3 most recent pics from user:

ragamints download https://www.instagram.com/sebastienbarre:3
ragamints download sebastienbarre:3

All of the above can be combined together.
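
As a rough illustration of how the argument forms above could be told apart, here is a sketch in Python. It is only one possible reading of the tentative interface: the regular expressions, the tuple shapes, and the parse_target name are my own assumptions, and the default of 10 comes from the "10 most recent pics" example.

```python
import re
from typing import Tuple

DEFAULT_COUNT = 10  # "Download 10 most recent pics from user"

# A single photo page: https://www.instagram.com/p/<shortcode>
POST_RE = re.compile(r"^https://www\.instagram\.com/p/([A-Za-z0-9_-]+)/?$")
# A user, as a bare name or a profile URL, with an optional ":<count>" suffix.
USER_RE = re.compile(
    r"^(?:https://www\.instagram\.com/)?([A-Za-z0-9_.]+?)/?(?::(\d+))?$"
)


def parse_target(arg: str) -> Tuple:
    """Classify one command-line argument.

    Returns ("interval", first, last), ("post", shortcode)
    or ("user", name, count).
    """
    # Interval: two photo URLs joined by "...".
    if "..." in arg:
        left, _, right = arg.partition("...")
        first, last = POST_RE.match(left), POST_RE.match(right)
        if not (first and last):
            raise ValueError(f"interval needs two photo URLs: {arg}")
        return ("interval", first.group(1), last.group(1))

    post = POST_RE.match(arg)
    if post:
        return ("post", post.group(1))

    user = USER_RE.match(arg)
    if user:
        count = int(user.group(2)) if user.group(2) else DEFAULT_COUNT
        return ("user", user.group(1), count)

    raise ValueError(f"unrecognized target: {arg}")
```

Since each argument is classified independently, combining several forms on one command line would just mean mapping parse_target over the argument list.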

The following options would be deprecated:

  -u, --user-id          Instagram user ID (or user name)  [string]
  -c, --count            Maximum count of medias to download
  -m, --min-id           Only medias posted later than this media id/url (included)  [string]
  -n, --max-id           Only medias posted earlier than this media id/url (excluded)  [string]
  -s, --sequential       Process sequentially (slower)  [boolean] [default: false]
  -r, --resolution       Resolution(s) to download, e.g. high_resolution,standard_resolution,low_resolution,thumbnail  [string]
  -t, --access-token     Instagram Access Token  [string]

The following options would still apply:

  -i, --include-videos   Include videos (skipped by default)  [boolean] [default: false]
  -d, --dest             Destination directory  [string] [default: "./"]
  -a, --always-download  Always download, even if media is saved already  [boolean] [default: false]
  -j, --json             Save media json object (accepts keys to pluck)  [default: false]
  -l, --clear-cache      Clear the cache  [boolean] [default: false]
  -v, --verbose          Output more info  [boolean] [default: false]
  -q, --quiet            Output less info  [boolean] [default: false]
  --config               Load config file  [default: "~/.ragamints.json"]
  -h, --help             Show help  [boolean]

Aug 15 '16 03:08 sebastienbarre
