tumblr-crawler-cli

Feature Request

Open ppproxy opened this issue 6 years ago • 8 comments

Hi, have you considered adding some features? For example:

  • Use the original 'raw' files for higher-quality photos, replacing the 1280px images when raw files exist?
  • Is it possible to filter images by file size, e.g. only save image files larger than 100k?
  • For private blogs, is it possible to authenticate via the CLI or a configuration file, such as a cookie file? 😄
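One way the size filter in the second bullet could work, as a minimal sketch: check the Content-Length the server reports before downloading, and fall back to checking the file on disk afterwards when no size is reported. The function names and the reading of "100k" as 100 KiB are assumptions for illustration, not part of tumblr-crawler-cli.

```python
import os

MIN_SIZE_BYTES = 100 * 1024  # assumption: "100k" read as 100 KiB


def content_length(headers):
    """Return the Content-Length header as an int, or None if missing or invalid."""
    # Header names are case-insensitive, so normalise a plain dict for lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        return int(lowered["content-length"])
    except (KeyError, ValueError):
        return None


def should_download(headers, min_bytes=MIN_SIZE_BYTES):
    """Decide before downloading, using the size the server reports.

    If no size is reported, download anyway and let prune_small_file() decide.
    """
    size = content_length(headers)
    return size is None or size > min_bytes


def prune_small_file(path, min_bytes=MIN_SIZE_BYTES):
    """After downloading, delete the file unless it is larger than min_bytes.

    Returns True if the file was kept.
    """
    if os.path.getsize(path) > min_bytes:
        return True
    os.remove(path)
    return False
```

Checking the header first avoids downloading small files at all; the on-disk check covers servers that omit Content-Length.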

ppproxy • Oct 05 '18 12:10

  1. Tumblr blocked access to "raw" images on August 13, 2018. [source]
  2. This feature is coming soon; thanks for the suggestion.
  3. I don't understand what you mean by "private blogs". Could you give an example?

tzw0745 • Oct 06 '18 10:10

  1. The latest commit adds this feature. [edc531a]

tzw0745 • Oct 06 '18 11:10

Wow, thanks for the response!

  • Got it. Still, the 'photo_regex' in the source could be smarter at parsing these non-standard image URLs, for example: photo_regex = re.compile(r'https://\d+.media.tumblr.com/\w{32}/tumblr_[\w.]+')

  • There is an option named 'Visibility' on the Tumblr user's 'Edit appearance' page:

    Hide username.tumblr.com: This Tumblr will only be viewable within the Tumblr dashboard.

    Hide username from search results: It'll be hidden from Tumblr searches, and from external search engines like Google or Yahoo.

    username is explicit: It'll only be viewable to 1) logged-in users who 2) have safe mode off.

    If 'username is explicit' is enabled, the script shows an error when we try to access these private blogs: ValueError: tumblr site "username" not found

    And here is some material about authenticated login 😄

    https://github.com/cyang812/get_tumblr_likes
    https://www.tumblr.com/docs/en/api/v2#auth
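To see what the photo_regex suggested in the first bullet would pick up, here is a small sketch that runs it over a chunk of text. Note one tweak: the dots in the host name are escaped, because an unescaped '.' in a regex matches any character. The sample URL shape (numeric media subdomain, 32-character path segment, tumblr_ filename) is inferred from the pattern itself, and find_photo_urls is a hypothetical helper, not a function from the project.

```python
import re

# The suggested pattern, with the host-name dots escaped so they only
# match a literal "." instead of any character.
PHOTO_REGEX = re.compile(r'https://\d+\.media\.tumblr\.com/\w{32}/tumblr_[\w.]+')


def find_photo_urls(text):
    """Return photo URLs found in text, de-duplicated, in first-seen order."""
    seen = set()
    urls = []
    for url in PHOTO_REGEX.findall(text):  # no groups, so full matches are returned
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


sample = ('<img src="https://66.media.tumblr.com/0123456789abcdef0123456789abcdef/'
          'tumblr_abc123_1280.jpg">')
print(find_photo_urls(sample))
```

The character class [\w.]+ stops at the closing quote of the src attribute, so the match ends cleanly at the file extension in HTML like the sample.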

ppproxy • Oct 06 '18 12:10

  • photo_regex is only used when a Tumblr post (from /api/read) is not in the standard format. In my experience, that happens very rarely.
  • When a user has enabled 'Hide username.tumblr.com', OAuth is the only option. But OAuth is more complicated and requires registering an application (to get an API key) first. Each application also has an HTTP request rate limit (1000/hour, 5000/day), so a single key is hard to share among users.
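The per-application rate limit mentioned above (1000 requests/hour, 5000/day) is what makes one shared key impractical: every user of the key draws from the same budget. As a minimal sketch, assuming requests are tracked client-side with timestamps in seconds, a crawler could check both windows before each call. DualWindowLimiter and try_acquire are hypothetical names, and Tumblr enforces the real limit on its side regardless of what the client does.

```python
from collections import deque


class DualWindowLimiter:
    """Client-side sliding-window check against several limits at once.

    Defaults mirror the limits quoted above: 1000 per hour and 5000 per day.
    """

    def __init__(self, limits=((3600, 1000), (86400, 5000))):
        # limits: (window length in seconds, max requests within that window)
        self.limits = limits
        self.stamps = deque()  # timestamps of allowed requests, oldest first

    def try_acquire(self, now):
        """Record a request at time `now` if every window still has room."""
        longest = max(window for window, _ in self.limits)
        # Drop timestamps too old to count toward any window.
        while self.stamps and now - self.stamps[0] >= longest:
            self.stamps.popleft()
        for window, cap in self.limits:
            recent = sum(1 for t in self.stamps if now - t < window)
            if recent >= cap:
                return False  # this window is full; the request is not recorded
        self.stamps.append(now)
        return True
```

Counting with a linear scan keeps the sketch short; a real crawler with thousands of requests per day might keep one counter per window instead.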

tzw0745 • Oct 06 '18 12:10

Thanks for the great tool!

If you don't mind, I also have two feature requests:

  1. Customizable filenames (ideally with templating)
  2. Only download posts with a certain tag, i.e. only download images/videos from http://{userid}.tumblr.com/tagged/{tagname}
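A filename template like the one requested in the first item could be a plain $-style string. Below is a minimal sketch using Python's standard string.Template, assuming hypothetical fields blog, post_id, index and ext; the option name and fields tumblr-crawler-cli actually uses may differ.

```python
import re
import string

# Hypothetical default template; the field names are illustrative only.
DEFAULT_TEMPLATE = "${blog}_${post_id}_${index}.${ext}"


def render_filename(template, **fields):
    """Fill a $-style template and make the result safe to use as a file name.

    Raises KeyError if the template uses a field that was not supplied.
    """
    name = string.Template(template).substitute(**fields)
    # Replace characters that are invalid in file names on common platforms.
    return re.sub(r'[\\/:*?"<>|]', "_", name)


print(render_filename(DEFAULT_TEMPLATE, blog="example", post_id=123, index=0, ext="jpg"))
```

string.Template is used here rather than str.format because str.format lets a user-supplied template reach attributes of the values (e.g. "{0.__class__}"), while Template only does plain name substitution.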

fireattack • Nov 04 '18 22:11

The first feature has been added. The second feature needs Tumblr API v2; in other words, it needs an API key first.
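For the tag request, Tumblr API v2 documents a tag parameter on the blog /posts endpoint, which is why an API key comes into it. A minimal sketch of building such a request URL, assuming the endpoint and parameters as documented in the API v2 docs (api_key, tag, offset, limit, with an optional post type in the path); tagged_posts_url is a hypothetical helper, and no request is actually sent here.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.tumblr.com/v2/blog"


def tagged_posts_url(blog, tag, api_key, post_type=None, offset=0, limit=20):
    """Build a Tumblr API v2 /posts URL restricted to one tag.

    Follows the documented /v2/blog/{blog-identifier}/posts[/type] endpoint;
    api_key is the OAuth consumer key of a registered application.
    """
    path = f"{API_ROOT}/{blog}.tumblr.com/posts"
    if post_type:  # e.g. "photo" or "video"
        path += f"/{post_type}"
    # urlencode quotes values (spaces become "+") and keeps insertion order.
    query = urlencode({"api_key": api_key, "tag": tag, "offset": offset, "limit": limit})
    return f"{path}?{query}"


print(tagged_posts_url("example", "cute cats", "YOUR_KEY", post_type="photo"))
```

Paging through a tag would mean repeating the request with offset increased by limit until no more posts come back, and each page counts against the application's rate limit.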

tzw0745 • Nov 05 '18 08:11

Thanks!

I didn't realize it works through the API; I thought it was scraping.

fireattack • Nov 05 '18 08:11

Tumblr supports highly flexible themes, so scraping all the different HTML layouts would be too annoying.

tzw0745 • Nov 05 '18 08:11