JustAnotherArchivist comments

Results: 394 comments by JustAnotherArchivist

Support for Gab

There was a module for Gab before they moved to Mastodon. Now, maybe a general Mastodon module (#43) will work, but Gab's instance is modified quite heavily, so leaving this...

Support for Gab

I looked into this a bit. A generic Mastodon scraper will not work because Gab is so heavily modified. Also, very little content is accessible without logging in: 10...

fix(vkontakte): update photo detection

Thanks a lot, and apologies for the delay!

IndexError of Instagram

As the comment there suggests, this is due to changes on Instagram's side. They recently overhauled their site a bit. The scraper needs to be adapted to those changes.

IndexError of Instagram

No, but there hasn't been anything worth saying. This issue, along with any other Instagram or Facebook issues, is effectively blocked by their silly rate limits. They make development of...

IndexError of Instagram

@purut18 I don't recall the exact format etc., but it was basically some context information (profile, hashtag, location, etc.) and the first page of posts, I believe.

Document the individual modules better

That would be greatly appreciated! Preferably, this part would be directly in the code as docstrings since that makes it much less likely to forget about updating the docs on...

Document the individual modules better

Unfortunately, numpydoc doesn't support proper type hints (https://github.com/numpy/numpydoc/issues/196). The ideal solution would be something like https://github.com/agronholm/sphinx-autodoc-typehints/issues/149, I think. I haven't done much research on this yet, though.

Google Groups?

Borderline, I'd say. Google Groups is basically a mailing list server, not really a social network. I'm not fundamentally opposed to it though.

Support for retrieving live data from Reddit rather than using the Pushshift data

No, unfortunately that's currently not possible. The Reddit scraper uses Pushshift because Reddit's own endpoints have ridiculous limitations (for example, there's a hard limit of 1000 submissions on subreddit/user lists,...