snscrape

Thread safety

Open JustAnotherArchivist opened this issue 2 years ago • 4 comments

Since people seem to keep trying to use snscrape with threads (despite this not being listed as a feature anywhere) and running into problems (seemingly without searching the issues)...

snscrape is currently not thread-safe.

I'd like to evaluate at some point whether it's easy enough to make snscrape thread-safe. One known issue is the Twitter module's guest token manager. Testing thread safety will be an issue, too.

Relevant prior issues: #307 #584 #622

(SEO keywords: threading multithreading)

JustAnotherArchivist avatar Dec 24 '22 22:12 JustAnotherArchivist

@JustAnotherArchivist you are saying snscrape is not thread-safe, but is it process safe? If I were to run multiple instances of the snscrape executable concurrently, would that cause issues?

IvanTrendafilov avatar Feb 04 '23 08:02 IvanTrendafilov

@IvanTrendafilov Yes, it is safe to run multiple instances of the CLI at the same time. Or indeed to use the snscrape package/modules from multiple independent Python processes in parallel (which is what the CLI does, anyway). The CLI also has code for token sharing between parallel Twitter scrapes.
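As a rough illustration of the multi-process approach described above (a sketch, not code from the snscrape project): each scrape runs as its own CLI process, so no snscrape state is shared between threads. The helper names `build_snscrape_command` and `run_in_parallel` are made up for this example, and the `twitter-search` scraper name and the `--jsonl` / `--max-results` options are assumed to match your installed version; check `snscrape --help` before relying on them.

```python
import subprocess
import sys


def build_snscrape_command(scraper, query, max_results=100):
    """Build the argument list for one CLI invocation (hypothetical helper).

    Assumes the installed CLI accepts --jsonl and --max-results.
    """
    return ["snscrape", "--jsonl", "--max-results", str(max_results), scraper, query]


def run_in_parallel(commands):
    """Start every command as a separate OS process, then collect each one's output.

    Returns a list of (command, return_code, stdout_text) tuples in input order.
    """
    # Launch all processes first so they run concurrently.
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for cmd in commands
    ]
    results = []
    for cmd, proc in zip(commands, procs):
        out, _ = proc.communicate()  # waits for this process to finish
        results.append((cmd, proc.returncode, out))
    return results


if __name__ == "__main__":
    # Example queries are placeholders; replace them with your own.
    queries = ["from:example_user_a", "from:example_user_b"]
    commands = [build_snscrape_command("twitter-search", q, 50) for q in queries]
    for cmd, code, out in run_in_parallel(commands):
        print(cmd[-1], "exit code:", code, "lines:", len(out.splitlines()))
```

Output is read from the processes one at a time, so for very large scrapes it may be preferable to redirect each process's stdout to its own file instead of a pipe.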

JustAnotherArchivist avatar Feb 04 '23 08:02 JustAnotherArchivist

Great news, thank you.

IvanTrendafilov avatar Feb 04 '23 08:02 IvanTrendafilov

@JustAnotherArchivist Do you have any brief idea why this error is occurring, and do you have any suggestions for how to work around it while still using the library to scrape faster? Additionally, I'm curious if you have any resources or suggestions for learning how to use the library for fast scraping, as I'm relatively new to this.

I should mention that I ran into this problem when using multithreading, but interestingly, when I ran the same code from multiple command prompts at the same time, it worked fine.

obada-jaras avatar Mar 01 '23 21:03 obada-jaras