snscrape
Thread safety
Since people seem to keep trying to use snscrape with threads (despite this not being listed as a feature anywhere) and running into problems (seemingly without searching the issues)...
snscrape is currently not thread-safe.
I'd like to evaluate at some point whether it's easy enough to make snscrape thread-safe. One known issue is the Twitter module's guest token manager. Testing thread safety will be an issue, too.
Relevant prior issues: #307 #584 #622
(SEO keywords: threading multithreading)
@JustAnotherArchivist you are saying snscrape is not thread-safe, but is it process safe? If I were to run multiple instances of the snscrape executable concurrently, would that cause issues?
@IvanTrendafilov Yes, it is safe to run multiple instances of the CLI at the same time. Or indeed to use the snscrape package/modules from multiple independent Python processes in parallel (which is what the CLI does, anyway). The CLI also has code for token sharing between parallel Twitter scrapes.
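For readers looking for a concrete starting point, here is a minimal sketch of the multi-process approach described above: it launches several independent `snscrape` CLI processes from Python and collects their JSONL output. The helper names (`build_command`, `scrape_in_parallel`) and the example queries are made up for illustration and are not part of snscrape; the sketch only assumes the `snscrape` executable is on `PATH` and uses its `--jsonl` and `--max-results` options.

```python
import subprocess
import sys


def build_command(query, max_results=100, executable="snscrape"):
    """Build the argument list for one CLI scrape (hypothetical helper)."""
    return [executable, "--jsonl", "--max-results", str(max_results),
            "twitter-search", query]


def scrape_in_parallel(commands):
    """Start every command as its own OS process, then collect the output.

    No Python threads are involved: each scrape runs in a separate
    process, so the scrapes share no in-memory state with each other.
    """
    # Launch all processes first so that they run concurrently.
    procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
             for cmd in commands]
    results = []
    for proc in procs:
        # communicate() reads stdout to the end and waits for the exit.
        out, _ = proc.communicate()
        results.append((proc.returncode, out.splitlines()))
    return results


if __name__ == "__main__":
    # Example queries are placeholders; replace them with your own.
    queries = ["from:example_user_a", "from:example_user_b"]
    commands = [build_command(q, max_results=10) for q in queries]
    for query, (code, lines) in zip(queries, scrape_in_parallel(commands)):
        print(f"{query}: exit code {code}, {len(lines)} lines",
              file=sys.stderr)
```

Each element of the returned list is a `(returncode, lines)` pair in the same order as the commands, where every line is one JSON object as printed by the CLI. Output is read one process at a time, so a process that fills its pipe simply waits until its turn; for large scrapes, redirecting each process to its own file may be a better fit.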
great news, thank you.
@JustAnotherArchivist Do you have any idea why this error occurs, and any suggestions for working around it while still using the library to scrape faster? I'm relatively new to this, so I'd also appreciate any resources or advice on using the library for fast scraping.
I should mention that I ran into this problem when using multithreading, but interestingly, when I ran the same code in multiple command prompts at the same time, it worked fine.