
Fix search

Open 9ary opened this issue 1 year ago • 34 comments

Search now requires being logged in + a CSRF token.

This PR adds a CLI flag to provide an authentication cookie. The cookie must be obtained by logging in with a browser; in Firefox, it can be found in the developer toolbox under the storage tab.

It looks like a randomly generated CSRF token works, so no complicated mechanism is required to obtain one.
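To illustrate why no lookup is needed, here is a minimal sketch. It assumes the server only checks that the CSRF value sent in the cookie matches the one sent in the request header. The `ct0` cookie and `x-csrf-token` header names are assumptions not stated in this thread, `build_auth_headers` is a hypothetical helper rather than twint's actual code, and `secrets.token_hex` is swapped in here for the `random.randbytes(16).hex()` call quoted later in the thread.

```python
import secrets


def build_auth_headers(auth_token: str) -> dict:
    """Build request headers carrying the login cookie and a random CSRF token.

    Hypothetical helper for illustration only; the names `ct0` and
    `x-csrf-token` are assumptions, not taken from the PR.
    """
    # 16 random bytes rendered as 32 hex characters.
    csrf_token = secrets.token_hex(16)
    return {
        # The same random value goes into both the cookie and the header.
        "Cookie": f"auth_token={auth_token}; ct0={csrf_token}",
        "x-csrf-token": csrf_token,
    }


headers = build_auth_headers("PLACEHOLDER_TOKEN")
print(headers["x-csrf-token"])
```

The only secret that has to come from the browser is the authentication cookie; the CSRF value can be regenerated on every run.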

Fixes #11. Fixes #13.

9ary avatar May 09 '23 13:05 9ary

Tests are now capable of passing on this branch. The first two commits (including https://github.com/woluxwolu/twint/pull/8) take care of fixing bugs that already prevented tests from working, independently of Twitter's latest changes.

bb010g avatar May 10 '23 23:05 bb010g

That sounds great! Can you tell how stable this solution will be if you run twint regularly on a daily basis, i.e., how fast will the token/cookie expire? Will it work when you use the token/cookie to run twint on a different machine/IP address?

LinqLover avatar May 11 '23 06:05 LinqLover

Can you tell how stable this solution will be if you run twint regularly on a daily basis, i.e., how fast will the token/cookie expire?

No idea yet, but we run a twint job every 12 hours on GitHub Actions (https://github.com/catgirl-v/cubari/actions), so we'll find out soon enough.

Will it work when you use the token/cookie to run twint on a different machine/IP address?

It's working so far.

9ary avatar May 11 '23 06:05 9ary

Is it working for you guys? In my case this error keeps popping up. Any advice?

"ConnectionError: Access forbidden, try passing --auth-token."

leonardoulloa21 avatar May 12 '23 04:05 leonardoulloa21

Yes, it's working. I'm gonna need more details to help you. Did you in fact pass a valid authentication cookie as per the OP? If so, please post a minimal example that reproduces the problem.

9ary avatar May 12 '23 05:05 9ary

Do I need to pass a valid authentication cookie? How so? I just applied the changes in this PR and tried to run my previous code, and then that error message popped up. How can I do what you recommend?

leonardoulloa21 avatar May 12 '23 06:05 leonardoulloa21

Sounds to me like you didn't read any of the conversation in #13 and here. The error message is very clear, you need an auth token. This is the whole point of this PR: Twitter now requires login to search. Instructions are in the op.

9ary avatar May 12 '23 06:05 9ary

Brilliant solution, works just fine. Thanks.

luxoflux avatar May 12 '23 11:05 luxoflux

Sounds to me like you didn't read any of the conversation in #13 and here. The error message is very clear, you need an auth token. This is the whole point of this PR: Twitter now requires login to search. Instructions are in the op.

My bad, I thought that csrf_token = random.randbytes(16).hex() was it, but I need to replace it with my auth token, which I get from the Firefox browser, right? I made the change and I'm still getting the same error ("ConnectionError: Access forbidden, try passing --auth-token."). Maybe I'm doing something wrong? Some help would be nice please :)

leonardoulloa21 avatar May 12 '23 17:05 leonardoulloa21

No, you don't have to modify the code. Pass the token with the --auth-token flag, or set the TWITTER_AUTH_TOKEN environment variable.

CSRF is unrelated, it's just that both changes were required to actually get it to work.
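For anyone scripting this, here is a rough sketch of the two options side by side. `twint_invocation` is a hypothetical helper, `-u someuser` is a placeholder query, and the sketch assumes `--auth-token` takes the token as its next argument.

```python
import os


def twint_invocation(auth_token: str, use_flag: bool = True):
    """Return (argv, env) for running the twint CLI with an auth token.

    Hypothetical helper; it only builds the command, it does not run it.
    """
    argv = ["twint", "-u", "someuser"]  # placeholder query
    env = dict(os.environ)  # copy, so the current process is left untouched
    if use_flag:
        # Option 1: pass the token on the command line.
        argv += ["--auth-token", auth_token]
    else:
        # Option 2: pass the token through the environment.
        env["TWITTER_AUTH_TOKEN"] = auth_token
    return argv, env


argv, env = twint_invocation("PLACEHOLDER_TOKEN", use_flag=False)
# To actually run it: subprocess.run(argv, env=env, check=True)
```

The environment variable keeps the token out of the process list and shell history, which is usually the safer choice on shared machines or in CI.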

9ary avatar May 12 '23 17:05 9ary

No, you don't have to modify the code. Pass the token with the --auth-token flag, or set the TWITTER_AUTH_TOKEN environment variable.

CSRF is unrelated, it's just that both changes were required to actually get it to work.

I have my code deployed in AWS Lambda with the twint library as a layer. I updated the library and set the environment variable as mentioned, but I'm still getting the same error. I get the same result locally. I would love some help if you can :)

[CRITICAL] 2023-05-12T20:53:44.334Z 38205fb9-65a5-41b2-b6ce-377909a1b4e3 twint.run:Twint:Feed:noData'data' sleeping for 1.0 secs
[CRITICAL] 2023-05-12T20:53:45.425Z 38205fb9-65a5-41b2-b6ce-377909a1b4e3 twint.run:Twint:Feed:noData'data' sleeping for 8.0 secs
[CRITICAL] 2023-05-12T20:53:53.524Z 38205fb9-65a5-41b2-b6ce-377909a1b4e3 twint.run:Twint:Feed:noData'data' sleeping for 27.0 secs

leonardoulloa21 avatar May 12 '23 21:05 leonardoulloa21

Thank you for the fix @9ary, works great! 😃

Tiny request, is it possible to add a wait time to prevent rate limits?

Looks like --min-wait-time is supposed to be automatically adjusted but I still get TokenExpiryException: Rate limit exceeded

ap.add_argument("--min-wait-time", type=float, default=15,
                    help="specifiy a minimum wait time in case of scraping limit error. This value will be adjusted by twint if the value provided does not satisfy the limits constraints")
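One possible workaround is to retry from the calling side with growing pauses. The sketch below is not twint's own logic: `RateLimited` and `retry_with_backoff` are hypothetical names, and real code would catch whatever exception twint actually raises. The cubic delay is an assumption inferred from the 1.0, 8.0 and 27.0 second sleeps in the log earlier in this thread.

```python
import time


class RateLimited(Exception):
    """Stand-in for the scraper's rate-limit error (hypothetical)."""


def retry_with_backoff(task, max_attempts=4, sleep=time.sleep):
    """Call task() until it succeeds, sleeping attempt**3 seconds between tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except RateLimited:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(float(attempt ** 3))  # 1.0, 8.0, 27.0, ...


# Demo: a fake task that fails twice and then succeeds.
# Delays are recorded in a list instead of actually being slept.
calls = {"n": 0}
delays = []


def flaky_search():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited("Rate limit exceeded")
    return "ok"


result = retry_with_backoff(flaky_search, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the demo instant; with the default `time.sleep` the same function really waits between attempts.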

batmanscode avatar May 14 '23 01:05 batmanscode

For what it's worth, it seems the owner of this repo is inactive, so this PR is unlikely to be merged anytime soon. We've set up a fork at https://github.com/catgirl-v/twint.

@leonardoulloa21 @batmanscode please open issues over there with the code or command line invocation that reproduces your problems. It's not practical to do all development and troubleshooting in a single PR thread.

9ary avatar May 14 '23 06:05 9ary

For what it's worth, it seems the owner of this repo is inactive, so this PR is unlikely to be merged anytime soon. We've set up a fork at https://github.com/catgirl-v/twint.

@leonardoulloa21 @batmanscode please open issues over there with the code or command line invocation that reproduces your problems. It's not practical to do all development and troubleshooting in a single PR thread.

Makes sense, thanks!

batmanscode avatar May 14 '23 11:05 batmanscode

I applied all the changes to the .py files of my twint install, but I keep getting the error below on all of my searches.

module 'random' has no attribute 'randbytes'

corpuzdonn avatar May 18 '23 16:05 corpuzdonn

I applied all the changes to the .py files of my twint install, but I keep getting the error below on all of my searches.

module 'random' has no attribute 'randbytes'

You have to use Python 3.9 or above, since random.randbytes was only added in 3.9. It's mentioned in some of the early comments.
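If upgrading isn't an option, something like the fallback below might work. It is only a sketch: `random_hex` is a hypothetical helper, and it assumes `random.randbytes` is the only 3.9-specific call involved.

```python
import os
import random


def random_hex(nbytes: int = 16) -> str:
    """Return nbytes of randomness as a hex string on any Python 3 version."""
    randbytes = getattr(random, "randbytes", None)  # added in Python 3.9
    if randbytes is not None:
        return randbytes(nbytes).hex()
    # Fallback for Python < 3.9, where random.randbytes does not exist.
    return os.urandom(nbytes).hex()


print(random_hex())
```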

batmanscode avatar May 18 '23 16:05 batmanscode

Thanks. I switched to 3.9, but now I get 'Access forbidden, try passing --auth-token.' I saw that auth tokens were added. How can I add this to the search function?

I'm very new at this. How do I pass the auth token?

corpuzdonn avatar May 19 '23 05:05 corpuzdonn

Thanks. I switched to 3.9, but now I get 'Access forbidden, try passing --auth-token.' I saw that auth tokens were added. How can I add this to the search function?

I'm very new at this. How do I pass the auth token?

Log in to Twitter on Firefox -> developer tools -> storage -> Auth token

Then wherever you're running twint, save that as an environment variable called TWITTER_AUTH_TOKEN

It should then run. Good luck

batmanscode avatar May 20 '23 05:05 batmanscode

Thanks it's working now.

corpuzdonn avatar May 20 '23 07:05 corpuzdonn

Hey @batmanscode

Would you mind testing my code and telling me if you get the same error message?

I'm trying to run it in a Jupyter notebook and then in AWS Lambda.

import twint
import os
import nest_asyncio

os.environ["TWITTER_AUTH_TOKEN"] = "my_token"

nest_asyncio.apply()

c = twint.Config()
c.Username = "BCPComunica"
c.Since = "2023-05-21"
c.Limit = 100
twint.run.Search(c)

I'm getting this error: CRITICAL:root:twint.run:Twint:Feed:noData'data' sleeping for 1.0 secs

Hope you can give me a hand!

Thanks in advance

leonardoulloa21 avatar May 22 '23 00:05 leonardoulloa21

Log in to Twitter on Firefox -> developer tools -> storage -> Auth token

Then wherever you're running twint, save that as an environment variable called TWITTER_AUTH_TOKEN

It should then run. Good luck

I can only find Authentication tokens, and they're in the developer portal. I didn't see any 'developer tools' or 'storage' in Firefox. Which of them should I use?

JoelBird avatar May 24 '23 16:05 JoelBird

@JoelBird hopefully this is detailed enough:

  • go to twitter.com
  • log in
  • press F12, the developer toolbox will appear
  • click the storage tab
  • on the left, select cookies > https://twitter.com
  • find the cookie named auth_token
  • double-click the value and copy it
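If you'd rather not copy the value by hand, the same cookie can in principle be read from Firefox's cookie database. This is only a sketch under assumptions not stated in this thread: that the profile's `cookies.sqlite` file has a `moz_cookies` table with `name`, `value` and `host` columns, and that you work on a copy of the file, since Firefox may lock the original while running. `read_auth_token` is a hypothetical helper.

```python
import sqlite3


def read_auth_token(cookies_db_path: str):
    """Return the twitter.com auth_token from a copy of cookies.sqlite, or None."""
    conn = sqlite3.connect(cookies_db_path)
    try:
        row = conn.execute(
            "SELECT value FROM moz_cookies "
            "WHERE name = 'auth_token' AND host LIKE '%twitter.com' "
            "LIMIT 1"
        ).fetchone()
    finally:
        conn.close()
    # fetchone() gives None when no matching cookie exists.
    return row[0] if row else None
```

Call it as `read_auth_token("/path/to/copy/of/cookies.sqlite")` and pass the result to twint. Treat the value like a password, since it grants access to the logged-in account.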

9ary avatar May 24 '23 19:05 9ary

Hi @9ary, thanks for the fix. For now, on the command line, only the -u parameter works; the search parameter -s doesn't. Any idea why? I'm trying to debug it here.

I'm getting CRITICAL:root:twint.run:Twint:Feed:noData'data' with twint -s pineapple, but twint -u username works fine.

marquisvictor avatar May 24 '23 20:05 marquisvictor

I'm getting "Rate limit exceeded" errors. How do I fix this? Is there something I should keep looping to get around it?

corpuzdonn avatar May 25 '23 03:05 corpuzdonn

I'm getting this error: CRITICAL:root:twint.run:Twint:Feed:noData'data' sleeping for 1.0 secs

I'm not sure, sorry. I'm having the same issue :(

batmanscode avatar Jun 16 '23 20:06 batmanscode

For those that this is working for, would someone be able run through

I'm not sure, sorry. I'm having the same issue :(

Hi all, is this still working for anyone? I'm experiencing the same issue as @leonardoulloa21

a-annor avatar Jun 24 '23 15:06 a-annor

Hi all, is this still working for anyone? I'm experiencing the same issue as @leonardoulloa21

It's working fine on my end.

Output:
1672684481071976449 2023-06-25 03:13:51 +0800 @kvafelled @kvafelled se debe esperar el plazo de 15 días calendario aproximadamente.
1672663630951878656 2023-06-25 01:51:00 +0800 @kvafelled Hola, @kvafelled 👋 Te contamos que cuando se realiza una cancelación, anulación o reembolso de compra por parte de alguna empresa, estas tienen hasta 15 días (calendario) para proceder con la devolución del dinero a tu cuenta de ahorros. 🤝
... ... ...
@Maverick99210 ¡Hola, @Maverick99210! 👋 Lamentamos el inconveniente generado, por favor, envíanos tu DNI vía DM para poder orientarte de la mejor manera. Esperamos tu mensaje.
1668754764476346368 2023-06-14 06:58:34 +0800 @mendezt_29 Hola @mendezt_29 Envíanos un DM con la captura de pantalla de lo que te aparece y el número de tu DNI. Quedamos atentos.
1668710970741628928 2023-06-14 04:04:33 +0800 @PALICUYA ¡Hola Pilar!👋 Por favor envíanos un inbox con tu DNI y la imagen que te aparece aquí 👉🏻 https://t.co/HE00YFfJez. Quedamos atentos. 🤝
1668690239685267457 2023-06-14 02:42:10 +0800 @vladineitor Elsa, gracias por la información. Estamos reportando lo sucedido al equipo a cargo, para que se pueda hacer las consultas y verificaciones al respecto. Lamentamos mucho la molestia
1668687790408884224 2023-06-14 02:32:26 +0800 @vladineitor Hola, Elsa. Queremos conocer lo ocurrido. Por favor, detállanos vía DM el inconveniente presentado y la ubicación de la Agencia (avenida/calle/número/alguna referencia). Quedamos atentos.
1668631670076116997 2023-06-13 22:49:26 +0800 @RodrigoVinyas ¡Hola, Rodrigo! 👋 Nos importa mucho la experiencia de cada uno de nuestros clientes, agradeceríamos que puedas comunicarte al 01 311 9400 de L-V de 7:00 a.m. a 5:00 p.m. con nuestra área de Soluciones de Pagos, para solicitar alguna facilidad de pago o un compromiso de pagos.
[!] No more data! Scraping will stop now.
found 0 deleted tweets in this search.

corpuzdonn avatar Jun 25 '23 12:06 corpuzdonn

Thanks @corpuzdonn, maybe it's a token issue from my end

I attempted a huge scrape (4 weeks via search terms) and that got rate limited. Maybe that token wasn't valid after that

Have you tried long scrapes? I saw there's a time out parameter but even setting that very high didn't work for me

batmanscode avatar Jun 25 '23 13:06 batmanscode

It's working fine on my end.

Would you mind packaging up your twint library and sharing it with us, please? I might be doing something wrong, because I have just tried it and I got the same result:

CRITICAL:root:twint.run:Twint:Feed:noData'data' sleeping for 1.0 secs

I don't think that this message is related to the auth token; it has to be something else... Thanks in advance for your time @woluxwolu

leonardoulloa21 avatar Jun 27 '23 02:06 leonardoulloa21

I'm suddenly getting the error below. Did something change?

CRITICAL:root:twint.get:User:Expecting value: line 1 column 1 (char 0)
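That text matches what Python's json module reports when asked to parse an empty or non-JSON string. One plausible reading (an assumption, not confirmed in this thread) is that the endpoint returned an empty body or an HTML page instead of JSON. A small sketch to show where the message comes from; `describe_body` is a hypothetical helper:

```python
import json


def describe_body(body: str) -> str:
    """Try to parse a response body as JSON and describe what happened."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return f"not JSON: {exc}"
    return "valid JSON"


print(describe_body(""))                      # empty response body
print(describe_body("<html>blocked</html>"))  # HTML error page
print(describe_body('{"data": {}}'))          # well-formed JSON
```

The first two calls produce the same "Expecting value: line 1 column 1 (char 0)" text as above, so logging the raw response body would be the next debugging step.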

corpuzdonn avatar Jul 02 '23 15:07 corpuzdonn