Telegram datasets return fewer results than ground truth for some accounts

Open: schliebs opened this issue 3 years ago • 1 comment

The Telegram channel scraping function returns far fewer posts than the ground truth for some accounts, while returning complete results for others. One example where hundreds of posts are missing is t.me/rusembjp for the period between 1 January 2022 and 16 June 2022. The missing data can be observed directly as gaps in the ascending post IDs.
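A minimal sketch of how such gaps could be located, assuming the collected post IDs are available as a plain list of integers. The function name `find_id_gaps` and the sample IDs are hypothetical and not part of 4CAT. Note that a gap does not by itself prove a scraping failure, since deleted posts also leave holes in the ID sequence.

```python
def find_id_gaps(post_ids):
    """Return (first_missing, last_missing) ranges between collected post IDs."""
    # Sort and de-duplicate so that neighbouring IDs can be compared pairwise.
    ordered = sorted(set(post_ids))
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        # A difference larger than 1 means at least one ID was not collected.
        if current - previous > 1:
            gaps.append((previous + 1, current - 1))
    return gaps


# Made-up IDs for illustration only, not taken from the dataset in question.
collected = [101, 102, 103, 107, 108, 112]
print(find_id_gaps(collected))  # [(104, 106), (109, 111)]
```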

The issue can be avoided by splitting up the data collection into many small chunks, but this is extremely time-consuming at scale. Thank you very much for your support!
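As a rough illustration of the chunking described above, the sketch below splits a date range into consecutive, non-overlapping windows that could each be collected as a separate dataset. It assumes the chunks are date ranges; the helper `split_date_range` and the 14-day window are hypothetical choices, and the code does not call any 4CAT or Telegram API.

```python
from datetime import date, timedelta


def split_date_range(start, end, days_per_chunk=7):
    """Split [start, end] (inclusive) into consecutive windows of at most days_per_chunk days."""
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # Clamp the final window so it never runs past the requested end date.
        chunk_end = min(chunk_start + timedelta(days=days_per_chunk - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


# The period mentioned above, split into 14-day windows (an arbitrary size).
chunks = split_date_range(date(2022, 1, 1), date(2022, 6, 16), days_per_chunk=14)
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), chunk_end.isoformat())
```

Each printed pair would then be used as the start and end date of one smaller query, with the results merged and de-duplicated by post ID afterwards.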

schliebs, Jul 14 '22 22:07

Could you provide a link to the dataset in question? It may also be useful to check the logs and see if there were any issues: [image] Thank you!

dale-wahl, Jul 25 '22 09:07