4cat
Telegram datasets return fewer results than ground truth for some accounts
The Telegram channel scraping function returns far fewer posts than the ground truth for some accounts, while returning all posts for others. One example where hundreds of posts are missing is t.me/rusembjp for the period from 1 January 2022 to 16 June 2022. The missing posts can be seen directly as gaps in the ascending post IDs.
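To illustrate how the gaps show up in the post IDs, here is a minimal sketch that lists missing ID ranges in a collected dataset. The function names and the example IDs are hypothetical and not part of 4CAT. It also assumes that post IDs within a channel increase by one per post; deleted posts would leave gaps too, so a gap is a hint rather than proof of missing data.

```python
def find_id_gaps(post_ids):
    """Return (first_missing, last_missing) ranges between the lowest and highest ID."""
    ids = sorted(set(post_ids))  # de-duplicate and order the collected IDs
    gaps = []
    for prev, curr in zip(ids, ids[1:]):
        if curr - prev > 1:
            # every ID strictly between prev and curr is absent from the dataset
            gaps.append((prev + 1, curr - 1))
    return gaps


def count_missing(gaps):
    """Total number of IDs covered by the gap ranges."""
    return sum(end - start + 1 for start, end in gaps)


# Hypothetical IDs, for illustration only
example_ids = [101, 102, 103, 107, 108, 112]
gaps = find_id_gaps(example_ids)
print(gaps)                 # [(104, 106), (109, 111)]
print(count_missing(gaps))  # 6
```

Running this on the post ID column of an exported dataset should give a rough count of how many posts were skipped and where.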
The issue can be resolved by splitting the data into many small chunks, which is, however, extremely time-consuming at scale. Thank you very much for your support!
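For reference, the chunking can be sketched roughly as follows, assuming the chunks are consecutive date ranges that are each collected as a separate dataset. `split_date_range` and the 7-day chunk size are hypothetical choices for illustration, not a 4CAT feature.

```python
from datetime import date, timedelta


def split_date_range(start, end, chunk_days=7):
    """Split an inclusive date range into consecutive, non-overlapping chunks."""
    if chunk_days < 1:
        raise ValueError("chunk_days must be at least 1")
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # the last chunk is cut short so it never runs past the end date
        chunk_end = min(chunk_start + timedelta(days=chunk_days - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


# The period from the example above, in one-week chunks
chunks = split_date_range(date(2022, 1, 1), date(2022, 6, 16))
print(len(chunks))  # 24
print(chunks[-1])   # (datetime.date(2022, 6, 11), datetime.date(2022, 6, 16))
```

Each chunk then has to be queried and exported on its own, which is where the time cost comes from.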
Could you provide a link to the dataset in question? It may also be useful to check the logs and see if there were any issues:
Thank you!