UserWarning: Not all PushShift shards are active. Query results may be incomplete

Open bartboersma opened this issue 5 years ago • 2 comments

Hi,

This morning I got the following warning message:

UserWarning: Not all PushShift shards are active. Query results may be incomplete
warnings.warn(shards_down_message)

I made my query extra small, so the size of the query shouldn't be a problem. I've also noticed that api.pushshift.io doesn't return any data for the past 5 hours (setting after=5h returns nothing, but after=6h does).

I'm curious what causes this warning and what it means for me, so I can try to work around it.

Last note, awesome work!

Regards,

Bart

bartboersma • Feb 10 '21 10:02

I am using PSAW as a wrapper and noticed that you can fetch the shard counts like this (see https://pypi.org/project/psaw/):

api = PushshiftAPI()
api.metadata_.get('shards')

That returned the following result:

{'failed': 0, 'skipped': 0, 'successful': 2, 'total': 4}

So I assume I am getting this warning because only 2 of the 4 shards were successful.
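To make that check explicit, here is a minimal sketch that compares the two counts. It works on the plain dictionary shown above, so it runs without a network connection; the helper name missing_shards is made up for illustration and is not part of PSAW.

```python
from typing import Mapping


def missing_shards(shards: Mapping[str, int]) -> int:
    """Return how many shards did not answer (0 means the result is complete).

    Hypothetical helper, not part of PSAW.
    """
    return shards.get("total", 0) - shards.get("successful", 0)


# The dictionary reported above, as returned by api.metadata_.get('shards').
shards = {"failed": 0, "skipped": 0, "successful": 2, "total": 4}

down = missing_shards(shards)
if down > 0:
    print(f"{down} of {shards['total']} shards are down; results may be incomplete")
```

With the counts from my run this reports 2 of 4 shards as down, which matches the warning.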

Is this something specific to me or is everyone facing this issue right now? Also, is this something that happens regularly?

bartboersma • Feb 10 '21 13:02

After some more investigation I found the following comment by @pushshift:

"If the successful shard count is less than the total shard count, what probably happened is that a node fell out of the cluster. This is usually always a temporary thing."

source: https://www.reddit.com/r/pushshift/comments/cqyq8t/update_all_indices_have_been_recoved/
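Since the outage is described as temporary, one possible workaround is to poll the shard counts and only run the real query once every shard answers. The sketch below is only an illustration under assumptions: wait_for_all_shards and its parameters are hypothetical names, and fetch_shards stands for any function that returns the shard dictionary (with PSAW, presumably one that runs a small search and then returns api.metadata_.get('shards')).

```python
import time
from typing import Callable, Mapping


def wait_for_all_shards(
    fetch_shards: Callable[[], Mapping[str, int]],
    attempts: int = 5,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every shard answers; return False if the attempts run out.

    Hypothetical helper, not part of PSAW. `fetch_shards` must return a
    dictionary with 'successful' and 'total' counts.
    """
    for attempt in range(attempts):
        shards = fetch_shards()
        total = shards.get("total", 0)
        # Treat missing metadata (total == 0) as "not ready" rather than complete.
        if total > 0 and shards.get("successful", 0) >= total:
            return True
        # Wait before the next attempt, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

The sleep function is passed in as a parameter so the loop can be tried out without actually waiting; the default attempt count and delay are arbitrary and would need tuning.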

bartboersma • Feb 10 '21 13:02