Get badge without mirrors?

Open shadiakiki1986 opened this issue 5 years ago • 9 comments

Hey there. Awesome project. Is it possible to get a badge from pepy without the mirrors? For my project, the mirror stats are much larger than the non-mirror ones because it's still a young project. I wouldn't want the badge on my README to be misleading.

References

https://pepy.tech/project/isitfit

https://pypistats.org/packages/isitfit

shadiakiki1986 avatar Oct 08 '19 11:10 shadiakiki1986

I don't plan to add this feature, though I may add a similar one that lists the source of downloads. If you take a look at your downloads in BigQuery, you can see the following results:

row  details_installer_name  downloads
1    Browser                 122
2    pip                     71
3    requests                96
4    null                    48
5    bandersnatch            2886

As you can see here, the bandersnatch mirror has 2866 downloads. That seems like quite a lot, but the mirror can be installed locally (https://bandersnatch.readthedocs.io/en/latest/mirror_configuration.html). So if I have a local mirror and I install your package from it, that download will not be taken into account.
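To make the effect of the mirror concrete, here is a minimal Python sketch that totals the counts from the table above with and without mirror installers. The counts are copied from that table; the function name and the `MIRROR_INSTALLERS` set are assumptions for illustration (only bandersnatch is named in this thread), not pepy's actual code.

```python
# Download counts per installer, copied from the BigQuery result above.
downloads_by_installer = {
    "Browser": 122,
    "pip": 71,
    "requests": 96,
    None: 48,  # rows where details_installer_name is null
    "bandersnatch": 2886,
}

# Assumption: the installers treated as mirrors. Only bandersnatch is named here.
MIRROR_INSTALLERS = {"bandersnatch"}


def total_downloads(counts, include_mirrors=True):
    """Sum download counts, optionally skipping mirror installers."""
    return sum(
        n
        for installer, n in counts.items()
        if include_mirrors or installer not in MIRROR_INSTALLERS
    )


with_mirrors = total_downloads(downloads_by_installer)
without_mirrors = total_downloads(downloads_by_installer, include_mirrors=False)
print(with_mirrors, without_mirrors)
```

With these numbers, the mirror accounts for most of the total, which is the gap the original question is about.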

psincraian avatar Oct 09 '19 18:10 psincraian

What I normally do is filter for only pypi in my package query

shadiakiki1986 avatar Oct 09 '19 19:10 shadiakiki1986

Filter for pip* (typo)
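A rough sketch of what such a filtered query might look like, built as a string in Python. The table `bigquery-public-data.pypi.file_downloads` and the columns `file.project`, `details.installer.name`, and `timestamp` are my understanding of the public PyPI dataset; the helper `pip_only_query` and its `days` parameter are hypothetical and not taken from this thread.

```python
def pip_only_query(project: str, days: int = 30) -> str:
    """Build a BigQuery SQL string counting pip-only downloads of a project."""
    # Keep string interpolation safe by accepting only simple package names.
    stripped = project.replace("-", "").replace("_", "").replace(".", "")
    if not stripped.isalnum():
        raise ValueError(f"unexpected project name: {project!r}")
    if days <= 0:
        raise ValueError("days must be positive")
    return f"""
SELECT COUNT(*) AS downloads
FROM `bigquery-public-data.pypi.file_downloads`
WHERE file.project = '{project}'
  AND details.installer.name = 'pip'
  AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL {int(days)} DAY)
""".strip()


query = pip_only_query("isitfit")
print(query)
```

In real use, a parameterized query through the BigQuery client would be preferable to string formatting; the string form is used here only so the filter is easy to read.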

shadiakiki1986 avatar Oct 09 '19 19:10 shadiakiki1986

For what it's worth, I can also chime in to confirm that my download stats on pepy.tech are much, much higher than they are (or were) on pypistats.org.

That's all I wanted to say. No need to reply to this post.

The remainder of this message is a discussion which is not directly relevant to this thread, but I wasn't sure if it was appropriate to start a new issue. Feel free to ignore.

Other possible reasons for high download counts

Estimating the number of users instead of download counts

Before I used pypi, when my software was hosted on my own web page, the majority of downloads came from the same few IP addresses. (For example, I remember that one IP address downloaded my software over 10000 times. This was back when it was legal to keep track of visitor IP addresses.) Is it possible to use BigQuery to estimate the number of unique users (by discarding downloads from the same IP)? (Forgive me. I know nothing about BigQuery.)
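The idea of counting each client once can be sketched in a few lines of Python. This assumes a download log in which every record carries some client identifier, such as an IP address; whether the BigQuery data exposes anything like that is not established in this thread, and the function name and sample addresses below are made up for illustration.

```python
from collections import Counter


def summarize_by_client(client_ids):
    """Compare raw download count with the number of distinct clients."""
    counts = Counter(client_ids)
    return {
        "downloads": sum(counts.values()),
        "unique_clients": len(counts),
        # Heaviest single client as (identifier, downloads), or None if empty.
        "top_client": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical log: one client downloads five times, two others once each.
sample_log = ["192.0.2.1"] * 5 + ["192.0.2.2", "192.0.2.3"]
summary = summarize_by_client(sample_log)
print(summary)
```

Here seven downloads collapse to three distinct clients, which is the kind of gap described above.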

Excluding downloads with unknown python versions

When I used pypistats.org, it was able to show what version of python the users who downloaded my project were using (e.g. 2.7, 3.5, 3.7, etc.). This was interesting, but it's not essential. I only mention this here because it seemed that (even after excluding downloads from mirrors) the majority of downloads for my small project were from users whose python version is "unknown" and whose OS is also "unknown". Are these downloads legitimate? Should we exclude them?

Thanks for creating this service.

jewettaij avatar Aug 11 '20 22:08 jewettaij

I also think it would be useful to have the option to choose the type of stats! Have there been any updates on this or are there any plans to add this in the future?

laurahanu avatar Jan 29 '21 10:01 laurahanu

Hi @laurahanu, currently we are saving download stats without mirrors. Now we need to make changes to the API and to the frontend app :-)

psincraian avatar Feb 01 '21 20:02 psincraian

Hi @psincraian, thanks for the reply and good to hear! Looking forward!

laurahanu avatar Feb 01 '21 21:02 laurahanu

currently we are saving download stats without mirrors. Now we need to make changes to the API and to the frontend app :-)

@psincraian I'm also looking forward to that, and thanks PePy as a whole!

I just wanted to add some thoughts on this, hopefully not too off-topic. I know these are not trivial issues and I'm aware of the discussion on why PyPI doesn't include stats themselves. And I imagine these issues don't matter much for packages with a large number of downloads.

As you can see here, the bandersnatch mirror has 2866 downloads. That seems like quite a lot, but the mirror can be installed locally (https://bandersnatch.readthedocs.io/en/latest/mirror_configuration.html). So if I have a local mirror and I install your package from it, that download will not be taken into account.

Since the mirrors seem to download all files, they might greatly inflate the numbers for packages that have few users but ship binary wheels for various Python versions and platforms. I believe the total without mirrors will help a lot in those cases.

For instance, using BigQuery directly* a few weeks ago, one of my packages had:

  • around 26k downloads without mirrors
  • around 15k downloads with pip installer and details
  • more than 265k downloads total (286k from PePy)

(*=I was using the old downloads* table for this, not file_downloads)
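As a quick worked check of what those figures imply, the following Python sketch computes the share of the BigQuery total attributable to mirrors and to pip. The inputs are the rounded values from the list above ("around" and "more than"), so the results are only approximate; the variable and function names are illustrative.

```python
# Rounded figures from the list above.
without_mirrors = 26_000
pip_with_details = 15_000
total = 265_000


def share(part, whole):
    """Return part as a fraction of whole."""
    return part / whole


# Everything that is not a non-mirror download is attributed to mirrors.
mirror_share = 1 - share(without_mirrors, total)
pip_share = share(pip_with_details, total)

print(f"mirrors: {mirror_share:.1%} of total")
print(f"pip with details: {pip_share:.1%} of total")
```

On these rounded numbers, roughly 90% of the total comes from mirrors and pip accounts for under 6%.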

@jewettaij mentioned:

the majority of downloads for my small project were from users whose python version is "unknown" and whose OS is also "unknown". Are these downloads legitimate? Should we exclude them?

Besides those, which usually reflect that the fields are null in the BigQuery table, I noticed some other weird things. For example, I'm not sure how the "country_code" is filled in the BigQuery data, even when restricted to "pip" as the installer. For my niche package, I noticed from the data that country_code=US is disproportionally larger than everything else, so I wonder:

  • if the CDN infrastructure has any effect on that and possibly other fields
  • if the data from other countries is/was just lost more frequently
  • if maybe the current data is right(ish) and those download numbers are closer to the actual numbers

PMeira avatar Apr 10 '21 17:04 PMeira

Hi @psincraian, have there been any updates to the API or on the frontend side? Otherwise, is there a timeline for when this would be included?

laurahanu avatar Jun 30 '21 16:06 laurahanu