Make pre trained model available on torrent

Open perilbrain opened this issue 5 years ago • 9 comments

I have been trying to download the pre-trained model for the last month, but I have not succeeded a single time. Such big files are good candidates for torrent sharing for two major reasons:

  1. Allows resuming downloads.
  2. Takes load off the main server through peer-to-peer sharing, which is quite economical for an open source project.

Here are some problems I have been facing while downloading:

  1. wget : (781 KB/s) - Read error at byte 304165325/1916988031 (Connection reset by peer)
  2. axel : Too many redirects.
  3. Firefox: Downloading at the rate of 37KB/s, where usually most other downloads are 4-5MB/s.
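
For reference, the wget error above reports both the byte offset reached and the total size, so the share of the file that actually arrived can be worked out. A minimal sketch in bash; the function name `percent_done` is made up for illustration and is not part of wget or DeepSpeech:

```shell
#!/bin/bash
# percent_done is a hypothetical helper, not part of wget or DeepSpeech.
# Prints the whole-number percentage of TOTAL bytes covered by DONE bytes.
percent_done() {
    local done_bytes=$1 total_bytes=$2
    if (( total_bytes <= 0 )); then
        echo "total must be positive" >&2
        return 1
    fi
    # Integer division, so the result is rounded down.
    echo $(( done_bytes * 100 / total_bytes ))
}
```

With the numbers from the wget message, `percent_done 304165325 1916988031` prints 15, i.e. the connection was reset after roughly 15% of the archive.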

If a torrent is shared while allowing seeding from the main server i.e. aws in this case, maybe people will be able to download with less effort. Same goes for sharing data of voice initiative of mozilla but I guess it is different project to talk about.

One more issue was raised earlier regarding this #2151 which was closed blindly.

perilbrain avatar Jun 22 '19 09:06 perilbrain

Allows resuming downloads.

Technically nothing stops from supporting resuming download, it's been working fine downloading from Github for me.
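
Whether resuming can work depends on the server honouring HTTP range requests, which it advertises with an `Accept-Ranges: bytes` response header. A rough sketch for checking this from a dump of response headers; the function name `supports_resume` is an assumption for illustration:

```shell
#!/bin/bash
# supports_resume is a hypothetical helper. It reads HTTP response headers
# on stdin and succeeds if any line is an "Accept-Ranges: bytes" header.
# Header names are case-insensitive, hence grep -i.
supports_resume() {
    grep -Eiq '^accept-ranges:[[:space:]]*bytes'
}
```

One possible use is `curl -sIL "$URL" | supports_resume && echo "resume should work"`, where `$URL` is the release asset URL. With `-L`, curl prints the headers of every redirect hop, so this sketch reports success if any hop advertises range support; it does not prove that the final server does.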

If a torrent is shared while allowing seeding from the main server i.e. aws in this case

We don't have any tracker and we don't control AWS hosting, it's Github's hosting.

Same goes for sharing data of voice initiative of mozilla but I guess it is different project to talk about.

It's a different project, and the issue has already been raised; you can check the discussion on Discourse. What stopped it from being done for Common Voice, however, does not apply to us, so it might be possible.

One more issue was raised earlier regarding this #2151 which was closed blindly.

No, that issue was closed because there was no proper discussion / documentation.

lissyx avatar Jun 24 '19 10:06 lissyx

It seems to me that anyone could put up a torrent so it doesn't necessarily need to be done officially by Mozilla.

dabinat avatar Jun 25 '19 04:06 dabinat

@dabinat Yes.

However, the download statistics are used within Mozilla as a measure of this project's success. So, if the statistics are significantly curtailed as a result of using a torrent, management will think the project isn't healthy and make project cuts.

So if we use a torrent, then we'll need to do so in a manner that maintains some notion of download statistics. I think @reuben has some ideas in this regard.

kdavis-mozilla avatar Jun 25 '19 05:06 kdavis-mozilla

@lissyx

Technically nothing stops from supporting resuming download, it's been working fine downloading from Github for me.

Initially it was not resuming. I don't know whether wget was ignoring the -c flag or something else was wrong, but it used to hang on some of the redirections. Anyway, I was able to complete the download in 9 resumed attempts with a bash script.

@kdavis-mozilla

the download statistics are used within Mozilla as a measure of this project's success

Of course we understand that downloads can be a parameter for evaluation, and that overcoming that decision is challenging for developers. Yet a very large number of seeders and leechers shows real-time popularity and patronage for the project; you just need to convince them :).

@any-other-victim-of-issue

In case anyone else is having a problem downloading the model, I am sharing a download script that might help:

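#!/bin/bash
# Retry the download until wget exits successfully, resuming each time.
URL="https://github.com/mozilla/DeepSpeech/releases/download/v0.5.1/deepspeech-0.5.1-models.tar.gz"
FILE="${URL##*/}"   # local file name, taken from the end of the URL
R=1                 # exit status of the last wget run
x=0                 # attempt counter
while [[ $R -ne 0 ]] ; do
    echo "Attempt $x. Last status: $R"
    # A partial download must already exist; start one manually first.
    if [[ ! -f "$FILE" ]] ; then
        echo "No earlier file present. Exiting"
        exit 1
    fi
    wget --continue "$URL"
    R=$?
    sleep .5
    x=$(( x + 1 ))
done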

Change the URL whenever it changes on the release page; otherwise you will stay stuck on v0.5.1.

perilbrain avatar Jun 25 '19 17:06 perilbrain

@kdavis-mozilla @reuben Should we do that for v0.6 when it's ready?

lissyx avatar Nov 15 '19 18:11 lissyx

@lissyx I still worry about the

the download statistics are used within Mozilla as a measure of this project's success

issue.

kdavis-mozilla avatar Nov 17 '19 13:11 kdavis-mozilla

Last time we were looking into this I verified that webseed requests are tracked normally by GitHub, as if they were "normal" downloads. It can over-count sometimes if a client makes several concurrent requests. Of course, if we publish the torrent file itself as part of the release, we can also track how many times people download it.
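
As a rough illustration of the statistics in question: the GitHub REST API reports a `download_count` for each release asset. The sketch below sums those counts from JSON on stdin using only grep and awk. The function name `sum_download_counts` is hypothetical, and a real script would more likely use a proper JSON parser.

```shell
#!/bin/bash
# sum_download_counts is a hypothetical helper. It reads GitHub release JSON
# on stdin and prints the total of every "download_count" field it finds.
sum_download_counts() {
    grep -Eo '"download_count"[[:space:]]*:[[:space:]]*[0-9]+' |
        awk -F: '{ total += $2 } END { print total + 0 }'
}
```

It could be fed with something like `curl -s https://api.github.com/repos/mozilla/DeepSpeech/releases | sum_download_counts`. That endpoint is paginated, so this only covers the first page of releases, and the total mixes all assets together, including any published .torrent file.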

reuben avatar Nov 17 '19 15:11 reuben

This might be something we want now? cc @reuben

lissyx avatar Sep 09 '20 08:09 lissyx

I guess we never stopped wanting it. But this is connected to tag-specific CI and heavily affected by the decisions made in #3317 so should probably wait for it.

reuben avatar Sep 09 '20 12:09 reuben