gmusicapi icon indicating copy to clipboard operation
gmusicapi copied to clipboard

Reduce maximum memory usage while uploading files.

Open adam-uk opened this issue 9 years ago • 10 comments

Uploading large files depending on garbage collection could potentially use 2x file size worth of memory, as the entire file is read into memory twice at different points.

Justification: Free cloud based services and or other systems with limited resources (say 512MB) can run out of memory when uploading large files close to google's 300MB file limit.

adam-uk avatar Feb 26 '16 20:02 adam-uk

(for context, we talked about this over email briefly. I'm not 100% sure about the 2x memory usage number, but it seemed to match what we had seen.)

simon-weber avatar Feb 26 '16 20:02 simon-weber

Hi, just wondered if you'd had any more thoughts on this.

Thanks, Adam

adam-uk avatar Apr 05 '16 20:04 adam-uk

I think it's a good idea to address, but I haven't had much time for gmusicapi recently.

simon-weber avatar Apr 06 '16 01:04 simon-weber

What version of Python were you guys using when you found this? I couldn't seem replicate this on Python 3, but it did seem to happen on Python 2.

thebigmunch avatar Apr 29 '16 23:04 thebigmunch

So, I've tracked down the memory multiplication to the call to UploadFile. Will need to investigate more to find the exact cause and solution.

While not part of that specific problem, I've also made a change to the client ID generation to hash the file in chunks which should essentially remove the memory constraints for that part of uploading.

thebigmunch avatar Apr 30 '16 18:04 thebigmunch

Hi,

Incidentally I almost opened an issue with your upload script with it not deleting the file on success. I must have been really unlucky as I checked the change log and saw there was a small window where i must have grabbed the broken version. It's all working now.....anyhow...

Your question:

python --version Python 3.3.2

I also hacked around with the hashing part. Simon had linked to a stack overflow with an example. It may have helped but it didn't completely solve the problem. I had been been meaning to publish that, but I have no experience with github or much python either.

http://stackoverflow.com/questions/1131220/get-md5-hash-of-big-files-in-python/1131238#1131238

I think my change looked something like this, but I lost my actual change by accident.

gmusicapi/protocol/musicmanager.py#L158

    def get_track_clientid(filepath, blocksize=2**20):
        m = hashlib.md5()
        with open(filepath, 'rb') as f:
            while True:
                buf = f.read(blocksize)
                if not buf:
                    break
                m.update( buf )
        return base64.encodestring(m.digest())[:-3]

Where I got stuck was with

https://github.com/simon-weber/gmusicapi/blob/df84af852cbd3f59af9000dd9bcd19a20a24d479/gmusicapi/clients/musicmanager.py#L599

which is more tricky for me.

adam-uk avatar May 04 '16 21:05 adam-uk

No worries; thebigmunch got to that recently in 5fccad3f6b47235641f6341df1947c89f919a63f. More discussion in https://github.com/simon-weber/gmusicapi/pull/459.

simon-weber avatar May 04 '16 22:05 simon-weber

What I've found is that the multiplication lies within the dynamically built build_request method for the UploadFile call class. But I haven't gotten any further into it.

PS I believe that we may have to live with putting the full file into memory once where you linked but that should be the extent of it.

thebigmunch avatar May 04 '16 22:05 thebigmunch

So, I can confirm Python 3.3.2 exhibits this, but I didn't test any other Python 3.3 point releases. Python 2.7.11 (and likely any supported Python 2 version) actually quadruples memory usage. Python 3.4.0 and later are fine.

It really seems like this is caused by something in Python that was changed/fixed, possibly in the garbage collector. I can't really find a solution from gmusicapi's side other than suggesting people use Python 3.4+ if this causes them trouble.

thebigmunch avatar May 16 '16 20:05 thebigmunch

Wow, super weird. Thanks for digging into it.

simon-weber avatar May 16 '16 20:05 simon-weber