Radim Řehůřek
@pminervini almost certainly; let us know if you find something 8)
@pminervini GraphChi sounds great, thanks for the link! Also check out gensim for fast SVD (gensim targets topic modelling though, not collab filtering).
I don't think so – the README links work.
@kmike this version seems to work – what's your plan to merge & release? The wheels (incl Windows) are not blocking for us – installing from source works fine once...
@cadnce could we squeeze this into the next release, v6.3.0? We're planning to release soon (~in a week or so).
@iliadmitriev this repo seems no longer maintained… would you like to start a fork?
@pauldmccarthy also bitten by this… is it enough if I manually call `seek(tell())` after each read, or is the workaround more involved? **EDIT**: OK I tried the above and it...
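For readers following along, here is a rough sketch of what "manually call `seek(tell())` after each read" could look like. `ReseekingReader` is a hypothetical name, not part of indexed_gzip, and the demo runs on a plain `io.BytesIO` so it works without the library installed; whether this actually avoids the issue is exactly what the comment above is asking.

```python
import io


class ReseekingReader:
    """Hypothetical wrapper: re-seek to the current offset after every read."""

    def __init__(self, fileobj):
        self._f = fileobj

    def read(self, size=-1):
        data = self._f.read(size)
        # The workaround under discussion: seek to where we already are.
        self._f.seek(self._f.tell())
        return data

    def __getattr__(self, name):
        # Delegate everything else (seek, tell, close, ...) to the wrapped file.
        return getattr(self._f, name)


# Stand-in for the gzip file object; any seekable binary file would do here.
raw = io.BytesIO(b"abcdefghij")
reader = ReseekingReader(raw)
first = reader.read(4)
second = reader.read(3)
position = reader.tell()
```

On an ordinary file object the extra `seek` is a no-op, so the wrapper only changes behaviour if the underlying object does something special on `seek`.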
Sure, thanks for looking into this. I created this minimal example:

```python
import io
import tarfile
import random
from string import ascii_letters

from indexed_gzip import IndexedGzipFile


def generate_tgz(path, num_files=10000, file_size=5*1024):...
```
My running hypothesis is that it's somehow related to input buffering – either inside tarfile, or igzip's buffer (set to 4 MB above).
The problem is seek points are not being created as the file is being read (which is why I'm hijacking this particular issue, I thought it's related). Instead, there are...