Problems reading stdin on Windows 11 Python 3.12
This issue was identified in https://github.com/peterjc/thapbi-pict/issues/627, where continuous integration tests using cutadapt to generate test inputs started to fail WITHOUT relevant code changes, i.e. something changed in the AppVeyor Windows environment (unclear what) and/or in the PyPI packages (nothing obvious).
I have since reduced this to a local test case under Windows 11, Python 3.12.4 (installed from python.org with the PATH option ticked), and cutadapt 4.9 (installed with pip install -U cutadapt):
C:\Users\xxx>python --version
Python 3.12.4
C:\Users\xxx>pip install -U cutadapt
Collecting cutadapt
Downloading cutadapt-4.9-cp312-cp312-win_amd64.whl.metadata (3.5 kB)
Collecting dnaio>=1.2.0 (from cutadapt)
Downloading dnaio-1.2.1-cp312-cp312-win_amd64.whl.metadata (3.6 kB)
Collecting xopen>=1.6.0 (from cutadapt)
Downloading xopen-2.0.2-py3-none-any.whl.metadata (15 kB)
Collecting isal>=1.6.1 (from xopen>=1.6.0->cutadapt)
Downloading isal-1.6.1-cp312-cp312-win_amd64.whl.metadata (10 kB)
Collecting zlib-ng>=0.4.1 (from xopen>=1.6.0->cutadapt)
Downloading zlib_ng-0.4.3-cp312-cp312-win_amd64.whl.metadata (6.9 kB)
Downloading cutadapt-4.9-cp312-cp312-win_amd64.whl (231 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 231.1/231.1 kB 939.6 kB/s eta 0:00:00
Downloading dnaio-1.2.1-cp312-cp312-win_amd64.whl (86 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86.2/86.2 kB 808.2 kB/s eta 0:00:00
Downloading xopen-2.0.2-py3-none-any.whl (17 kB)
Downloading isal-1.6.1-cp312-cp312-win_amd64.whl (201 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.5/201.5 kB 942.2 kB/s eta 0:00:00
Downloading zlib_ng-0.4.3-cp312-cp312-win_amd64.whl (88 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.5/88.5 kB 1.7 MB/s eta 0:00:00
Installing collected packages: zlib-ng, isal, xopen, dnaio, cutadapt
Successfully installed cutadapt-4.9 dnaio-1.2.1 isal-1.6.1 xopen-2.0.2 zlib-ng-0.4.3
C:\Users\xxx>cutadapt --version
4.9
cutadapt was then patched to include commit d9cf273b0b74ed82e1803daf766c914c37c90c34 for a clearer error message.
Working example using a filename and one of the cutadapt test files (the primer isn't really appropriate for this dataset):
C:\Users\xxx\cutadapt\tests\data>cutadapt --quiet -g GAAGGTGAAGTCGTAACAAGG 454.fa | find ">" /C
59
Restructured to read from stdin:
C:\Users\xxx\Projects\cutadapt\tests\data>type 454.fa | cutadapt --quiet -g GAAGGTGAAGTCGTAACAAGG - | find ">" /C
Input file format not recognized. The file starts with b'TGTA', but files in supported formats start with '>' (FASTA), '@' (FASTQ) or 'BAM'
0
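For anyone wanting to exercise piped stdin from a test script rather than from cmd.exe, the following is a minimal sketch using only the standard library. It does not reproduce the cutadapt failure itself; it only checks which byte a child Python process sees first when data is piped to its stdin. The names `CHILD_CODE` and `first_byte_seen_by_child` are made up for this illustration.

```python
import subprocess
import sys

# Hypothetical child program: echo back the first byte it reads from stdin.
CHILD_CODE = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read(1))"


def first_byte_seen_by_child(data: bytes) -> bytes:
    """Pipe data into a child interpreter's stdin and return what it echoes."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=data,              # sent to the child through a pipe
        stdout=subprocess.PIPE,  # capture the echoed byte
        check=True,              # raise if the child exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # A FASTA-like record; the child should report the leading ">".
    print(first_byte_seen_by_child(b">read1\nTGTACGT\n"))
```

Swapping the child command for the tool under test would give a similar pipe-based check, assuming that tool accepts `-` for stdin as cutadapt does above.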
Inserting additional logging suggests that the function detect_file_format is called twice. The first call works and reports FASTA format; the second starts part way through the file and fails.
This reminds me of #774, but it seems to be Windows specific.
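To illustrate why a second detection pass can fail on piped input, here is a small sketch. It is not the cutadapt or dnaio code, and it makes no claim about the actual root cause; `NonSeekableStream`, `sniff_by_reading` and `sniff_by_peeking` are hypothetical names. The assumption is simply that a detector which consumes bytes from a non-seekable stream cannot be run twice, whereas one that peeks into a buffer can.

```python
import io


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a pipe: readable, but cannot seek back."""

    def __init__(self, data: bytes):
        super().__init__()
        self._inner = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False

    def readinto(self, buffer) -> int:
        return self._inner.readinto(buffer)


def format_from_first_byte(first: bytes) -> str:
    if first == b">":
        return "FASTA"
    if first == b"@":
        return "FASTQ"
    return "unknown"


def sniff_by_reading(stream) -> str:
    # Consumes one byte, so the stream position moves forward.
    return format_from_first_byte(stream.read(1))


def sniff_by_peeking(buffered: io.BufferedReader) -> str:
    # peek() may return more than one byte; it never advances the position.
    return format_from_first_byte(buffered.peek(1)[:1])


DATA = b">read1\nTGTACGT\n"

if __name__ == "__main__":
    pipe_like = NonSeekableStream(DATA)
    print(sniff_by_reading(pipe_like))  # first call sees ">"
    print(sniff_by_reading(pipe_like))  # second call sees "r", not ">"

    buffered = io.BufferedReader(NonSeekableStream(DATA))
    print(sniff_by_peeking(buffered))   # sees ">"
    print(sniff_by_peeking(buffered))   # still sees ">"
```

With a regular file the first variant could seek back to the start between calls, which would be consistent with the filename-based command working while the stdin-based one does not.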
Does it work with the latest xopen 1.x version? 2.0.0 was a massive refactoring that enabled all sorts of cool functionality while also using less code, but as a consequence there were unforeseen bugs.
I will try and check that next week (won't have access to a Windows machine over the weekend).
No change with xopen 1.7.0, 1.8.0, or 1.9.0 - and my debugging to stderr still shows the detect_file_format function being called twice.
I've now been able to reproduce the problem on Windows 10. I can also see where the function is called twice, and will see what I can do.
Interesting, it works on Python 3.10, 3.11 and 3.12.0, but fails on 3.12.4. Bisecting points to commit https://github.com/python/cpython/commit/de347c02070f7b1e8a4810ece5e898b22b4070cd (part of 3.12.3).
I realize only now that this is the same issue as the one that we encountered in dnaio two days ago when trying to make a new release. This is now fixed by this PR: https://github.com/marcelm/dnaio/pull/148
I’ve also changed the dnaio CI so that the Windows tests run on all supported Python versions, which should help to catch something like this better.
After reporting this, I actually adjusted my code's test suite to rely less on stdin/stdout with cutadapt - otherwise you might have had more bug reports from me ;)