Problems reading stdin on Windows 11 Python 3.12

Open peterjc opened this issue 1 year ago • 5 comments

This issue was identified in https://github.com/peterjc/thapbi-pict/issues/627, where continuous integration tests that use cutadapt to generate test inputs started to fail WITHOUT relevant code changes, i.e. something changed in the AppVeyor Windows environment (unclear what) and/or the PyPI packages (nothing obvious).

I have since reduced this to a local test case under Windows 11, Python 3.12.4 (installed from python.org with the PATH option ticked), and cutadapt 4.9 (installed with pip install -U cutadapt):

C:\Users\xxx>python --version
Python 3.12.4

C:\Users\xxx>pip install -U cutadapt
Collecting cutadapt
  Downloading cutadapt-4.9-cp312-cp312-win_amd64.whl.metadata (3.5 kB)
Collecting dnaio>=1.2.0 (from cutadapt)
  Downloading dnaio-1.2.1-cp312-cp312-win_amd64.whl.metadata (3.6 kB)
Collecting xopen>=1.6.0 (from cutadapt)
  Downloading xopen-2.0.2-py3-none-any.whl.metadata (15 kB)
Collecting isal>=1.6.1 (from xopen>=1.6.0->cutadapt)
  Downloading isal-1.6.1-cp312-cp312-win_amd64.whl.metadata (10 kB)
Collecting zlib-ng>=0.4.1 (from xopen>=1.6.0->cutadapt)
  Downloading zlib_ng-0.4.3-cp312-cp312-win_amd64.whl.metadata (6.9 kB)
Downloading cutadapt-4.9-cp312-cp312-win_amd64.whl (231 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 231.1/231.1 kB 939.6 kB/s eta 0:00:00
Downloading dnaio-1.2.1-cp312-cp312-win_amd64.whl (86 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86.2/86.2 kB 808.2 kB/s eta 0:00:00
Downloading xopen-2.0.2-py3-none-any.whl (17 kB)
Downloading isal-1.6.1-cp312-cp312-win_amd64.whl (201 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.5/201.5 kB 942.2 kB/s eta 0:00:00
Downloading zlib_ng-0.4.3-cp312-cp312-win_amd64.whl (88 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.5/88.5 kB 1.7 MB/s eta 0:00:00
Installing collected packages: zlib-ng, isal, xopen, dnaio, cutadapt
Successfully installed cutadapt-4.9 dnaio-1.2.1 isal-1.6.1 xopen-2.0.2 zlib-ng-0.4.3

C:\Users\xxx>cutadapt --version
4.9

This was then patched to include d9cf273b0b74ed82e1803daf766c914c37c90c34 for a clearer error message.

Working example using a filename and one of the cutadapt test files (the primer isn't really appropriate for this dataset):

C:\Users\xxx\cutadapt\tests\data>cutadapt --quiet -g GAAGGTGAAGTCGTAACAAGG 454.fa | find ">" /C
59

Restructured to read from stdin:

C:\Users\xxx\Projects\cutadapt\tests\data>type 454.fa | cutadapt --quiet -g GAAGGTGAAGTCGTAACAAGG - | find ">" /C
Input file format not recognized. The file starts with b'TGTA', but files in supported formats start with '>' (FASTA), '@' (FASTQ) or 'BAM'
0
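
For anyone who wants to script the comparison, here is a minimal Python sketch of the two invocations above. The helper names `run_cutadapt` and `count_fasta_records` are hypothetical; only the cutadapt arguments (`--quiet`, `-g`, and `-` for stdin) are taken from the commands above. Passing `input=` to `subprocess.run` feeds the data through a pipe, which is assumed (not verified here) to behave like `type 454.fa | cutadapt ... -`.

```python
import subprocess

# Primer copied from the commands above (not really appropriate for this dataset).
PRIMER = "GAAGGTGAAGTCGTAACAAGG"


def count_fasta_records(fasta_text: str) -> int:
    """Count lines containing '>', mirroring `find ">" /C`."""
    return sum(1 for line in fasta_text.splitlines() if ">" in line)


def run_cutadapt(fasta_path: str, via_stdin: bool) -> subprocess.CompletedProcess:
    """Run cutadapt on a FASTA file, either by filename or through a stdin pipe.

    Hypothetical helper; requires cutadapt to be on the PATH.
    """
    cmd = ["cutadapt", "--quiet", "-g", PRIMER]
    if via_stdin:
        with open(fasta_path, "rb") as handle:
            data = handle.read()
        # input= makes subprocess create a pipe for the child's stdin.
        return subprocess.run(cmd + ["-"], input=data, capture_output=True)
    return subprocess.run(cmd + [fasta_path], capture_output=True)
```

Calling `run_cutadapt("454.fa", via_stdin=False)` and `run_cutadapt("454.fa", via_stdin=True)` and passing each decoded `stdout` to `count_fasta_records` should give the same count when stdin handling works; any error message ends up in `stderr`.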

Inserting additional logging suggests that the function detect_file_format is called twice. The first call works and reports FASTA format. The second call happens partway through the file, and fails.
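
The failure mode described above can be illustrated with a small, self-contained sketch. This is not cutadapt's or dnaio's actual code: `detect_format` and `NonSeekableStream` are hypothetical stand-ins, and the example only shows why sniffing the first byte a second time goes wrong once a pipe-like (non-seekable) stream has been partly consumed.

```python
import io


class NonSeekableStream(io.RawIOBase):
    """Hypothetical pipe-like stream: readable, but cannot be rewound."""

    def __init__(self, data: bytes):
        super().__init__()
        self._buffer = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False

    def readinto(self, b) -> int:
        return self._buffer.readinto(b)


def detect_format(stream: io.BufferedReader) -> str:
    """Hypothetical format sniffing: look at the next byte without consuming it."""
    first = stream.peek(1)[:1]
    if first == b">":
        return "FASTA"
    if first == b"@":
        return "FASTQ"
    raise ValueError(f"Input file format not recognized, starts with {first!r}")


data = b">read1\nTGTACCGT\n>read2\nACGT\n"
stream = io.BufferedReader(NonSeekableStream(data))

first_result = detect_format(stream)  # at the start of the stream: FASTA

stream.read(7)  # something consumes the first header line (">read1\n")

try:
    detect_format(stream)  # second detection now sees sequence data
    second_error = None
except ValueError as exc:
    second_error = str(exc)
```

Because the stream cannot seek back to the start, the second call only sees whatever follows the bytes already read, which matches the shape of the `b'TGTA'` error message in the report.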

This reminds me of #774, but this one seems to be Windows-specific.

peterjc, Jun 21 '24 02:06

Does it work with the latest xopen 1.x version? 2.0.0 was a massive refactoring that enabled all sorts of cool functionality while also using less code, but as a consequence there were unforeseen bugs.

rhpvorderman, Jun 21 '24 08:06

I will try and check that next week (won't have access to a Windows machine over the weekend).

peterjc, Jun 21 '24 14:06

No change with xopen 1.7.0, 1.8.0, or 1.9.0 - and my debugging to stderr still shows the detect_file_format function being called twice.

peterjc, Jun 24 '24 05:06

I've now been able to reproduce the problem on Windows 10. I can also see where the function is called twice and will see what I can do.

marcelm, Jun 24 '24 14:06

Interesting, it works on Python 3.10, 3.11 and 3.12.0, but fails on 3.12.4. Bisecting points to commit https://github.com/python/cpython/commit/de347c02070f7b1e8a4810ece5e898b22b4070cd (part of 3.12.3).

marcelm, Jul 29 '24 23:07

I realize only now that this is the same issue as the one that we encountered in dnaio two days ago when trying to make a new release. This is now fixed by this PR: https://github.com/marcelm/dnaio/pull/148

I’ve also changed the dnaio CI so that the Windows tests run on all supported Python versions, which should help to catch something like this better.

marcelm, Nov 13 '24 13:11

After reporting this, I actually adjusted my code's test suite so that it does not use stdin/stdout so much with cutadapt. Otherwise you might have had more bug reports from me ;)

peterjc, Nov 13 '24 13:11