Sample_ID validation

Open nathanweeks opened this issue 2 years ago • 0 comments

The Illumina Sequencing Sample Sheet Format Specifications document cited in the sample-sheet code: https://github.com/clintval/sample-sheet/blob/06d2566c0bf9a0f3b14856e97e4e6cea2827ca89/sample_sheet/__init__.py#L58-L62

explicitly mentions additional restrictions on Sample_ID column values:

The field for the Sample_ID column has special character restrictions as only alphanumeric (ASCII codes 48-57, 65-90, and 97-122), dash (ASCII code 45), and underscore (ASCII code 95) are permitted. The Sample_ID length is limited to 100 characters maximum.

The sample_sheet validation code currently allows some invalid Sample_ID values (e.g., containing +) that some tools (like bcl2fastq) reject. Could the sample_sheet validation code be enhanced to detect Sample_IDs that don't conform to the Illumina spec?
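
For illustration, the restriction quoted above could be checked with something like the following. This is only a minimal standalone sketch, not the library's existing validation code: the names `SAMPLE_ID_PATTERN`, `is_valid_sample_id`, and `find_invalid_sample_ids` are hypothetical, and rejecting an empty Sample_ID is an assumption that the quoted text does not state.

```python
import re

# Built from the quoted restriction: ASCII letters, digits, dash (45),
# and underscore (95) only, at most 100 characters.
# Assumption: an empty Sample_ID is also treated as invalid.
SAMPLE_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,100}")


def is_valid_sample_id(sample_id: str) -> bool:
    """Return True if sample_id satisfies the quoted character and length limits."""
    # fullmatch requires the whole string to match, so a trailing
    # newline or any other stray character is rejected.
    return SAMPLE_ID_PATTERN.fullmatch(sample_id) is not None


def find_invalid_sample_ids(sample_ids):
    """Return the IDs that violate the restrictions, in input order."""
    return [s for s in sample_ids if not is_valid_sample_id(s)]


if __name__ == "__main__":
    # "Sample+1" contains '+', the kind of value described above.
    print(find_invalid_sample_ids(["Sample_1-A", "Sample+1"]))
```

A check along these lines could report every offending Sample_ID at once rather than stopping at the first one, which may be more convenient when fixing a large sheet.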

nathanweeks, Feb 17 '23 14:02