SARS-CoV-2-Sequenzdaten_aus_Deutschland

implausible Omicron classification

Open · rgerhards opened this issue 2 years ago · 3 comments

I see a number of sequences from 2020 and early 2021 classified as Omicron. This does not look plausible to me. Could this be related to https://github.com/robert-koch-institut/SARS-CoV-2-Sequenzdaten_aus_Deutschland/issues/9 and be the result of a variant PCR?

Data in question:

| date_draw | IMS_ID | lineage | scorpio_call | sequencing_lab_pc | sending_lab_pc | seq_type |
|---|---|---|---|---|---|---|
| 2021-03-17 | IMS-10013-CVDP-BD31B70A-5B53-4588-B22A-E40EE489E32... | BA.1.1 |  | 4779 | 73035 | ILLUMINA |
| 2021-03-10 | IMS-10013-CVDP-7B4A9C47-3B1B-4F3A-85BC-94B4DA5FEB2... | BA.1.1 |  | 4779 | 95448 | ILLUMINA |
| 2021-01-04 | IMS-10013-CVDP-B219DF48-6F4F-4A17-B19D-98DC45AF974... | BA.1.1 | Omicron (BA.1-like) | 4779 | 4779 | ILLUMINA |
| 2021-04-02 | IMS-10013-CVDP-4D759CD1-2209-41DE-980C-E98F38D54BA... | BA.1.1 |  | 4779 | 28357 | ILLUMINA |
| 2021-03-11 | IMS-10013-CVDP-333514E8-FCA9-49A4-BCE2-6F58419756B... | BA.1.1 |  | 4779 | 95448 | ILLUMINA |
| 2021-03-25 | IMS-10013-CVDP-EB74C98E-815A-445E-AB76-8C659BE07B3... | BA.1.1 |  | 4779 | 28357 | ILLUMINA |
| 2021-03-03 | IMS-10013-CVDP-035CC7B7-2FCA-4831-8089-3937D681718... | BA.1.1 |  | 4779 | 66386 | ILLUMINA |
| 2020-12-26 | IMS-10013-CVDP-D91BBF83-C15E-4280-825F-26E4B301A2F... | BA.1 | Omicron (BA.1-like) | 4779 | 28357 | ILLUMINA |
| 2021-04-16 | IMS-10013-CVDP-4EFD6A2A-4346-433F-8D2C-A2AEC01E1E0... | BA.1.1 |  | 4779 | 86154 | ILLUMINA |
| 2021-03-10 | IMS-10013-CVDP-6D3F28AF-44EA-4BA8-B0D6-308DAD2E4CC... | BA.1.1 |  | 4779 | 95448 | ILLUMINA |
| 2021-03-24 | IMS-10013-CVDP-5E19AFA5-5AC8-4812-AD7D-C1B6F7ABD87... | BA.1.1 |  | 4779 | 86154 | ILLUMINA |
| 2021-03-05 | IMS-10013-CVDP-5FD87A5E-FAEA-43C1-B41E-D5F0BF2E0F8... | BA.1.1 |  | 4779 | 81737 | ILLUMINA |
| 2020-12-22 | IMS-10013-CVDP-652AEF69-8797-4473-9730-40C8422356E... | BA.1 | Omicron (BA.1-like) | 4779 | 28357 | ILLUMINA |
| 2021-05-03 | IMS-10013-CVDP-D1C0DC48-97F7-483D-9248-05CCD4DCB36... | BA.1.1 |  | 4779 | 4779 | ILLUMINA |
| 2021-03-04 | IMS-10013-CVDP-4528BCA3-144F-47DB-BF9E-CCB3D373C74... | BA.1.1 |  | 4779 | 1665 | ILLUMINA |
| 2021-02-19 | IMS-10013-CVDP-F6D01735-2811-4A43-9656-F8AF4506AD0... | BA.1.1 |  | 4779 | 81737 | ILLUMINA |
| 2021-03-10 | IMS-10013-CVDP-9BE8CEF8-0042-48FE-B796-3475C6AA707... | BA.1.1 |  | 4779 | 95448 | ILLUMINA |
| 2021-06-08 | IMS-10013-CVDP-536E691D-7DA2-4D70-BE14-0C512D8DBB0... | BA.1.1 |  | 4779 | 4779 | ILLUMINA |
| 2021-03-17 | IMS-10013-CVDP-8D6464EE-0C90-48B8-8976-13714B74F79... | BA.1.1 |  | 4779 | 86154 | ILLUMINA |
| 2021-03-11 | IMS-10013-CVDP-7EA8A3B8-F7CF-4422-B96A-91B02ECA4CE... | BA.1.1 |  | 4779 | 4779 | ILLUMINA |
| 2021-03-23 | IMS-10013-CVDP-DABBDE37-F49C-4F25-B604-7DF183E3661... | BA.1.1 |  | 4779 | 4779 | ILLUMINA |
| 2021-03-10 | IMS-10013-CVDP-4FA094F7-E334-41F7-87F5-38A42C2F478... | BA.1.1 |  | 4779 | 1665 | ILLUMINA |
| 2021-09-02 | IMS-10013-CVDP-3445725E-9F15-4E2D-A4E4-F23949A8FEB... | BA.1.1 |  | 4779 | 4779 | ILLUMINA |
| 2021-04-03 | IMS-10004-CVDP-33332ED0-2EB6-42F6-9FDD-166D0C19CAD... | BA.1.1 |  | 21502 | 21502 | ILLUMINA |
| 2021-01-01 | IMS-10061-CVDP-D28E7308-BDB2-47C6-ABD9-A26778807F4... | BA.1.1 | Probable Omicron (BA.1-like) | 30159 | 30159 | ILLUMINA |
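The plausibility check suggested above can be automated. The following is a minimal sketch, not RKI tooling. It assumes the rows have been read into dicts (for example with csv.DictReader) and reuses the 2021-11-01 date_draw cutoff from the SQL query later in this thread as an assumed "Omicron cannot be this early" boundary. The helper names is_omicron and implausible_omicron are hypothetical.

```python
from datetime import date

# Assumption: cutoff borrowed from the date_draw filter in the SQL query in
# this thread; Omicron calls on samples drawn on or before it are suspicious.
OMICRON_CUTOFF = date(2021, 11, 1)


def is_omicron(lineage: str) -> bool:
    """True for the parent lineage B.1.1.529 or any BA.* sublineage."""
    return lineage == "B.1.1.529" or lineage.startswith("BA.")


def implausible_omicron(rows):
    """Yield (date_draw, IMS_ID, lineage) for Omicron calls drawn on or before the cutoff.

    `rows` is an iterable of dicts with ISO-8601 'date_draw', plus 'IMS_ID'
    and 'lineage' keys (hypothetical input shape, e.g. from csv.DictReader).
    """
    for row in rows:
        drawn = date.fromisoformat(row["date_draw"])
        if is_omicron(row["lineage"]) and drawn <= OMICRON_CUTOFF:
            yield row["date_draw"], row["IMS_ID"], row["lineage"]
```

For example, a BA.1 row with date_draw 2020-12-22 is flagged, while a B.1.1.7 row from March 2021 or a BA.1.1 row drawn in January 2022 is not.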

rgerhards · Jan 31 '22 08:01

It's striking that almost all affected samples (23 of the 25) were sequenced by the lab with ID 10013 / postal code 04779 (I was confused for a moment by the four-digit postal code in the table).
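To check how concentrated the implausible calls are, counting rows per sequencing lab is enough. This is a minimal sketch that assumes the flagged rows are dicts with a sequencing_lab_pc key. flagged_per_lab is a hypothetical helper name.

```python
from collections import Counter


def flagged_per_lab(rows):
    """Count rows per sequencing_lab_pc, most frequent first.

    Labs with equal counts keep the order in which they were first seen,
    because Counter.most_common() orders ties that way.
    """
    counts = Counter(row["sequencing_lab_pc"] for row in rows)
    return counts.most_common()
```

Applied to the 25 rows in the first table, it would return 4779 with 23 rows, followed by 21502 and 30159 with one row each.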

Can you also add the processing date? I checked it manually for the bottom three entries of the table:

| date_draw | PROCESSING_DATE | IMS_ID | lineage | scorpio_call | sequencing_lab_pc | sending_lab_pc | seq_type |
|---|---|---|---|---|---|---|---|
| 2021-09-02 | 2021-09-20 | IMS-10013-CVDP-3445725E-9F15-4E2D-A4E4-F23949A8FEB... | BA.1.1 |  | 4779 | 4779 | ILLUMINA |
| 2021-04-03 | 2021-04-14 | IMS-10004-CVDP-33332ED0-2EB6-42F6-9FDD-166D0C19CAD... | BA.1.1 |  | 21502 | 21502 | ILLUMINA |
| 2021-01-01 | 2022-01-15 | IMS-10061-CVDP-D28E7308-BDB2-47C6-ABD9-A26778807F4... | BA.1.1 | Probable Omicron (BA.1-like) | 30159 |  |  |

For the last one, it may just be a typo in the year of date_draw. For the other two, date_draw and PROCESSING_DATE look plausible relative to each other.
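The manual comparison of date_draw and PROCESSING_DATE can be expressed as a day difference. This is a minimal sketch. The 90-day threshold and the function names are assumptions chosen for illustration, not values from the data set documentation.

```python
from datetime import date

# Hypothetical threshold: a longer gap between draw and processing is treated
# as suspicious (for example, a typo in the draw year).
MAX_PLAUSIBLE_GAP_DAYS = 90


def draw_to_processing_days(date_draw: str, processing_date: str) -> int:
    """Days from sample draw to processing, both given as ISO-8601 strings."""
    return (date.fromisoformat(processing_date) - date.fromisoformat(date_draw)).days


def suspicious_gap(date_draw: str, processing_date: str) -> bool:
    """True if processing precedes the draw or follows it by more than the threshold."""
    gap = draw_to_processing_days(date_draw, processing_date)
    return gap < 0 or gap > MAX_PLAUSIBLE_GAP_DAYS
```

For the three rows above this gives 18, 11 and 379 days. Only the last exceeds the threshold, and 379 minus 365 leaves 14 days, which is what a one-year typo in date_draw would produce.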

lenaschimmel · Jan 31 '22 16:01

Indeed, the processing date is interesting: is somebody analyzing old samples?

I removed sending_lab to keep the table from getting too wide. If useful, I can export the data set. And sorry for the postal code confusion: I store it in an integer column in the database to save space and gain speed, which drops the leading zero.
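Since German postal codes always have five digits, the leading zero lost in an integer column can be restored when displaying the value. This is a minimal sketch. format_postal_code is a hypothetical helper, and rejecting values outside 0 to 99999 is a choice made here.

```python
def format_postal_code(pc: int) -> str:
    """Render a German postal code stored as an integer as five digits.

    Storing a postal code as an integer drops a leading zero (04779 becomes
    4779), so zero-pad it back to five digits for display.
    """
    if not 0 <= pc <= 99999:
        raise ValueError(f"not a five-digit postal code: {pc}")
    return f"{pc:05d}"
```

For example, 4779 becomes "04779" and 1665 becomes "01665". In SQLite, printf('%05d', sequencing_lab_pc) does the same padding inside a query, so the compact integer column can stay.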

| date_draw | processing_date | IMS_ID | lineage | seq_type | sequencing_lab_pc |
|---|---|---|---|---|---|
| 2020-12-22 | 2022-01-13 | IMS-10013-CVDP-652AEF69-8797-4473-9730-40C8422356E... | BA.1 | ILLUMINA | 4779 |
| 2020-12-26 | 2022-01-13 | IMS-10013-CVDP-D91BBF83-C15E-4280-825F-26E4B301A2F... | BA.1 | ILLUMINA | 4779 |
| 2021-01-04 | 2022-01-24 | IMS-10013-CVDP-B219DF48-6F4F-4A17-B19D-98DC45AF974... | BA.1.1 | ILLUMINA | 4779 |
| 2021-02-19 | 2021-03-08 | IMS-10013-CVDP-F6D01735-2811-4A43-9656-F8AF4506AD0... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-03 | 2021-03-22 | IMS-10013-CVDP-035CC7B7-2FCA-4831-8089-3937D681718... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-04 | 2021-03-22 | IMS-10013-CVDP-4528BCA3-144F-47DB-BF9E-CCB3D373C74... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-05 | 2021-03-22 | IMS-10013-CVDP-5FD87A5E-FAEA-43C1-B41E-D5F0BF2E0F8... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-10 | 2021-03-22 | IMS-10013-CVDP-7B4A9C47-3B1B-4F3A-85BC-94B4DA5FEB2... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-10 | 2021-03-22 | IMS-10013-CVDP-9BE8CEF8-0042-48FE-B796-3475C6AA707... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-10 | 2021-03-22 | IMS-10013-CVDP-6D3F28AF-44EA-4BA8-B0D6-308DAD2E4CC... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-10 | 2021-03-22 | IMS-10013-CVDP-4FA094F7-E334-41F7-87F5-38A42C2F478... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-11 | 2021-03-22 | IMS-10013-CVDP-333514E8-FCA9-49A4-BCE2-6F58419756B... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-11 | 2021-03-22 | IMS-10013-CVDP-7EA8A3B8-F7CF-4422-B96A-91B02ECA4CE... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-17 | 2021-03-29 | IMS-10013-CVDP-BD31B70A-5B53-4588-B22A-E40EE489E32... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-17 | 2021-03-25 | IMS-10013-CVDP-8D6464EE-0C90-48B8-8976-13714B74F79... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-23 | 2021-04-16 | IMS-10013-CVDP-DABBDE37-F49C-4F25-B604-7DF183E3661... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-24 | 2021-04-06 | IMS-10013-CVDP-5E19AFA5-5AC8-4812-AD7D-C1B6F7ABD87... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-25 | 2021-04-06 | IMS-10013-CVDP-EB74C98E-815A-445E-AB76-8C659BE07B3... | BA.1.1 | ILLUMINA | 4779 |
| 2021-04-02 | 2021-04-19 | IMS-10013-CVDP-4D759CD1-2209-41DE-980C-E98F38D54BA... | BA.1.1 | ILLUMINA | 4779 |
| 2021-04-16 | 2021-04-29 | IMS-10013-CVDP-4EFD6A2A-4346-433F-8D2C-A2AEC01E1E0... | BA.1.1 | ILLUMINA | 4779 |
| 2021-05-03 | 2021-05-17 | IMS-10013-CVDP-D1C0DC48-97F7-483D-9248-05CCD4DCB36... | BA.1.1 | ILLUMINA | 4779 |
| 2021-06-08 | 2021-06-21 | IMS-10013-CVDP-536E691D-7DA2-4D70-BE14-0C512D8DBB0... | BA.1.1 | ILLUMINA | 4779 |
| 2021-09-02 | 2021-09-20 | IMS-10013-CVDP-3445725E-9F15-4E2D-A4E4-F23949A8FEB... | BA.1.1 | ILLUMINA | 4779 |
| 2021-04-03 | 2021-04-14 | IMS-10004-CVDP-33332ED0-2EB6-42F6-9FDD-166D0C19CAD... | BA.1.1 | ILLUMINA | 21502 |
| 2021-01-01 | 2022-01-15 | IMS-10061-CVDP-D28E7308-BDB2-47C6-ABD9-A26778807F4... | BA.1.1 | ILLUMINA | 30159 |
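The processing dates above make it possible to split the flagged rows into two groups. The first group contains samples already processed before the cutoff. The second contains samples processed after it, which could be old samples analysed late or rows with a typo in date_draw. This is a minimal sketch. It assumes processing_date records when the sample was sequenced, and it reuses the 2021-11-01 cutoff from the query below. split_by_processing_date is a hypothetical name.

```python
from datetime import date

# Assumption: same cutoff as the date_draw filter in the SQL query in this thread.
OMICRON_CUTOFF = date(2021, 11, 1)


def split_by_processing_date(rows):
    """Partition rows into (processed_on_or_before_cutoff, processed_after_cutoff).

    If processing_date is the sequencing date, an Omicron call in the first
    group cannot be explained by someone sequencing old samples late.
    """
    before, after = [], []
    for row in rows:
        processed = date.fromisoformat(row["processing_date"])
        (before if processed <= OMICRON_CUTOFF else after).append(row)
    return before, after
```

On the 25 rows of the table above, this gives 21 rows processed on or before 2021-11-01 and 4 rows processed in January 2022.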

rgerhards · Jan 31 '22 17:01

Side note: this is the SQL I use. Both CSVs are imported as-is into separate tables.

SELECT rki_sequenzen_meta.date_draw,
       processing_date,
       rki_sequenzen.IMS_ID,
       lineage,
       seq_type,
       sequencing_lab_pc
FROM rki_sequenzen
INNER JOIN rki_sequenzen_meta
        ON rki_sequenzen_meta.IMS_ID = rki_sequenzen.IMS_ID
       AND rki_sequenzen_meta.date_draw <= '2021-11-01'  -- samples drawn on or before 2021-11-01
WHERE (lineage = 'B.1.1.529' OR lineage LIKE 'BA.%')     -- Omicron: parent lineage or any BA.* sublineage
  AND rki_sequenzen_meta.SEQ_REASON LIKE 'N%'
ORDER BY rki_sequenzen_meta.sequencing_lab_pc ASC, date_draw;
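To try the query without the full data set, it can be run against two tiny in-memory SQLite tables. This is a minimal sketch with several assumptions. The sample CSV contents and IDs are hypothetical, and only the columns the query touches are included. The assignment of lineage to rki_sequenzen and the remaining columns to rki_sequenzen_meta is a guess. The query is the one above with table aliases and single-quoted string literals.

```python
import csv
import io
import sqlite3

# Hypothetical miniature stand-ins for the two CSV files (real files have more columns).
SEQUENZEN_CSV = """IMS_ID,lineage
id-1,BA.1
id-2,B.1.1.7
"""
META_CSV = """IMS_ID,date_draw,processing_date,seq_type,sequencing_lab_pc,SEQ_REASON
id-1,2020-12-22,2022-01-13,ILLUMINA,4779,N
id-2,2021-03-10,2021-03-22,ILLUMINA,4779,N
"""

QUERY = """
SELECT m.date_draw, m.processing_date, s.IMS_ID, s.lineage, m.seq_type, m.sequencing_lab_pc
FROM rki_sequenzen AS s
INNER JOIN rki_sequenzen_meta AS m
        ON m.IMS_ID = s.IMS_ID
       AND m.date_draw <= '2021-11-01'
WHERE (s.lineage = 'B.1.1.529' OR s.lineage LIKE 'BA.%')
  AND m.SEQ_REASON LIKE 'N%'
ORDER BY m.sequencing_lab_pc ASC, m.date_draw
"""


def load_csv(conn, table, text):
    """Import CSV text as-is into a new table.

    Every column is stored as TEXT here, so ORDER BY sequencing_lab_pc sorts
    lexicographically rather than numerically.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)


def run():
    """Load both sample tables into an in-memory database and return the query result."""
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "rki_sequenzen", SEQUENZEN_CSV)
    load_csv(conn, "rki_sequenzen_meta", META_CSV)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

With an INNER JOIN, putting the date_draw condition in the ON clause, as the original query does, returns the same rows as putting it in WHERE. The placement would only matter for an outer join.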

rgerhards avatar Jan 31 '22 17:01 rgerhards