
DPL-554 SS New tag sets for Chromium Library Plate Manifest template

Open · SujitDey2022 opened this issue

Title: Action to mitigate tag issues, which will reduce the number of sequencing runs placed on hold, reduce overall sample TAT by days and reduce the resources needed for corrective actions (currently undertaken by SSRs, Data QC and NPG)

Description: There are two types of manifest used by DNAP to upload library data into LIMS. Currently, the version used by the Cellgen Faculty contains the tags, which must be entered as the actual sequence (e.g. i7 CACCGCACCA and i5 GACTGTCAAT). An example can be downloaded here: https://sequencescape.psd.sanger.ac.uk/sdb/sample_manifests/20146

If Cellgen were able to use the Chromium Library Plate Manifest version, which is tag plate based, typos and incorrect reverse complementation would be mitigated. The correct tags would then be uploaded to LIMS, removing the need for the QC team to put runs on hold, for SSRs to contact customers for the corrected tag sequences and re-upload them to SS, and for NPG to be asked to re-run the flowcell analysis.

The following tag plates can be added to the Chromium Library Plate Manifest template in SS:

  • Single Index Kit N, Set A 1000212
  • Dual Index Kit TT, Set A 1000215
  • Dual Index Kit TN, Set A 1000250
  • Dual Index Kit TS, Set A 1000251

Primary contacts for this story: Richard C

Nominated tester for UAT: Richard C

Acceptance criteria: To be considered successful, the solution must allow:

  • Addition of the following into the Chromium Library Plate Manifest template for manifest creation: Single Index Kit N, Set A 1000212; Dual Index Kit TT, Set A 1000215; Dual Index Kit TN, Set A 1000250; Dual Index Kit TS, Set A 1000251
  • The manifest to be successfully uploaded into SS

This story is blocked by the following dependencies: n/a

SujitDey2022 · Nov 02 '22

@Skrich1999 please can you confirm if this story is still relevant and needs to be planned into the development backlog? Thanks,

SujitDey2022 · Jun 27 '23

@Skrich1999 to review and update the user story.

SujitDey2022 · Mar 26 '24

@SujitDey2022 I have spoken with the SUs and the data is current. I have added Conor Parks as a CellGen contact and added the Google link to the data set. This story can be progressed next week. Thanks.

Skrich1999 · Apr 17 '24

[Screenshot: the two library plate manifest types mentioned]

[Screenshot: empty 'Library Plate' manifest, showing 'i7 TAG SEQUENCE' and 'i5 TAG SEQUENCE' fields]

[Screenshot: empty 'Chromium Library Plate' manifest, showing 'CHROMIUM TAG GROUP' and 'CHROMIUM TAG WELL' fields]

[Screenshot: tag groups and tags data model]

KatyTaylor · Apr 26 '24

I think all that's needed here is to insert some data into the 'tag group' and 'tag' tables in Sequencescape.

Four tag groups are needed - names listed out in the description, and in the names of the separate tabs in the Google Sheet. The adapter_type_id should link to the record in the tag_group_adapter_types table called 'Chromium'.

Under each tag group, many 'tags' should be created. The 'oligo' field should contain the DNA sequence (e.g. 'GAGGAGAGAG') found in the spreadsheet.

The map_id field represents the order of the tag in the group, or where it is on the tag plates - I'm currently not sure how to work this out from the spreadsheet - needs further discussion. I think from looking at the code that each well on the tag plate contains 4 tags - also not sure how this relates to the spreadsheet.
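To make the shape of the data concrete, here is a plain-Ruby sketch of a tag group and its tags. It uses hypothetical in-memory structs rather than the real ActiveRecord models, and it simply numbers map_ids in the order the oligos are supplied, which is exactly the open question in the paragraph above. `build_tag_group`, `valid_tag_group?` and the example group name are illustrative only.

```ruby
# Hypothetical in-memory stand-ins for the tag_groups / tags tables;
# the real Sequencescape models are ActiveRecord classes.
Tag = Struct.new(:map_id, :oligo, keyword_init: true)
TagGroup = Struct.new(:name, :adapter_type, :tags, keyword_init: true)

# Build a tag group from an ordered list of oligo sequences, numbering
# map_ids from 1 in the order the oligos are given (an assumption).
def build_tag_group(name, oligos, adapter_type: 'Chromium')
  tags = oligos.each_with_index.map do |oligo, i|
    Tag.new(map_id: i + 1, oligo: oligo.upcase)
  end
  TagGroup.new(name: name, adapter_type: adapter_type, tags: tags)
end

# Basic sanity checks before inserting anything: every oligo contains
# only A/C/G/T and every map_id is unique within the group.
def valid_tag_group?(group)
  group.tags.all? { |t| t.oligo.match?(/\A[ACGT]+\z/) } &&
    group.tags.map(&:map_id).uniq.size == group.tags.size
end

group = build_tag_group('Example group', %w[GAGGAGAGAG CACCGCACCA])
p group.tags.map { |t| [t.map_id, t.oligo] } # => [[1, "GAGGAGAGAG"], [2, "CACCGCACCA"]]
p valid_tag_group?(group)                    # => true
```

Checks like these are cheap to run over every tab of the spreadsheet before any records are created.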

KatyTaylor · Apr 26 '24

It's not clear to me either how the spreadsheet relates to the tag groups. The 'Single' group appears to have 4 columns of oligo sequences. The Dual ones have 2 variants, a and b, of the combinations of two oligos. The existing Chromium tag groups in Production generally have 96 tags with map ids 1-96. The tagging screen in Limber allows selection of 2 tag groups (i5 and i7), so an individual well can have 1 or 2 tags in it. There are also tag_layout_templates: these are combinations of up to 2 tag groups, along with the rules to lay them out on the plate (by map ids), e.g. by columns or wells of the plate. So I don't understand how you can get to 4 tags per well, or how you'd end up with only 4 tag groups from those spreadsheet tabs. Sounds like a chat with Conor is the first step.

andrewsparkes · May 07 '24

@andrewsparkes @KatyTaylor, @Skrich1999 has reached out to Conor and we should be hearing back from him, let me follow up on this and get back.

SujitDey2022 · May 08 '24

Good morning, @andrewsparkes and @KatyTaylor! I've noticed that Conor is out of office until the 13th. Would it be a good idea to arrange a meeting with the relevant faculty members to address and resolve the questions that have been raised?

Skrich1999 · May 08 '24

Hello @andrewsparkes and @KatyTaylor, I've just had a discussion with the Faculty group. They suggest it's best to hold off until Conor returns on the 13th. Additionally, I've delved into understanding set A and set B. It seems this pertains more to NPG than SS, but I could be mistaken.

From what I gather, only set A should be utilized. Unfortunately, I'm unable to modify the Google document to gray out set B (which is essentially the reverse of set A).

This reversal has introduced a new acceptance criterion: Tags must function effectively across all short read platforms and be appropriately reverse complemented by NPG when necessary.
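For illustration, reverse complementing a tag (as NPG does when needed) is a simple string transformation. If set B's i5 oligos are the reverse complements of set A's (an assumption to check against the 10x documentation), this is the transformation involved. A minimal sketch assuming A/C/G/T sequences only; `reverse_complement` is a hypothetical helper, not an existing Sequencescape method.

```ruby
# Hypothetical helper: reverse complement of a DNA tag sequence.
def reverse_complement(seq)
  s = seq.upcase
  raise ArgumentError, "unexpected bases in #{seq}" unless s.match?(/\A[ACGT]+\z/)

  # Reverse the sequence, then swap each base for its complement.
  s.reverse.tr('ACGT', 'TGCA')
end

puts reverse_complement('GACTGTCAAT') # prints ATTGACAGTC
```

Applying it twice returns the original sequence, which is a quick way to sanity-check any workflow A / workflow B pairs in the spreadsheet.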

I hope this clarifies things.

Skrich1999 · May 08 '24

Tag sets A and B (for use when 10x don't, and a LIMS system is able to reverse complement tags, as NPG automatically does for us): https://kb.10xgenomics.com/hc/en-us/articles/360056364852-Should-I-select-Workflow-A-or-Workflow-B-for-the-i5-index-sequence

Skrich1999 · May 08 '24

On hold until Conor returns on the 13th.

andrewsparkes · May 08 '24

In the Sequencescape code the manifests use 2 specialised columns:

  • chromium_tag_group
  • chromium_tag_well

See:

  • config/sample_manifest_excel/manifest_types.yml
  • config/sample_manifest_excel/columns.yml
  • app/sequencescape_excel/sequencescape_excel/specialised_field/chromium_tag_group.rb
  • app/sequencescape_excel/sequencescape_excel/specialised_field/chromium_tag_well.rb

The chromium_tag_well class takes the well location entered in the manifest (e.g. A1) and translates it to fetch 4 sequential tags from the corresponding chromium_tag_group (e.g. map_id indexes 1, 2, 3 and 4), giving the 4 tags per well.

So that would suggest a single tag group is made that contains 4 x the usual number of oligo sequences (i.e. 384 for use with 96-well tag plates).
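The well-to-tags translation can be sketched in plain Ruby. This is not the code in chromium_tag_well.rb; it is a minimal sketch assuming a 96-well plate (rows A-H, columns 1-12) numbered in column order (A1, B1, ..., H1, A2, ...), which matches the A1/B1/C1 examples given later in this thread. `chromium_map_ids` is a hypothetical name.

```ruby
ROWS = ('A'..'H').to_a
COLUMNS = 12
TAGS_PER_WELL = 4

# Hypothetical: translate a well location such as "B1" into the
# map_ids of its 4 tags, assuming wells are numbered in column order.
def chromium_map_ids(well)
  match = /\A([A-H])(\d{1,2})\z/.match(well.upcase)
  raise ArgumentError, "bad well: #{well}" unless match

  row = ROWS.index(match[1])
  column = match[2].to_i
  raise ArgumentError, "bad column: #{well}" unless (1..COLUMNS).cover?(column)

  well_index = (column - 1) * ROWS.size + row # 0-based, column order
  first = well_index * TAGS_PER_WELL + 1
  (first...(first + TAGS_PER_WELL)).to_a
end

p chromium_map_ids('A1')  # => [1, 2, 3, 4]
p chromium_map_ids('B1')  # => [5, 6, 7, 8]
p chromium_map_ids('H12') # => [381, 382, 383, 384]
```

The last well landing on map_ids 381-384 is why a 4-per-well group needs 384 oligos for a 96-well tag plate.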

So we have to check, for each of the 4 tabs referenced in Conor's file, whether those tag groups are already created in the Sequencescape database or whether we need to make them. And if we make them, we have to be VERY careful to get the map indexes and oligo sequences correct.

I think, for the SINGLE tab where we have 4 columns of oligos per row, we likely need a 384 oligo tag group, to be used in sets of 4 map_ids in a 4:1 relationship with the chromium_tag_well in the manifest.

Whereas for the 3 x DUAL tabs, these are likely standard 96 oligo tag groups with a 1:1 relationship to chromium_tag_well in the manifest.
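The 4:1 versus 1:1 relationship can be expressed as a simple check on group size. A sketch assuming every tag plate has 96 wells and every well uses the same number of tags; `tags_per_well` is a hypothetical helper.

```ruby
WELLS_PER_PLATE = 96

# Hypothetical: infer how many tags each tag-plate well uses from the
# total number of tags in a group (384 -> 4 per well, 96 -> 1 per well).
def tags_per_well(tag_count, wells: WELLS_PER_PLATE)
  unless tag_count.positive? && (tag_count % wells).zero?
    raise ArgumentError, "#{tag_count} tags do not divide evenly over #{wells} wells"
  end

  tag_count / wells
end

p tags_per_well(384) # => 4
p tags_per_well(96)  # => 1
```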

andrewsparkes · May 23 '24

Looks like some of these are actually in Sequencescape already, under different names. I haven't carefully checked every single tag, just a selection.

Tag groups can be looked up by name here - https://training.sequencescape.psd.sanger.ac.uk/tag_groups

Name in story -> name in SS:

  • Single Index Kit N, Set A 1000212 -> Chromium single cell
  • Dual Index Kit TT, Set A 1000215 (workflow A) -> 10X_Plate TT Set A i7 and 10X_Plate TT Set A i5
  • Dual Index Kit TT, Set A 1000215 (workflow B) -> not present
  • Dual Index Kit TN, Set A 1000250 (workflow A) -> REDUNDANT - Dual Index Kit TN Set A 10Xgenomics i5 (a); Dual Index Kit TN Set A 10Xgenomics i5_a Column Wise (same oligos but different order, column-wise); Dual Index TN Set A 10Xgenomics ColumnWise (also related)
  • Dual Index Kit TN, Set A 1000250 (workflow B) -> not present
  • Dual Index Kit TS, Set A 1000251 (workflow A) -> FFPEvisium_i5, FFPEvisium_i7 (same oligos but different order, column-wise)
  • Dual Index Kit TS, Set A 1000251 (workflow B) -> not present

KatyTaylor · May 23 '24

Notes for developers

(apologies for the wordy brain dump - I might clean it up later!)

Single index tag groups:

  • Single Index Kit N, Set A 1000212

In the spreadsheet (and I assume reality), this is a 96-well plate with 4 oligos listed per well. In SS, this is represented by one single tag group with 384 tags, all with unique 'map ids'. In the manifest, 'Chromium tag well' column, the SSR can fill out the correct well description e.g. 'A1'. When uploaded, SS will pull the relevant tags out of the tag group. e.g.

Well in spreadsheet / reality -> tag map ids:

  • A1 -> [1, 2, 3, 4]
  • B1 -> [5, 6, 7, 8]
  • C1 -> [9, 10, 11, 12]
  • ...etc.

See chromium_tag_well.rb for how this is achieved.
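To build such a group from the spreadsheet, the four oligos listed for each well would be flattened into consecutive map_ids. A minimal sketch, assuming the rows have already been read into `[well, [oligo1, oligo2, oligo3, oligo4]]` pairs and that wells are numbered in column order as in the A1/B1/C1 mapping above. `flatten_single_index_rows` and `column_order_index` are hypothetical, and the oligos in the usage line are placeholders, not real 10x sequences.

```ruby
ROWS = ('A'..'H').to_a

# Hypothetical: 0-based position of a well when wells are numbered in
# column order (A1, B1, ..., H1, A2, ...) on a 96-well plate.
def column_order_index(well)
  row = ROWS.index(well[0].upcase)
  raise ArgumentError, "bad well: #{well}" unless row

  column = Integer(well[1..-1], 10)
  (column - 1) * ROWS.size + row
end

# Hypothetical: turn spreadsheet rows of [well, [4 oligos]] into a flat
# hash of map_id => oligo, ready for checking before any insert.
def flatten_single_index_rows(rows)
  rows.sort_by { |well, _| column_order_index(well) }
      .each_with_object({}) do |(well, oligos), tags|
        raise ArgumentError, "#{well} needs 4 oligos" unless oligos.size == 4

        first = column_order_index(well) * 4 + 1
        oligos.each_with_index { |oligo, i| tags[first + i] = oligo }
      end
end

rows = [['B1', %w[AAAA CCCC GGGG TTTT]], ['A1', %w[ACGT TGCA GTAC CATG]]]
p flatten_single_index_rows(rows)
```

A complete tab should yield exactly 384 entries with map_ids 1-384 and no gaps, which is an easy check before anything is written to the database.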

Looks like this tag group is already in the db (see previous comment), it might just need to be renamed for clarity.

Dual index tag groups:

  • Dual Index Kit TT, Set A 1000215
  • Dual Index Kit TN, Set A 1000250
  • Dual Index Kit TS, Set A 1000251

In the spreadsheet, there are 3 columns for each well. An i7 oligo, an i5 oligo for workflow A and an i5 oligo for workflow B. In SS, this will be represented as 3 tag groups - for i7, i5 (workflow A) and i5 (workflow B). The existing tag groups 10X_Plate TT Set A i7 and 10X_Plate TT Set A i5 are examples of the i7 and i5 (workflow A) columns respectively.
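Splitting one dual index tab into its three tag groups could look like this. A sketch assuming each spreadsheet row has already been parsed into a hash with hypothetical keys `:well`, `:i7`, `:i5_a` and `:i5_b`, and that rows are already in the order the tag group indexes should follow. The group-name suffixes and the oligos are made-up placeholders that would need agreeing before anything is inserted.

```ruby
# Hypothetical: split parsed rows of a dual index tab into three ordered
# oligo lists, one per tag group (i7, i5 workflow A, i5 workflow B).
def split_dual_index_rows(kit_name, rows)
  {
    "#{kit_name} i7" => rows.map { |r| r.fetch(:i7) },
    "#{kit_name} i5 (workflow A)" => rows.map { |r| r.fetch(:i5_a) },
    "#{kit_name} i5 (workflow B)" => rows.map { |r| r.fetch(:i5_b) }
  }
end

# Placeholder rows for illustration only.
rows = [
  { well: 'A1', i7: 'AAAACCCC', i5_a: 'GGGGTTTT', i5_b: 'AAAACCCC' },
  { well: 'B1', i7: 'CCCCAAAA', i5_a: 'TTTTGGGG', i5_b: 'CCCCAAAA' }
]
split_dual_index_rows('Dual Index Kit TT, Set A', rows).each do |name, oligos|
  puts "#{name}: #{oligos.join(', ')}"
end
```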

Tag groups are used in (at least) two places. These i7 and i5 tag groups can be selected on the Limber tagging screen, when a tag plate is actually being used in a pipeline. As described in this story, tags are also included in library manifests, for when customers submit ready-tagged libraries to SeqOps and have to tell them which tags were used in each well.

The library plate manifest has two columns, 'i7' and 'i5', where the customer can specify the oligo by typing out the sequence. The Chromium library plate manifest instead allows the customer to specify tag group and tag plate well, so as to avoid typos. It looks to me like no manifest currently supports specifying dual index tag groups. The Chromium library plate manifest does, however, let the user select a tag group that is not of the 4-per-well type. I tested selecting '10X_Plate TT Set A i7' and it broke on upload, because it tried to allocate 4 tags per well and then ran out of tags to allocate once it had used up all 96.
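The upload failure follows directly from the 4-per-well arithmetic. A sketch reusing the column-order assumption from the chromium_tag_well discussion; `allocate_tags` is hypothetical and raises where the real upload broke, so the exact failure point is inferred from the comment rather than taken from the real code.

```ruby
ROWS = ('A'..'H').to_a

# Hypothetical: allocate `per_well` consecutive tags to each well, in
# column order, from a group with `tag_count` tags (map_ids 1..tag_count).
def allocate_tags(wells, tag_count, per_well: 4)
  wells.map do |well|
    index = (Integer(well[1..-1], 10) - 1) * ROWS.size + ROWS.index(well[0])
    first = index * per_well + 1
    last = first + per_well - 1
    raise IndexError, "#{well} needs map_id #{last}, group has #{tag_count}" if last > tag_count

    [well, (first..last).to_a]
  end.to_h
end

all_wells = (1..12).flat_map { |col| ROWS.map { |row| "#{row}#{col}" } }
allocate_tags(all_wells.first(24), 96) # fine: A1..H3 use map_ids 1..96
begin
  allocate_tags(all_wells, 96) # A4 would need map_ids 97..100
rescue IndexError => e
  puts e.message # prints: A4 needs map_id 100, group has 96
end
```

In other words, a 96-tag group only covers the first 24 wells (3 columns) of the plate when read 4 tags at a time.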

So for this story, we can insert the relevant tag groups into the database, but in order to be able to use the dual index tag groups in the manifests, we'd have to make new columns to support this - you'd need two drop downs 'i7 tag group' and 'i5 tag group'.

Ideally, we would also amend the existing Chromium library plate manifest to only display appropriate tag groups in the drop down - ones that have 4 tags per well.

Questions:

  • Why are there some existing tag groups where the tags are ordered differently from the spreadsheet, 'column-wise' (see table in above comment)? Tarryn was aware of this and mentioned it in the meeting, but was not sure of the reason.

--> In SS, the tags in a tag group have an 'index' (see https://training.sequencescape.psd.sanger.ac.uk/tag_groups/251, for instance). The index does not really represent a well, just the order of the tags in the tag group (it does NOT link to the maps table in SS). Normal 96-well tag groups like the linked one expect their tags to be in column order (A1, B1 etc.) - I think - hence the order in the list in Sequencescape (linked above) is different to the order in the 10x spreadsheet (linked in the story description). Manifests that allow users to specify tag group and well must make an assumption about the mapping between 'index' and well location on the tag plate. This mapping is different for different tag groups - for a different example, see the single index tag groups described above.
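The difference between the spreadsheet's row-wise listing and a column-wise tag group index can be sketched as a reordering. This assumes, as the answer above tentatively does, that 96-well tag groups expect index 1 = A1, 2 = B1, and so on; the constant and method names are hypothetical and the oligo values are placeholders.

```ruby
ROWS = ('A'..'H').to_a
COLUMNS = (1..12).to_a

# Wells listed row-wise (A1, A2, ..., A12, B1, ...), as in the 10x sheet.
ROW_WISE = ROWS.flat_map { |r| COLUMNS.map { |c| "#{r}#{c}" } }
# Wells listed column-wise (A1, B1, ..., H1, A2, ...), the assumed index order.
COLUMN_WISE = COLUMNS.flat_map { |c| ROWS.map { |r| "#{r}#{c}" } }

# Hypothetical: given 96 oligos in row-wise spreadsheet order, return
# [index, well, oligo] triples in column-wise order, indexes from 1.
def column_wise_indexes(row_wise_oligos)
  by_well = ROW_WISE.zip(row_wise_oligos).to_h
  COLUMN_WISE.each_with_index.map { |well, i| [i + 1, well, by_well.fetch(well)] }
end

oligos = ROW_WISE.map { |well| "oligo_for_#{well}" } # placeholder values
p column_wise_indexes(oligos).first(3)
# => [[1, "A1", "oligo_for_A1"], [2, "B1", "oligo_for_B1"], [3, "C1", "oligo_for_C1"]]
```

Under this assumption, the second oligo in the spreadsheet (A2) ends up at index 9, which would explain why the order in Sequencescape differs from the 10x sheet.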

KatyTaylor · May 23 '24

re: Ideally, we would also amend the existing Chromium library plate manifest to only display appropriate tag groups in the drop down - ones that have 4 tags per well.

Background: The manifest has a column called chromium_tag_group. This references a range called chromium_tag_groups. This selects tag groups based on a scope called chromium. This filters for tag_groups that have an adapter_type of 'chromium' (see scope in app/models/tag_group.rb).
So the manifest is trying to limit what's in that dropdown to chromium tag groups.

That doesn't seem to be specific enough given the example above for '10X_Plate TT Set A i7' (which does have the chromium adapter_type but has 96 tags rather than 384).
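One way to tighten the dropdown would be to filter on tag count as well as adapter type. This is a plain-Ruby sketch over in-memory records, not the actual `chromium` scope in app/models/tag_group.rb (in the app this would presumably become an ActiveRecord scope or query). The 384-tag threshold assumes 96-well tag plates with 4 tags per well; the struct, method and example group names are hypothetical.

```ruby
# Hypothetical in-memory stand-in for a tag group record.
TagGroupSummary = Struct.new(:name, :adapter_type, :tag_count, keyword_init: true)

# Keep only Chromium groups that have 4 tags for every well of a
# 96-well tag plate, i.e. the groups the manifest upload can handle.
def four_per_well_chromium_groups(groups, wells: 96, per_well: 4)
  groups.select do |g|
    g.adapter_type.casecmp?('chromium') && g.tag_count == wells * per_well
  end
end

groups = [
  TagGroupSummary.new(name: 'Example 4-per-well group', adapter_type: 'Chromium', tag_count: 384),
  TagGroupSummary.new(name: '10X_Plate TT Set A i7', adapter_type: 'Chromium', tag_count: 96)
]
p four_per_well_chromium_groups(groups).map(&:name) # => ["Example 4-per-well group"]
```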

andrewsparkes · Jun 17 '24

kt17, as28 and ay6 have talked about transforming data from the Google Doc and creating default records in Sequencescape.

yoldas · Jun 20 '24

The intermediate files for DPL-554 are at commit f04b929aef53509efa41a7f5156efdf5d191840a

yoldas · Jun 25 '24

Dual index tag groups have now been moved to the WIP-flagged file config/default_records/tag_groups/004_chromium_dual_index.wip.yml

yoldas · Jul 02 '24