New post processing for creating master files of model --> mc --> Profile --> CDS
First round for code review. Please be brutal. Will address bugs first and then efficiency and good practice recommendations after
Could we all hold off on reviewing this PR? It overlaps with my work on the depmap data manifest and Terra workspace generation.
Could we all hold off on reviewing this PR? It overlaps with my work on the depmap data manifest and Terra workspace generation.
Sure thing
Let us blacklist them or give them some other data type then. Coding for edge cases like this is not a good idea.
On Tue, Mar 25, 2025 at 11:52 AM Simone Zhang @.***> wrote:
@.**** commented on this pull request.
In model_to_omics_data_config.json https://github.com/broadinstitute/depmap_omics/pull/253#discussion_r2012421460 :
"exclusion_filters": {
"_comment": "Exclude rows where database column 'omics_sequencing.blacklist' has value 1(is True)","columns": ["blacklist"],"blacklist": [1],"filter_mode": "all"},"inclusion_filters": {"_comment": "Include rows where db column 'omics_sequencing.expected_type' is rna,wgs or wes","columns": ["expected_type"],"expected_type": ["rna", "wgs", "wes"],"filter_mode": "any"}},"omics_profile": {"_comment": "Gumbo table name 'omics_profile' is block key. Table name MUST exactly match database table name","exclusion_filters": {Here we should also exclude profiles where the profile_source is "taiga". Those are legacy profiles that we don't have bams for
— Reply to this email directly, view it on GitHub https://github.com/broadinstitute/depmap_omics/pull/253#pullrequestreview-2714310034, or unsubscribe https://github.com/notifications/unsubscribe-auth/BIKWZBMELK4RHZT6UPAKPIT2WF3UHAVCNFSM6AAAAABZU6JCQKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDOMJUGMYTAMBTGQ . You are receiving this because you authored the thread.Message ID: @.***>
When would we not process data that has been sequenced and given a release date?
On Tue, Mar 25, 2025 at 3:04 PM Simone Zhang @.***> wrote:
@.**** commented on this pull request.
In model_to_omics_data_config.nocomments.json https://github.com/broadinstitute/depmap_omics/pull/253#discussion_r2012768989 :
@@ -0,0 +1,173 @@ +{
- "table_filters": {
"omics_sequencing": {"exclusion_filters": {"columns": ["blacklist"],"blacklist": [1],"filter_mode": "all"},"inclusion_filters": {Another condition would be processed_sequence == True so we newly loaded/not processed samples don't get nominated as main sequencing ids
— Reply to this email directly, view it on GitHub https://github.com/broadinstitute/depmap_omics/pull/253#pullrequestreview-2714894270, or unsubscribe https://github.com/notifications/unsubscribe-auth/BIKWZBNFEIH76HZK7JZHNZL2WGSFBAVCNFSM6AAAAABZU6JCQKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDOMJUHA4TIMRXGA . You are receiving this because you authored the thread.Message ID: @.***>
That’s a good point. I’ll address that. On Tue, Mar 25, 2025 at 3:42 PM Simone Zhang @.***> wrote:
@.**** commented on this pull request.
In model_to_omics_data_config.json https://github.com/broadinstitute/depmap_omics/pull/253#discussion_r2012822277 :
"columns": ["blacklist"],
"blacklist": [1],"filter_mode": "all"},"inclusion_filters": {"_comment": "Include rows where db column 'omics_sequencing.expected_type' is rna,wgs or wes","columns": ["expected_type"],"expected_type": ["rna", "wgs", "wes"],"filter_mode": "any"}},"omics_profile": {"supercede_filter": {"_comment": "Include rows where atleast one of the columns 'internal_release_date', 'consortium_release_date' or 'public_release_date' is not None","columns": ["internal_release_date", "consortium_release_date", "public_release_date"],"internal_release_date": ["None"],I don't really know if it actually happens, but is release date != None enough? Like do ops ever assign a release date in the future, so we'll perhaps have to do release date < today?
— Reply to this email directly, view it on GitHub https://github.com/broadinstitute/depmap_omics/pull/253#pullrequestreview-2714991492, or unsubscribe https://github.com/notifications/unsubscribe-auth/BIKWZBM3YPOKG5GF4MKQN532WGWR5AVCNFSM6AAAAABZU6JCQKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDOMJUHE4TCNBZGI . You are receiving this because you authored the thread.Message ID: @.***>