staging-client-java SSDI Discriminators

@schusslern brought this up in another issue. Currently SSDI are associated with schemas. However there are times when the SSDIs don't apply to all cases in the schema. Sometimes they only relates to a specific site, hist or combination of the two.

Initially I was going to propose a SSDI-specific field that points to a table definition for inclusion. However I think we can add a more generic field since this could be used for any field in the future. So I propose adding inclusion_table to the input entity.

{
	"key": "er",
	"name": "ER Summary",
	"naaccr_item": 3827,
	"naaccr_xml_id": "estrogenReceptorSummary",
	"default": "9",
	"table": "er_summary_44166",
	"used_for_staging": true,
	"inclusion_table": "matching_table_id",
	"metadata": [
		{ "name": "COC_REQUIRED" },
		{ "name": "NPCR_REQUIRED" },
		{ "name": "SSDI" },
		{ "name": "CCCR_REQUIRED" },
		{ "name": "SEER_REQUIRED" }
	]
}

I don't think we need to make any changes to the staging process. This field will just be ignored during staging. It is something to be used by people developing front-ends for data collection.

To determine whether a field should be included, The existing method would work like this:

Staging staging = Staging.getInstance(EodDataProvider.getInstance(EodVersion.LATEST));

Map<String, String> inputs = new HashMap<>();
inputs.put("site", "C501");
inputs.put("hist", "8030");

StagingSchema schema = staging.getSchema("breast");
for (StagingSchemaInput input : schema.getInputs()) {
    if (input.getInclusionTable() == null || staging.findMatchingTableRow(input.getInclusionTable(), inputs) != null) {
        // a match was found so the SSDI should be included
    }
    else {
        // no match so do not collect the SSDI
    }
}

We could add a utility method on StagingSchemaInput called getInclusion(Map<String, String> inputs) but it won't be much simpler than what we have above.

Dec 10 '24 21:12 ctmay4

@schusslern when is this needed? I assume it is not a high priority.

Dec 10 '24 21:12 ctmay4

Is there any way to wrap that in a utility call somewhere? Could we have a method on the Schema called “getIncludedInputs(inputs)” to hide that low-level call to "findMatchingTableRow"? That would make the library a bit more user-friendly I think. Not a big deal either way.

Dec 10 '24 21:12 depryf

That would be trivial to add.

Dec 10 '24 21:12 ctmay4

as i said in Squish, i don't want it done yet - i want to know that it can be done and what can be done

for example, yes we can have multiple fields involved in the inclusion but they must all be input fields; AJCC ID isn't allowed
so i think we are good here. YES, it can be done; YES you can have multiple fields and lists or ranges; NO you can't use AJCC ID, you have to use input fields and not derived fields.

and an estimate of how long it would take to do

i think we should include the API and library changes, the Helios and RSA changes, and the DMS and Abs changes. it would be nice to have the DMS/Abs separately so we have some feel for how challenging it will be for vendors (tho will likely be at least 2-3 times longer for any other groups to deal with it)

Dec 10 '24 21:12 schusslern

The library and API changes are a few hours. Pretty straightforward; just need to write some unit tests.

As for DMS/Abs you will need to talk with Mike D. and Katherine. I don't think it is much work but they will have to give the final answer.

One other question. Not too long ago we added start and end years to the metadata field. If we add a getIncludedInputs(Map<String, String> inputs) method, should that year be looked at also? We already have logic in DMS for this. Keep in mind we would only care about the SEER_REQUIRED metadata so it seems odd to include it in the method logic. However if we don't we will have to look in multiple places to decide whether the collect a field.

Dec 11 '24 13:12 ctmay4

Good question. I am not sure. I think we would need to ask Mike and Katherine. I know SEER*Abs doesn't just look at SEER_REQUIRED, it also look at the other (the COC for example) to determine whether an SSDI is shown or not. If the method doesn't do exactly what Abs/DMS needs, then developers won't use it. So maybe Katherine and Mike should look at the code and make a recommendation as to what would be useful. Or we just start without it. I am not sure.

Dec 11 '24 13:12 depryf

I recommend we start without it. They can look in two places for now.

Dec 11 '24 13:12 ctmay4

SEER*Abs checks the metadata tags, the year ranges, and the "used_for_staging" field. It sounds like this would just be another check to add in, and I don't think that would be a lot of work.

Dec 11 '24 14:12 kirbykn

SEER*Abs checks the metadata tags, the year ranges, and the "used_for_staging" field. It sounds like this would just be another check to add in, and I don't think that would be a lot of work.

Thanks. And the same thing would go for DMS.

Dec 11 '24 14:12 ctmay4

yes, you would have to check both A) is the SSDI in use for the ydx: ydx within the year range associated with the SSDI overall and with the standard setters you care about B) does the SSDI apply to this specific cancer: if there are any site, hist, beh, etc limitations on the SSDI

Dec 11 '24 14:12 schusslern

so about a day to get the api/library set up, a day each for SEERAbs/SEERDMS to be set up and tested; and probably a day each to get Helios and SEER*RSA set up. and those should be generous estimates for most of the things but leaves space to run into a hiccup or two. i think that's fine.

Dec 11 '24 14:12 schusslern

I think SEER*DMS will take much more than a day. You guys are only thinking about the GUI and whether fields need to be displayed or not. But there are other aspects. For example, we have auto-cons for SSDIs, and those rules need to know if an SSDI is "applicable". That's currently done based no the schema and years. But it will need to be changed to be also based on these new limitations. We have a complex initialized data structure that provides the schema and start/end for every SSDI; that will need to be changed to somehow incorporate this new feature. Another complexity is that those rules have documentation, which is is not based on a given site/hist (it's static, like this SSDI is only applicable to those schemas, and only for start/end years. I am not sure if we will want this new information to also be displayed in the help.

At the end, maybe none of that will be a big problem. But I do think a day to incorporate all those changes in SEER*MDS feels too low IMO.

Dec 11 '24 15:12 depryf

do you think 3 days? 5? more? would it be worth discussing more efficient/streamlined ways to determine which set of SSDIs apply to a cancer so that all parts of DMS can call the same thing? sounds like auto-con is doing the same things that the interface has to do, but is doing them entirely separately - i just wondered if there was a better way

Dec 11 '24 15:12 schusslern

It's really difficult to streamline everything in DMS. The auto-cons does its thing, auto-build does its own (it's an old beast difficult to update), and the front-end does its own thing. And some polishers might also do their own thing.

I don't know how realistic it would be to try to reconcile all of that. We had (have?) big plan to have start/end year for every data items in the database, and that would be used by every module. But that already doesn't work for the SSDI because they are also schema-dependent (and now also input-dependent).

Anyway, I would say 5 days for SEER*DMS. It's really difficult to estimate, I am just not sure what kind of issues we will run into (but I know there will be many more changes than just the GUI). We can go with 5 days for now.

Dec 11 '24 15:12 depryf

Ok, sounds good. @schusslern this is on hold until you say its ready to be worked on.

Dec 11 '24 16:12 ctmay4

This issue is stale because it has been open for 30 days with no activity.

May 24 '25 02:05 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

Jun 07 '25 02:06 github-actions[bot]