Feature Request - curatorial attributes
Is your feature request related to a problem? Please describe.
Encumbering attributes is expensive and complicated. https://github.com/ArctosDB/arctos/issues/3452 has led to a more unified code table which carries a lot of information: https://arctos.database.museum/info/ctDocumentation.cfm?table=ctattribute_type. Adding a 'curatorial' (='for us only') flag to certain types of attributes, rather than encumbering individual assertions in individual records, might be a simplification for everyone.
Describe what you're trying to accomplish
Simplify and save some CPU if possible.
Describe the solution you'd like
First, discussion:
- Are there any general fatal flaws in this idea?
- Could this replace https://arctos.database.museum/info/ctDocumentation.cfm?table=ctencumbrance_action#mask_value, https://arctos.database.museum/info/ctDocumentation.cfm?table=ctencumbrance_action#mask_unformatted_measurements, https://arctos.database.museum/info/ctDocumentation.cfm?table=ctencumbrance_action#mask_nagpra_category, and any other encumbrances that involve record attributes?
- Could this be extended to other types of attributes, such as https://arctos.database.museum/info/ctDocumentation.cfm?table=ctencumbrance_action#mask_part_attribute_location?
Describe alternatives you've considered
Do what we're doing.
Additional context
https://github.com/ArctosDB/arctos/issues/3536#issuecomment-2088807250 https://github.com/ArctosDB/arctos/discussions/6742 https://github.com/ArctosDB/arctos/discussions/6179
Priority
Somewhat high; the new format attribute table is also a simplification for everyone. I think it's much easier to understand, and it is much easier/cheaper for me to use. I'd like to do it for other attributes, but not until there's some stability in that model.
@AJLinn - this sounds like a great discussion for the Encumbrance committee!
I like this approach, as there is often a need to have a 'private' attribute. So would this flag be available for all record attributes, or are you suggesting we have a new attribute that is always private? Either way could work; the former would be better for several reasons, and I see few negative repercussions.
I was suggesting the types be flagged, similar to https://arctos.database.museum/info/ctDocumentation.cfm?table=ctagent_attribute_type (where eg https://arctos.database.museum/info/ctDocumentation.cfm?table=ctagent_attribute_type#correspondence is always 'us' and https://arctos.database.museum/info/ctDocumentation.cfm?table=ctagent_attribute_type#cultural_affiliation is always public)
A flag on individual attributes could be discussed, but I think that's most of the cost of encumbrances. (I'd still have to filter individual records, but it is a couple joins 'closer.')
One major drawback to any form of this might be a lack of transparency. Encumbrances require a contact person and some elevated rights. This potentially does not have that: especially if individual attributes can be flagged, maybe there's something sensitive and precious behind that and maybe some student just checked the wrong box, how would anyone possibly know?
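To make the type-level idea concrete, here is a minimal sketch, assuming a hypothetical `is_private` column on `ctattribute_type` and a simplified `attribute` table. The column name, the table layout, and the sample rows are illustrative only, and SQLite stands in for the real database. It shows that a public view only needs one join to the code table, with no per-record encumbrance lookup.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a type-level privacy flag (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ctattribute_type (
            attribute_type TEXT PRIMARY KEY,
            is_private INTEGER NOT NULL DEFAULT 0  -- hypothetical flag, 1 = 'for us only'
        );
        CREATE TABLE attribute (
            record_id INTEGER NOT NULL,
            attribute_type TEXT NOT NULL REFERENCES ctattribute_type(attribute_type),
            attribute_value TEXT NOT NULL
        );
        """
    )
    # Illustrative data: 'value' is flagged private, 'sex' is public.
    conn.executemany(
        "INSERT INTO ctattribute_type VALUES (?, ?)",
        [("sex", 0), ("value", 1)],
    )
    conn.executemany(
        "INSERT INTO attribute VALUES (?, ?, ?)",
        [(1, "sex", "female"), (1, "value", "500 USD"), (2, "sex", "male")],
    )
    return conn


def public_attributes(conn, record_id):
    """Return the attributes of one record whose type is not flagged private."""
    return conn.execute(
        """
        SELECT a.attribute_type, a.attribute_value
        FROM attribute a
        JOIN ctattribute_type t ON t.attribute_type = a.attribute_type
        WHERE a.record_id = ? AND t.is_private = 0
        ORDER BY a.attribute_type
        """,
        (record_id,),
    ).fetchall()
```

With the sample rows, `public_attributes(build_demo_db(), 1)` returns only the `sex` row; the `value` row stays internal because of its type, not because of anything attached to the individual record.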
OK, this is mixing implementation with functional need a bit. Your suggestion is more like the second option, where we have new attributes that are always private, which could work; we could potentially use the Determiner, date and remarks fields to note those encumbered metadata.
The lack of transparency may be an issue, since the public won't see whether there's a private attribute or not, right? Let me see how the Issues meeting agenda shapes up for adding it there, or whenever the next encumbrance w.g. meeting is....
> mixing implementation with functional
Yes, starting the discussion off with functional requirements would always be lovely!
> use the Determiner...
I don't think that'll work; the determiner (and other metadata) is still important whether the info is public or not. You've determined whatever sensitive thing using whatever method on such-and-such date, and that's all critical to understanding the data.
We should possibly have an 'enteredby' capture on attributes (I've been meaning to file that issue for months, someone else is very welcome to grab it and run...) which would get at me entering your determination (and possibly explain typos and such), but still doesn't really paint the whole picture.
With encumbrances, we can clearly say "CuratorX doesn't want these data public because whatever reasons." (And ideally we might share that with eg https://dwc.tdwg.org/terms/#dwc:informationWithheld so that eg a trusted researcher could know that we do have data and who to ask about it, but I don't think we do.) I'm not sure losing that ability for a few specific administrative attributes would be much different, but losing it for any arbitrary thing somehow seems like it probably leads to scary and regrettable places.
(We could require 'encumbered_by....' metadata on each attribute, but I think that's probably WAY over in 'unusable' territory.)
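As a rough illustration of the `dwc:informationWithheld` idea above, the sketch below builds a short notice from the private attribute types present on a record. The function name, the message wording, and the notion of a single contact string are all assumptions for illustration; nothing here reflects how Arctos currently populates that term.

```python
def information_withheld(private_types, contact):
    """Build a hypothetical dwc:informationWithheld value.

    private_types: attribute types that exist on the record but are not shared.
    contact: who a trusted researcher could ask about the withheld data.
    """
    if not private_types:
        # Nothing is withheld, so the term is left empty.
        return ""
    # De-duplicate and sort so the notice is stable from run to run.
    types = ", ".join(sorted(set(private_types)))
    return f"Attributes withheld: {types}. Contact: {contact}."
```

For example, `information_withheld(["value"], "CuratorX")` yields a notice naming the withheld type and the contact, without exposing the value itself.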
Encumbrance committee likes the idea of flagging individual items, but would like to limit access - eg a student can update everything about an (attribute, part attribute, whatever) except the 'hide' checkbox, which requires (manage collection or whatever).
@dustymc to explore feasibility, CPU costs, and complexity; temporarily moving this to active development to do so.
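The committee's access rule could look something like the following sketch, in which any editor may change ordinary fields but only a user holding an elevated role may change the 'hide' flag. The `Attribute` class, the `is_hidden` field, and the `manage_collection` role string are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Attribute:
    attribute_type: str
    attribute_value: str
    remarks: str = ""
    is_hidden: bool = False  # hypothetical per-item 'hide' checkbox


def apply_update(attr, changes, user_roles):
    """Return an updated copy of attr, guarding changes to the hide flag.

    changes: dict of field name -> new value.
    user_roles: set of role names held by the editing user (hypothetical).
    """
    wants_flag_change = "is_hidden" in changes and changes["is_hidden"] != attr.is_hidden
    if wants_flag_change and "manage_collection" not in user_roles:
        raise PermissionError("changing the hide flag requires manage_collection")
    return replace(attr, **changes)
```

A student with only data-entry rights could still fix a typo in `remarks`, but an attempt to toggle `is_hidden` would raise `PermissionError` rather than silently hiding or exposing data.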
Flagging individual items would very effectively prevent "load-as-encumbered" which is (more or less) not great for https://github.com/ArctosDB/internal/issues/332.
I am not sure that the complexity of our current encumbrance model supports scalability. https://github.com/ArctosDB/internal/issues/332 (and various other similar issues) might need some level of "custom UI," which might well require (or be best addressed by) a "deep" API that deals with various public and private information. I suspect that simple ways of withholding information (eg "this is structurally public, this private" vs. "nearly everything may sometimes be private, depending on descriptive data and conditional information several joins away") are going to have outsized impacts on the reality of providing such tools/services. We should reconsider the weight of simplicity, whatever form this might ultimately take.
Attribute encumbrances have become a cost which is increasingly difficult to pay.
I'm still in favor of adding a 'private' flag to the necessary attribute tables, which I believe would DRASTICALLY simplify everything for everyone (including the scripts which maintain the private cache). I think this (with an existing issue which already has a more-elegant solution) could be manipulated to provide complete coverage in one easy-to-understand way.
- mask collector, mask preparator - convert agent records to a private version of attribute 'verbatim agent'
- mask coordinates, mask year collected - https://github.com/ArctosDB/dev/issues/189
- mask NAGPRA category, mask unformatted measurements, mask value, mask specimen remarks - are or could readily be attributes
- mask original field number - https://github.com/ArctosDB/dev/issues/206 - this is probably broken, sensitive identifiers could easily be stored as encumbrances
- mask part attribute location - attribute (but not record, so this would need to be extended to part attributes or these data would need a new home)
- mask record ~~restrict usage~~ - no change necessary
- EDIT: restrict usage should move to permits, encumbrances should be reserved for 'does stuff in the UI' - see https://github.com/ArctosDB/dev/issues/122
See also: https://github.com/ArctosDB/dev/issues/73
I think the expense of encumbrances was involved in today's outage.
ctspec_part_att_att (https://github.com/ArctosDB/dev/issues/42) has been frustrating to deal with a few times lately. I need to rebuild the part attribute code tables, and that suggests a relatively easy experiment: can I add the public/private flag to https://arctos.database.museum/info/ctDocumentation.cfm?table=ctspecpart_attribute_type (which will be reborn with a new structure and possibly a new name) and set it to private for https://arctos.database.museum/info/ctDocumentation.cfm?table=ctspecpart_attribute_type#location?
Location is the part attribute that we mask on all of our records as an encumbrance, but I'd be happy to just have that as a private/public flag rather than encumbrance.
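The experiment described above might amount to something like this sketch: add a flag column to the part attribute code table, then set it for `location`. The `is_private` column name and the `preservation` sample row are assumptions, the real table is due to be restructured, and SQLite is used here only so the example runs on its own.

```python
import sqlite3


def run_experiment():
    """Add a hypothetical is_private flag to the code table and flag 'location'."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the existing code table.
    conn.execute("CREATE TABLE ctspecpart_attribute_type (attribute_type TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO ctspecpart_attribute_type VALUES (?)",
        [("location",), ("preservation",)],
    )
    # Every existing type defaults to public (0) when the column is added.
    conn.execute(
        "ALTER TABLE ctspecpart_attribute_type "
        "ADD COLUMN is_private INTEGER NOT NULL DEFAULT 0"
    )
    # Only 'location' becomes private.
    conn.execute(
        "UPDATE ctspecpart_attribute_type SET is_private = 1 "
        "WHERE attribute_type = 'location'"
    )
    rows = conn.execute("SELECT attribute_type, is_private FROM ctspecpart_attribute_type")
    return dict(rows)
```

Defaulting the new column to public keeps every other part attribute type behaving as it does today, so the change is limited to the one type being tested.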
guid_prefix | c
---------------+--------
ACUNHC:Bird | 1
ALMNH:Bird | 6
ALMNH:EH | 1337
ALMNH:Geo | 23
ALMNH:Inv | 5800
ALMNH:Mamm | 3838
ALMNH:Paleo | 11562
ANSP:Host | 3729
ANSP:Para | 8020
ASNHC:Bird | 1239
ASNHC:Herp | 396
ASNHC:Mamm | 23041
BYU:Bird | 1
CHAS:Art | 5
CHAS:AV | 1366
CHAS:Bird | 15122
CHAS:Egg | 3364
CHAS:EH | 1052
CHAS:Ento | 24340
CHAS:Fish | 10
CHAS:Herb | 14428
CHAS:Herp | 20619
CHAS:Inv | 15475
CHAS:Mamm | 8399
CHAS:Teach | 3687
CSULB:Fish | 2289
DMNS:Egg | 1
JSNM:Egg | 810
JSNM:Herb | 1049
JSNM:Paleo | 2519
MSB:Arth | 1
MSB:Bird | 2
MSB:DGR | 2270
MSB:Mamm | 60
NHSM:Mamm | 212
NMMNH:Paleo | 956
PSM:Paleo | 10934
TCDGM:Mineral | 811
TCDGM:Paleo | 304
UAM:Arc | 370118
UAM:Art | 8872
UAM:EH | 28163
UCM:Bird | 11577
UCM:FossilEgg | 257
UCM:Herp | 408
UCM:Mamm | 5885
UMNH:Herp | 1
UNM:Geol | 3369
UNM:MET | 190
UTEP:Bird | 149
UTEP:ES | 3
UTEP:Herp | 92
UTEP:HerpOS | 2
UTEP:Inv | 2
UTEP:Mamm | 37
UTEPObs:Herp | 11
UWBM:PB | 62857
UWBM:PR | 8775
UWBM:VP | 93349
@jldunnum @catherpes @AdrienneRaniszewski @campmlc @leet1984 @ccwlobo @javanveldhuizen @sjshirar @mvzhuang @AJLinn @acdoll @KatherineLAnderson @DellaCHall @aklompma @ebraker @droberts49 @kderieg322079 @wellerjes @Nicole-Ridgwell-NMMNHS @rwilhoyt @byuherpetology @brandon-s-thompson @kmkocot @kat-sterner @babogan @ufarrell @WaigePilson
any major objections to making part attribute location always private/internal?
(And why are there container-using collections in this list???)
Csv please?
Csv
https://arctos.database.museum/search.cfm
Fine with me!
I'm fine with this.
A csv for all 4 MSB collections would be awfully nice, so I don't have to search each one individually . . .
https://arctos.database.museum/search.cfm?guid_prefix=MSB%3AArth%2CMSB%3ABird%2CMSB%3ADGR%2CMSB%3AFish%2CMSB%3AHerb%2CMSB%3AHerp%2CMSB%3AHost%2CMSB%3AInv%2CMSB%3AMamm%2CMSBObs%3AHerp%2CMSBObs%3AMamm%2CMSB%3APara&part_attribute=location
(Somehow I doubt you can make much use of what's in my CSV, but there it is...)
I'm okay with this
go for it
Fine by me!
Fine with me.
Fine with me.
Yes!
In reference to https://github.com/ArctosDB/arctos/issues/7792#issuecomment-2475114112: looking these over, even though the collections are using object tracking, these samples are in locations that are not included in the object tracking system, e.g. in external research labs, in research freezers of unbarcoded samples, or zoo material in oversize vials that don't fit into standard freezer boxes. It is fine to make these part attributes private, as long as students with manage specimens can still see the attribute.
Fine with me as well!
Good for me
> (And why are there container-using collections in this list???)
As I described in issue ArctosDB/arctos#8304, we use both barcodes and part attribute location because not all of our objects are barcoded. We also prefer the functionality of seeing the part attribute location in the search results view. But yes, make it private!
Fine with me!
dev-->https://github.com/ArctosDB/dev/issues/118