Feature Request - curatorial attributes

Open dustymc opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe.

Encumbering attributes is expensive and complicated. https://github.com/ArctosDB/arctos/issues/3452 has led to a more unified code table which carries a lot of information: https://arctos.database.museum/info/ctDocumentation.cfm?table=ctattribute_type. Adding a 'curatorial' (='for us only') flag to certain types of attributes, rather than encumbering individual assertions in individual records, might be a simplification for everyone.

Describe what you're trying to accomplish

Simplify and save some CPU if possible.

Describe the solution you'd like

First, discussion:

  • Are there any general fatal flaws in this idea?
  • Could this replace https://arctos.database.museum/info/ctDocumentation.cfm?table=ctencumbrance_action#mask_value, https://arctos.database.museum/info/ctDocumentation.cfm?table=ctencumbrance_action#mask_unformatted_measurements, https://arctos.database.museum/info/ctDocumentation.cfm?table=ctencumbrance_action#mask_nagpra_category, and any other encumbrances that involve record attributes?
  • Could this be extended to other types of attributes, such as https://arctos.database.museum/info/ctDocumentation.cfm?table=ctencumbrance_action#mask_part_attribute_location?

Describe alternatives you've considered

Do what we're doing.

Additional context

https://github.com/ArctosDB/arctos/issues/3536#issuecomment-2088807250 https://github.com/ArctosDB/arctos/discussions/6742 https://github.com/ArctosDB/arctos/discussions/6179

Priority

Somewhat high; the new format attribute table is also a simplification for everyone. I think it's much easier to understand, and it is much easier/cheaper for me to use. I'd like to do it for other attributes, but not until there's some stability in that model.

dustymc avatar May 17 '24 15:05 dustymc

@AJLinn - this sounds like a great discussion for the Encumbrance committee!

ewommack avatar May 17 '24 20:05 ewommack

I like this approach, as there is often a need for a 'private' attribute. So would this flag be available for all record attributes, or are you suggesting we have a new attribute that is always private? Either way could work; the former would be better for several reasons, and I see few negative repercussions.

mkoo avatar May 20 '24 19:05 mkoo

I was suggesting the types be flagged, similar to https://arctos.database.museum/info/ctDocumentation.cfm?table=ctagent_attribute_type (where eg https://arctos.database.museum/info/ctDocumentation.cfm?table=ctagent_attribute_type#correspondence is always 'us' and https://arctos.database.museum/info/ctDocumentation.cfm?table=ctagent_attribute_type#cultural_affiliation is always public)

A flag on individual attributes could be discussed, but I think that's most of the cost of encumbrances. (I'd still have to filter individual records, but it is a couple joins 'closer.')
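As a rough illustration of the type-level flag, here is a minimal sketch using Python's built-in `sqlite3` rather than the production database. The table names (`ctattribute_type_demo`, `attribute_demo`), the `is_private` column, and the sample rows are all made up for this example; nothing here is the actual Arctos schema. It only shows that the public filter becomes a single join to the code table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical code table: one row per attribute type, flagged per type.
    CREATE TABLE ctattribute_type_demo (
        attribute_type TEXT PRIMARY KEY,
        is_private     INTEGER NOT NULL DEFAULT 0
    );
    -- Hypothetical attribute assertions attached to catalog records.
    CREATE TABLE attribute_demo (
        guid            TEXT NOT NULL,
        attribute_type  TEXT NOT NULL
            REFERENCES ctattribute_type_demo(attribute_type),
        attribute_value TEXT NOT NULL
    );
    INSERT INTO ctattribute_type_demo VALUES ('sex', 0), ('value', 1);
    INSERT INTO attribute_demo VALUES
        ('TEST:Mamm:1', 'sex', 'female'),
        ('TEST:Mamm:1', 'value', '100 USD');
""")


def public_attributes(conn, guid):
    """Return (type, value) pairs whose attribute type is not flagged private."""
    rows = conn.execute(
        """
        SELECT a.attribute_type, a.attribute_value
        FROM attribute_demo a
        JOIN ctattribute_type_demo t ON t.attribute_type = a.attribute_type
        WHERE a.guid = ? AND t.is_private = 0
        ORDER BY a.attribute_type
        """,
        (guid,),
    )
    return rows.fetchall()


public_rows = public_attributes(conn, "TEST:Mamm:1")
print(public_rows)
```

A per-assertion flag would instead put the column on `attribute_demo` itself, which still requires filtering individual rows but avoids walking out to an encumbrance table.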

One major drawback to any form of this might be a lack of transparency. Encumbrances require a contact person and some elevated rights. This potentially does not have that: especially if individual attributes can be flagged, maybe there's something sensitive and precious behind that and maybe some student just checked the wrong box, how would anyone possibly know?

dustymc avatar May 20 '24 20:05 dustymc

ok this is mixing implementation with functional need a bit. Your suggestion is more like the second option where we have new attributes that are always private, which could work-- we can use the Determiner, date and remarks fields to note those encumbered metadata potentially.

The lack of transparency may be an issue since the public won't see whether there's a private attribute or not, right? Let me see how the Issues meeting agenda shapes up for adding there or whenever the next encumbrance w.g. meeting is....

mkoo avatar May 20 '24 20:05 mkoo

> mixing implementation with functional

Yes, starting the discussion off with functional requirements would always be lovely!

> use the Determiner...

I don't think that'll work, the determiner (and other metadata) is still important whether the info is public or not. You've determined whatever sensitive thing using whatever method on suchnsuch date, that's all critical to understanding the data.

We should possibly have an 'enteredby' capture on attributes (I've been meaning to file that issue for months, someone else is very welcome to grab it and run...) which would get at me entering your determination (and possibly explain typos and such), but still doesn't really paint the whole picture.

With encumbrances, we can clearly say "CuratorX doesn't want these data public because whatever reasons." (And ideally we might share that with eg https://dwc.tdwg.org/terms/#dwc:informationWithheld so that eg a trusted researcher could know that we do have data and who to ask about it, but I don't think we do.) I'm not sure losing that ability for a few specific administrative attributes would be much different, but losing it for any arbitrary thing somehow seems like it probably leads to scary and regrettable places.

(We could require 'encumbered_by....' metadata on each attribute, but I think that's probably WAY over in 'unusable' territory.)

dustymc avatar May 20 '24 21:05 dustymc

Encumbrance committee likes the idea of flagging individual items, but would like to limit access - eg a student can update everything about an (attribute, part attribute, whatever) except the 'hide' checkbox, which requires (manage collection or whatever).
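One way the committee's access rule could look in application code is sketched below. This is an assumption-laden illustration, not Arctos code: the `Attribute` class, the `update_attribute` function, and the role name `manage_collection` are hypothetical stand-ins (the comment above only says "manage collection or whatever").

```python
from dataclasses import dataclass, replace

# Hypothetical role name required to change the 'hide' checkbox.
ROLE_CAN_HIDE = "manage_collection"


@dataclass(frozen=True)
class Attribute:
    attribute_type: str
    attribute_value: str
    hidden: bool = False


def update_attribute(attr, roles, **changes):
    """Return an updated copy of attr.

    Any field may be edited, but changing 'hidden' is refused unless the
    caller holds the elevated role.
    """
    wants_hidden_change = "hidden" in changes and changes["hidden"] != attr.hidden
    if wants_hidden_change and ROLE_CAN_HIDE not in roles:
        raise PermissionError("changing 'hidden' requires " + ROLE_CAN_HIDE)
    return replace(attr, **changes)


attr = Attribute("location", "freezer 3")
student = {"manage_specimens"}
curator = {"manage_specimens", "manage_collection"}

# A student can edit ordinary fields...
edited = update_attribute(attr, student, attribute_value="freezer 4")

# ...but not the hide flag.
try:
    update_attribute(attr, student, hidden=True)
    student_blocked = False
except PermissionError:
    student_blocked = True

# A user with the elevated role can.
hidden_attr = update_attribute(attr, curator, hidden=True)
```

Keeping the check in one function means the same rule could apply to record attributes, part attributes, or anything else that grows a 'hide' checkbox.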

@dustymc to explore feasibility, CPU costs, complexity; temporarily moving this to active development to do so.

dustymc avatar May 31 '24 21:05 dustymc

Flagging individual items would very effectively prevent "load-as-encumbered" which is (more or less) not great for https://github.com/ArctosDB/internal/issues/332.

I am not sure that the complexity of our current encumbrance model supports scalability. https://github.com/ArctosDB/internal/issues/332 (and various other similar issues) might need some level of "custom UI," which might well require/be best addressed by a "deep" API which deals with various public and private information. I suspect that simple ways of withholding information (eg "this is structurally public, this private" vs. "nearly everything may sometimes be private, depending on descriptive data and conditional information several joins away") are going to have outsized impacts on the reality of providing such tools/services. We should reconsider the weight of simplicity, whatever form this might ultimately take.

dustymc avatar Jul 25 '24 15:07 dustymc

Attribute encumbrances have become a cost which is increasingly difficult to pay.

I'm still in favor of adding a 'private' flag to the necessary attribute tables, which I believe would DRASTICALLY simplify everything for everyone (including the scripts which maintain the private cache). I think this (with an existing issue which already has a more-elegant solution) could be manipulated to provide complete coverage in one easy-to-understand way.

  • mask collector, mask preparator - convert agent records to a private version of attribute 'verbatim agent'
  • mask coordinates, mask year collected - https://github.com/ArctosDB/dev/issues/189
  • mask NAGPRA category, mask unformatted measurements, mask value, mask specimen remarks - are or could readily be attributes
  • mask original field number - https://github.com/ArctosDB/dev/issues/206 - this is probably broken, sensitive identifiers could easily be stored as encumbrances
  • mask part attribute location - attribute (but not record, so this would need to be extended to part attributes or these data would need a new home)
  • mask record ~~restrict usage~~ - no change necessary
  • EDIT: restrict usage should move to permits, encumbrances should be reserved for 'does stuff in the UI' - see https://github.com/ArctosDB/dev/issues/122

See also: https://github.com/ArctosDB/dev/issues/73

dustymc avatar Oct 16 '24 17:10 dustymc

I think the expense of encumbrances was involved in today's outage.

ctspec_part_att_att (https://github.com/ArctosDB/dev/issues/42) has been frustrating to deal with a few times lately. I need to rebuild the part attribute code tables, and that suggests a relatively easy experiment: Can I add the public/private flag to https://arctos.database.museum/info/ctDocumentation.cfm?table=ctspecpart_attribute_type (which will be reborn with a new structure and possibly a new name) and set it to private for https://arctos.database.museum/info/ctDocumentation.cfm?table=ctspecpart_attribute_type#location?
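The experiment described above amounts to adding one column to the code table and flipping it for one row. A minimal sketch with Python's `sqlite3` follows; the table name `ctspecpart_attribute_type_demo`, the column name `is_private`, and the 'preservation' row are assumptions for illustration only, and the rebuilt table may look quite different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the part attribute type code table.
    CREATE TABLE ctspecpart_attribute_type_demo (
        attribute_type TEXT PRIMARY KEY
    );
    INSERT INTO ctspecpart_attribute_type_demo VALUES
        ('location'), ('preservation');
""")

# Add the flag; existing types default to public (0).
conn.execute(
    "ALTER TABLE ctspecpart_attribute_type_demo "
    "ADD COLUMN is_private INTEGER NOT NULL DEFAULT 0"
)

# Mark 'location' as always private.
conn.execute(
    "UPDATE ctspecpart_attribute_type_demo "
    "SET is_private = 1 WHERE attribute_type = 'location'"
)

flags = dict(
    conn.execute(
        "SELECT attribute_type, is_private FROM ctspecpart_attribute_type_demo"
    )
)
print(flags)
```

Defaulting the new column to public keeps every existing type's behavior unchanged until someone deliberately flags it.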

dustymc avatar Nov 12 '24 23:11 dustymc

Location is the part attribute that we mask on all of our records as an encumbrance, but I'd be happy to just have that as a private/public flag rather than encumbrance.

AJLinn avatar Nov 13 '24 01:11 AJLinn

  guid_prefix  |   c    
---------------+--------
 ACUNHC:Bird   |      1
 ALMNH:Bird    |      6
 ALMNH:EH      |   1337
 ALMNH:Geo     |     23
 ALMNH:Inv     |   5800
 ALMNH:Mamm    |   3838
 ALMNH:Paleo   |  11562
 ANSP:Host     |   3729
 ANSP:Para     |   8020
 ASNHC:Bird    |   1239
 ASNHC:Herp    |    396
 ASNHC:Mamm    |  23041
 BYU:Bird      |      1
 CHAS:Art      |      5
 CHAS:AV       |   1366
 CHAS:Bird     |  15122
 CHAS:Egg      |   3364
 CHAS:EH       |   1052
 CHAS:Ento     |  24340
 CHAS:Fish     |     10
 CHAS:Herb     |  14428
 CHAS:Herp     |  20619
 CHAS:Inv      |  15475
 CHAS:Mamm     |   8399
 CHAS:Teach    |   3687
 CSULB:Fish    |   2289
 DMNS:Egg      |      1
 JSNM:Egg      |    810
 JSNM:Herb     |   1049
 JSNM:Paleo    |   2519
 MSB:Arth      |      1
 MSB:Bird      |      2
 MSB:DGR       |   2270
 MSB:Mamm      |     60
 NHSM:Mamm     |    212
 NMMNH:Paleo   |    956
 PSM:Paleo     |  10934
 TCDGM:Mineral |    811
 TCDGM:Paleo   |    304
 UAM:Arc       | 370118
 UAM:Art       |   8872
 UAM:EH        |  28163
 UCM:Bird      |  11577
 UCM:FossilEgg |    257
 UCM:Herp      |    408
 UCM:Mamm      |   5885
 UMNH:Herp     |      1
 UNM:Geol      |   3369
 UNM:MET       |    190
 UTEP:Bird     |    149
 UTEP:ES       |      3
 UTEP:Herp     |     92
 UTEP:HerpOS   |      2
 UTEP:Inv      |      2
 UTEP:Mamm     |     37
 UTEPObs:Herp  |     11
 UWBM:PB       |  62857
 UWBM:PR       |   8775
 UWBM:VP       |  93349
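The query behind the table above is not shown in this thread. Presumably it counts part attribute 'location' values per collection; a grouped count of that shape could be sketched as follows, using Python's `sqlite3` with a made-up `part_attribute_demo` table and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened view: one row per part attribute.
    CREATE TABLE part_attribute_demo (
        guid_prefix    TEXT NOT NULL,
        attribute_type TEXT NOT NULL
    );
    INSERT INTO part_attribute_demo VALUES
        ('TEST:Bird', 'location'),
        ('TEST:Bird', 'location'),
        ('TEST:Mamm', 'location'),
        ('TEST:Mamm', 'preservation');
""")

# Count 'location' part attributes per collection, ordered like the table.
counts = conn.execute(
    """
    SELECT guid_prefix, COUNT(*) AS c
    FROM part_attribute_demo
    WHERE attribute_type = 'location'
    GROUP BY guid_prefix
    ORDER BY guid_prefix
    """
).fetchall()

for guid_prefix, c in counts:
    print(f"{guid_prefix:<14}| {c:>6}")
```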

@jldunnum @catherpes @AdrienneRaniszewski @campmlc @leet1984 @ccwlobo @javanveldhuizen @sjshirar @mvzhuang @AJLinn @acdoll @KatherineLAnderson @DellaCHall @aklompma @ebraker @droberts49 @kderieg322079 @wellerjes @Nicole-Ridgwell-NMMNHS @rwilhoyt @byuherpetology @brandon-s-thompson @kmkocot @kat-sterner @babogan @ufarrell @WaigePilson

any major objections to making part attribute location always private/internal?

(And why are there container-using collections in this list???)

dustymc avatar Nov 14 '24 00:11 dustymc

Csv please?

campmlc avatar Nov 14 '24 00:11 campmlc

> Csv

https://arctos.database.museum/search.cfm

dustymc avatar Nov 14 '24 15:11 dustymc

Fine with me!

kmkocot avatar Nov 14 '24 15:11 kmkocot

I'm fine with this.

Nicole-Ridgwell-NMMNHS avatar Nov 14 '24 15:11 Nicole-Ridgwell-NMMNHS

A csv for all 4 MSB collections would be awfully nice, so I don't have to search each one individually . . .

campmlc avatar Nov 14 '24 15:11 campmlc

Screenshot 2024-11-14 at 07 40 50 Screenshot 2024-11-14 at 07 41 02 Screenshot 2024-11-14 at 07 41 09

https://arctos.database.museum/search.cfm?guid_prefix=MSB%3AArth%2CMSB%3ABird%2CMSB%3ADGR%2CMSB%3AFish%2CMSB%3AHerb%2CMSB%3AHerp%2CMSB%3AHost%2CMSB%3AInv%2CMSB%3AMamm%2CMSBObs%3AHerp%2CMSBObs%3AMamm%2CMSB%3APara&part_attribute=location

ArctosDatakjVzC0Lq6m.csv.zip

(Somehow I doubt you can make much use of what's in my CSV, but there it is...)

dustymc avatar Nov 14 '24 15:11 dustymc

I'm okay with this

rwilhoyt avatar Nov 14 '24 15:11 rwilhoyt

go for it

acdoll avatar Nov 14 '24 16:11 acdoll

Fine by me!

babogan avatar Nov 14 '24 16:11 babogan

Fine with me.

javanveldhuizen avatar Nov 14 '24 16:11 javanveldhuizen

Fine with me.

DellaCHall avatar Nov 14 '24 17:11 DellaCHall

Yes!

KatherineLAnderson avatar Nov 14 '24 18:11 KatherineLAnderson

In reference to https://github.com/ArctosDB/arctos/issues/7792#issuecomment-2475114112: looking these over, even though the collections are using object tracking, these samples are not in locations that are included in the object tracking system, e.g. in external research labs, in research freezers of unbarcoded samples, or zoo material in oversize vials that don't fit into standard freezer boxes. It is fine to make these part attributes private, as long as students with manage specimens can still see the attribute.

campmlc avatar Nov 14 '24 20:11 campmlc

Fine with me as well!

ufarrell avatar Nov 15 '24 12:11 ufarrell

Good for me

> (And why are there container-using collections in this list???)

and as I described in issue ArctosDB/arctos#8304, we use both barcodes and part attribute location due to not being 100% barcoded in all our objects. Also we prefer the functionality of seeing the part attribute location in the search results view. But yes, make it private!

AJLinn avatar Nov 15 '24 15:11 AJLinn

Fine with me!

WaigePilson avatar Nov 19 '24 00:11 WaigePilson

dev-->https://github.com/ArctosDB/dev/issues/118

dustymc avatar Nov 26 '24 15:11 dustymc