
Agent request: create threshold for verified status

Open • mkoo opened this issue 11 months ago • 32 comments

This was previously brought up, but I thought filing an issue would make sure I don't forget it! I would like to propose a minimum set of requirements for an agent to automatically get verified status: at least a minimum threshold, so we can ask Dusty to apply it automatically to all our agents, either as a first-time pass or via a script.

For example, agents could automatically be given status=verified if they have any of the following:

  • have at least one identifier (url, github, library of congress or other link)
  • have at least one address (correspondence or email)
  • have an Arctos operator login
  • have at least birth date

Thoughts? Should we require more, or different, criteria? If it seems like a lot of discussion is needed, I will post this for the Agent Committee to discuss and recommend. Thx!

mkoo, Mar 15 '24

have at least one identifier (url, github, library of congress or other link)

I would say this should only happen for those with GitHub, Wikidata, ORCiD, or LoC identifiers. A URL can be anything, and some of them are likely not worthy of the addition of a gold star.
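A minimal sketch of that rule, in Python, assuming identifiers are stored as full URLs and that the host name is enough to tell the four suggested types apart. The host list is a hypothetical allowlist, not an Arctos code table, and a host match says nothing about whether the rest of the URL is any good.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical allowlist of hosts for the identifier types named above.
# This is an illustration, not an Arctos setting or code table.
STRONG_HOSTS = {
    "github.com": "GitHub",
    "www.wikidata.org": "Wikidata",
    "orcid.org": "ORCID",
    "id.loc.gov": "Library of Congress",
    "lccn.loc.gov": "Library of Congress",
}


def strong_identifier_type(url: str) -> Optional[str]:
    """Return the identifier type if the URL's host is on the allowlist, else None."""
    host = urlparse(url.strip()).netloc.lower()
    return STRONG_HOSTS.get(host)


def has_strong_identifier(urls: List[str]) -> bool:
    """True if at least one of an agent's identifier URLs has an allowlisted host."""
    return any(strong_identifier_type(u) is not None for u in urls)
```

For example, `has_strong_identifier(["https://example.org/about-me"])` is False, so an agent whose only identifier is an arbitrary web page would be left for a human to judge.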

have at least one address (correspondence or email)

No from me for this. We have agents with an address of "California" and nothing else. That definitely doesn't deserve a gold star.

have an Arctos operator login

I would guess a few of these aren't really informative, but I would assume someone using Arctos would know who they are? At least for the next few years.

have at least birth date

Maybe? But even these can be iffy and imprecise; they can also be guesses. I'm not sure I would be happy with this one.

@ArctosDB/agents-committee might have things to say?

Jegelewicz, Mar 15 '24

I'm hesitant to, e.g., set up a bot; I think this will probably be a lot more useful if there's a human involved. (And I'm DOING STUFF with verification; it will become anti-useful if it gets to be unreliable.)

I suspect birth date is pretty shady (and misused/misapplied, see https://github.com/ArctosDB/arctos/issues/7437) but possibly worth a look.

Addresses are definitely shady and not good identifiers; e.g., these are the most popular:

 Illinois                                      |    38
 [email protected]                              |    43
 805-289-9275                                  |    43
 Savoonga, St. Lawrence Island, Alaska         |    50
 Chicago, Illinois                             |   549
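One way to see why addresses don't disambiguate: count how many distinct agents share each address string. A minimal Python sketch over a hypothetical export of (agent_id, address) pairs; any address shared by more than one agent can't, on its own, tell those agents apart. The function name and the `max_agents` threshold are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def shared_addresses(rows: Iterable[Tuple[int, str]], max_agents: int = 1) -> Dict[str, int]:
    """Return {address: number of distinct agents} for addresses used by more
    than `max_agents` agents.

    `rows` is a hypothetical export of (agent_id, address) pairs. Each agent is
    counted once per address, after trimming surrounding whitespace.
    """
    pairs = {(agent_id, address.strip()) for agent_id, address in rows}
    counts = Counter(address for _, address in pairs)
    return {address: n for address, n in counts.items() if n > max_agents}
```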

Identifiers SHOULD be pretty good, but who knows....

login better be good!

I can pull a spreadsheet (lemme know what you'd like to see in it) for review and repatriation, but even then we'll never know if the verified you is also accompanied by 57 other yous (and maybe we don't care: those mean you're not getting proper attribution, but the one is still 'verified'?).

dustymc, Mar 15 '24

All good points. Maybe require at least two of the criteria above? Or some other threshold. I just think that in this first pass we should apply at least some criteria, so I don't have to do this by hand for the hundreds of agents I encounter that should be verified but are not. So maybe this will be a one-time request.
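The "at least two" idea could be expressed as a count over the four criteria from the original post, with the threshold as a parameter so it can be tuned. This is a hypothetical Python sketch: `AgentSummary` and its fields are illustrative names, not Arctos columns, and the result is meant to flag candidates for a person to review rather than to set status directly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentSummary:
    """Hypothetical per-agent summary; field names are illustrative only."""
    identifiers: List[str] = field(default_factory=list)
    addresses: List[str] = field(default_factory=list)
    has_login: bool = False
    birth_date: Optional[str] = None


def criteria_met(agent: AgentSummary) -> int:
    """Count how many of the four proposed criteria the agent satisfies."""
    return sum([
        bool(agent.identifiers),
        bool(agent.addresses),
        agent.has_login,
        agent.birth_date is not None,
    ])


def verification_candidate(agent: AgentSummary, min_matches: int = 2) -> bool:
    """True if the agent meets at least `min_matches` criteria.

    min_matches=1 is the original "any of the following" proposal;
    min_matches=2 is the "at least two" variant.
    """
    return criteria_met(agent) >= min_matches
```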

mkoo, Mar 15 '24

Let me think about the spreadsheet request. Maybe someone (me) goes through it and we can set up a one-time batch request.

Still want to hear more thoughts on the criteria!

mkoo, Mar 15 '24

Since we do not have login as an option to add to Agent's information any more, that maybe shouldn't be one?

At the same time I would like to verify all of my Curatorial Assistants and students who I see in front of me as real people, and I can confirm that they exist. Would they fall under accepted rather than verified?

And can we verify ourselves, or is that circular?

ewommack, Mar 18 '24

Since we do not have login as an option to add to Agent's information any more, that maybe shouldn't be one?

We still do; it just happens from the other direction.

I would like to verify all of my Curatorial Assistants and students who I see in front of me as real people, and I can confirm that they exist. Would they fall under accepted rather than verified?

Good question. I would say accepted unless they have one of the outside identifiers (Wikidata, ORCiD, Library of Congress). I would save verified for agents that people besides you would also be able to disambiguate from others with the same name.

And can we verify ourselves, or is that circular?

I verified myself, but I also have an ORCiD, so....

Jegelewicz, Mar 18 '24

accepted unless they have one of the outside identifiers

Yeah, agreed: verified should indicate some unambiguous way of identifying them that still works, a few decades from now, for someone not familiar with the person.

verified myself, but I also have an ORCiD

Also sounds right; it's not a blessing, just a confirmation that the data meet a standard.

Which of course doesn't seem to be how things are being used....

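-- Agents with a status attribute, plus a ' | '-separated list of the
-- (non-deprecated) identifier types attached to each agent.
select 
	agent.agent_id,
	agent.preferred_agent_name,
	a.attribute_value,
	idr.t
from
	agent
	-- the current status attribute (verified, accepted, ...); deprecated copies are skipped
	inner join agent_attribute a on agent.agent_id=a.agent_id and a.attribute_type='status'
		and a.deprecation_type is null
	-- identifier types aggregated per agent; t is NULL when the agent has none
	left outer join (
		select agent_id,string_agg(agent_attribute.attribute_type, ' | ') as t from agent_attribute 
		inner join ctagent_attribute_type on agent_attribute.attribute_type=ctagent_attribute_type.attribute_type
		and ctagent_attribute_type.purpose='identifier' 
		where agent_attribute.deprecation_type is null group by agent_id
	) idr on agent.agent_id=idr.agent_id
order by 
	a.attribute_value,
	agent.preferred_agent_name;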
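To poke at the query's logic without the production database, here is a self-contained Python sketch that runs essentially the same SQL against a toy in-memory SQLite database. Assumptions: only the columns the query touches are modeled, the rows are made up, SQLite's `group_concat` stands in for PostgreSQL's `string_agg`, and deprecated status rows are skipped so superseded values don't show up as extra rows.

```python
import sqlite3

# Toy schema with only the columns the query above touches. The table and
# column names follow that query; everything else here is an assumption.
SCHEMA = """
create table agent (agent_id integer primary key, preferred_agent_name text);
create table agent_attribute (
    agent_id integer,
    attribute_type text,
    attribute_value text,
    deprecation_type text  -- NULL for the current row
);
create table ctagent_attribute_type (attribute_type text primary key, purpose text);
"""

# Same shape as the PostgreSQL query; group_concat replaces string_agg.
QUERY = """
select agent.agent_id, agent.preferred_agent_name, a.attribute_value, idr.t
from agent
inner join agent_attribute a
    on agent.agent_id = a.agent_id
    and a.attribute_type = 'status'
    and a.deprecation_type is null
left outer join (
    select agent_attribute.agent_id,
           group_concat(agent_attribute.attribute_type, ' | ') as t
    from agent_attribute
    inner join ctagent_attribute_type
        on agent_attribute.attribute_type = ctagent_attribute_type.attribute_type
        and ctagent_attribute_type.purpose = 'identifier'
    where agent_attribute.deprecation_type is null
    group by agent_attribute.agent_id
) idr on agent.agent_id = idr.agent_id
order by a.attribute_value, agent.preferred_agent_name
"""


def status_with_identifiers(conn: sqlite3.Connection) -> list:
    """Return (agent_id, name, status, identifier types or None) rows."""
    return conn.execute(QUERY).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "insert into ctagent_attribute_type values (?, ?)",
        [("ORCID", "identifier"), ("correspondence address", "address")],
    )
    conn.executemany(
        "insert into agent values (?, ?)", [(1, "Agent One"), (2, "Agent Two")]
    )
    conn.executemany(
        "insert into agent_attribute values (?, ?, ?, ?)",
        [
            (1, "status", "accepted", "update"),  # superseded row, ignored by the query
            (1, "status", "verified", None),
            (1, "ORCID", "https://orcid.org/0000-0002-1825-0097", None),
            (2, "status", "accepted", None),
            (2, "correspondence address", "Chicago, Illinois", None),
        ],
    )
    return conn
```

`status_with_identifiers(build_demo_db())` returns one row per agent: Agent Two (accepted, no identifiers) first, then Agent One (verified, ORCID); the superseded "accepted" row for Agent One does not appear.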

dustymc, Mar 19 '24

Some of that is me: I was cataloging and pushing our CAs (curatorial assistants) to verified. I'll go back and change them to accepted. Fixed.

ewommack, Mar 19 '24

I don't know how anyone else is doing this, but if I come across one of the agents used in my collection and I have either an item with their signature on it (and I know when or where that thing was made) or a letter in the file with their signature on it, I'm verifying their agent record, regardless of whether I have a physical address more detailed than Savoonga, St. Lawrence Island, Alaska.

Indigenous creators or consultants in Alaska villages with only ~100 people in 1940 are no less valid than my student assistant sitting across the lab from me, and my agent records are going to reflect that assumption.

AJLinn, Mar 20 '24

I interpret the existing definitions for status to mean that "verified" exists to have a human assert confidence, and "accepted" exists to be programmatically assertable. So the idea of a one-time batch update via a human making decisions in a spreadsheet then seems most logical...

FWIW, my thought on criteria is that status should not be a single attribute; rather, there should be one attribute for the human decision and one for the programmatic assessment (see https://github.com/ArctosDB/arctos/issues/7328#issuecomment-1917575540).

ekrimmel, Mar 20 '24

bringing https://github.com/ArctosDB/arctos/issues/7649#issuecomment-2048416747 here


Agent Committee meeting:

  • "verify" what we can - @dustymc produce a report of agents with multiple bits of info, specific types of info, stuff that might be verified
    • post it here, beg for help
    • SQL in data
  • re-run those scripts from time to time
  • documentation: e.g., if adding an ORCID, then PUHLEEZE also verify

going active for making lists

someone please help with documentation

  • when/why verify
  • how to 'delete'
  • ???

dustymc, Apr 10 '24

Here are unverified agents with identifiers. They should at least be spot checked before we proceed. (Is this actually what I think it is? Do the identifiers actually do what they claim?)

temp_agent_with_identifier.csv.zip

dustymc, Apr 11 '24

These are not good and should just be removed from the agents; each GitHub identifier is only the site root, with no username:

BARE_ID  | AGENT_ID                                      | PREFERRED_AGENT_NAME | IDRS
21334984 | https://arctos.database.museum/agent/21334984 | Alicia Davis         | GitHub = https://github.com/
21334985 | https://arctos.database.museum/agent/21334985 | Clara Frickmann      | GitHub = https://github.com/
21335546 | https://arctos.database.museum/agent/21335546 | Maddie McCutcheon    | GitHub = https://github.com/
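A cheap shape check could catch identifiers like these before they are saved. A minimal Python sketch (the function name is hypothetical) that rejects URLs with no host or with nothing after the site root; it does not fetch the page, so it can't confirm that the account exists or belongs to the agent.

```python
from urllib.parse import urlparse


def points_at_record(url: str) -> bool:
    """True if the URL has a host and a non-empty path beyond '/'.

    A bare site root such as 'https://github.com/' names no account, so it
    identifies nobody. This is only a shape check.
    """
    parsed = urlparse(url.strip())
    return bool(parsed.netloc) and parsed.path.strip("/") != ""
```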

Jegelewicz, Apr 11 '24

These dupes have been fixed

AGENT_ID                                      | PREFERRED_AGENT_NAME                | IDRS                                                | New IDRS
https://arctos.database.museum/agent/21254069 | Kenneth Wilson Stewart              | Wikidata = https://www.wikidata.org/wiki/Q111609140 | Wikidata = https://www.wikidata.org/wiki/Q111609140
https://arctos.database.museum/agent/21335658 | Stanley Wesley Szczytko             | Wikidata = https://www.wikidata.org/wiki/Q111609140 | Wikidata = https://www.wikidata.org/wiki/Q111512761
https://arctos.database.museum/agent/21351185 | Friedrich-Schiller-Universität Jena | Wikidata = https://www.wikidata.org/wiki/Q154561    | Wikidata = https://www.wikidata.org/wiki/Q154561
https://arctos.database.museum/agent/21351186 | Herbarium Haussknecht (JE)          | Wikidata = https://www.wikidata.org/wiki/Q154561    | Wikidata = https://www.wikidata.org/wiki/Q22110590
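Shared identifier values like these are easy to list mechanically: group by value and keep anything attached to more than one agent. In the rows above the cause was an identifier pasted onto the wrong agent rather than a duplicate agent, so a person still has to decide which case applies. A minimal Python sketch over a hypothetical (agent_id, identifier_url) export; the function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def shared_identifiers(rows: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Return {identifier: sorted agent IDs} for identifier values that are
    attached to two or more distinct agents."""
    agents_by_value: Dict[str, Set[int]] = defaultdict(set)
    for agent_id, value in rows:
        agents_by_value[value.strip()].add(agent_id)
    return {v: sorted(ids) for v, ids in agents_by_value.items() if len(ids) > 1}
```

Run on the IDRS column above, this reports both Wikidata items; on the New IDRS column it returns an empty dict.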

Jegelewicz, Apr 11 '24

Not sure about the two forms of LoC URLs, but they both seem to work:

AGENT_ID                                      | PREFERRED_AGENT_NAME   | IDRS
https://arctos.database.museum/agent/21336524 | John Woodhouse Audubon | Library of Congress = https://id.loc.gov/authorities/names/nr98018569
https://arctos.database.museum/agent/10011348 | Barbara R. Stein       | Library of Congress = https://lccn.loc.gov/n2001003670
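If both forms are acceptable, a duplicate check could compare the control number rather than the raw URL. A minimal Python sketch that pulls the number out of either form; the two URL patterns are inferred from the two examples above and are assumptions, not a complete description of LoC URLs, and whether both forms always resolve to the same authority record is not checked here.

```python
import re
from typing import Optional

# Assumed URL shapes, based only on the two examples above:
#   https://id.loc.gov/authorities/names/<control number>
#   https://lccn.loc.gov/<control number>
LOC_PATTERNS = [
    re.compile(r"https?://id\.loc\.gov/authorities/names/([a-z]{1,3}\d+)/?"),
    re.compile(r"https?://lccn\.loc\.gov/([a-z]{1,3}\d+)/?"),
]


def loc_control_number(url: str) -> Optional[str]:
    """Return the control number from either LoC URL form, or None if neither matches."""
    for pattern in LOC_PATTERNS:
        match = pattern.fullmatch(url.strip())
        if match:
            return match.group(1)
    return None
```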

Jegelewicz, Apr 11 '24

A quick review of the GitHub IDs shows people we know, so I would say that list could get verified.

Jegelewicz, Apr 11 '24

Here's fresh data:

temp_agent_with_identifier(1).csv.zip

dustymc, Apr 12 '24

Found this malformed ORCiD (note the space after https://orcid.org/):

AGENT_ID                                      | PREFERRED_AGENT_NAME | IDRS
https://arctos.database.museum/agent/21335537 | Nico Lübcker         | ORCID = https://orcid.org/ 0000-0001-7141-6669

unable to fix it though?

ERROR_ID      | 294FDC97-F66C-4CBB-8B27ADC5D875FE8D
ERROR_TYPE    | application
ERROR_MESSAGE | manage_agent_attribute fail
ERROR_DETAIL  | Message: ERROR: Identifiers must be unique. Do not create duplicate agents. Please carefully search before continuing. https://orcid.org/ 0000-0001-7141-6669
                Where: PL/pgSQL function trigger_fct_tr_agent_attribute_biud() line 99 at RAISE
                SQL: insert into agent_attribute ( agent_id, attribute_type, attribute_value, begin_date, end_date, related_agent_id, determined_date, attribute_determiner_id, attribute_method, attribute_remark, created_by_agent_id, created_timestamp, deprecated_by_agent_id, deprecation_type, deprecated_timestamp ) ( select agent_id, attribute_type, attribute_value, begin_date, end_date, related_agent_id, determined_date, attribute_determiner_id, attribute_method, attribute_remark, created_by_agent_id, created_timestamp, 21300608, 'update', current_timestamp from agent_attribute where attribute_id=52060 )

and I swear I changed this yesterday, but it is still wrong and now I cannot change it.

AGENT_ID                                      | PREFERRED_AGENT_NAME    | IDRS                                                | New IDRS
https://arctos.database.museum/agent/21254069 | Kenneth Wilson Stewart  | Wikidata = https://www.wikidata.org/wiki/Q111609140 | Wikidata = https://www.wikidata.org/wiki/Q111609140
https://arctos.database.museum/agent/21335658 | Stanley Wesley Szczytko | Wikidata = https://www.wikidata.org/wiki/Q111609140 | Wikidata = https://www.wikidata.org/wiki/Q111512761

Jegelewicz, Apr 12 '24

unable to fix it though?

Deleted the old one and added the new one. Sucks that I can't just edit it.

Jegelewicz, Apr 12 '24

The above does not work for Stanley Wesley Szczytko; I cannot remove the duplicate.

Jegelewicz, Apr 12 '24

I deleted https://orcid.org/orcid-search/search?searchQuery=christiane%20todt from https://arctos.database.museum/agent/21350997 and tightened up the control.

Sucks

Once the rules are right, there will be no need to.
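For ORCID specifically, the rules can go beyond "starts with https://orcid.org/". An ORCID iD is four hyphenated blocks of four characters, and the last character is a check digit computed with ISO 7064 MOD 11-2 (it can be X), so typos and stray spaces like the one in the malformed value above can be caught before saving. A minimal Python sketch of such a check; the names are hypothetical and it is not the control Arctos actually uses.

```python
import re

# https://orcid.org/ followed by four hyphenated blocks; the last character
# may be X. Anything else (spaces, search URLs) fails to match.
ORCID_URL = re.compile(r"https://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def valid_orcid_url(url: str) -> bool:
    """True if `url` is a well-formed ORCID URL whose check character is correct."""
    match = ORCID_URL.fullmatch(url)
    if match is None:
        return False
    digits = match.group(1).replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]
```

The iD from the malformed row fails as stored but passes the checksum once the space is removed; the orcid-search URL deleted above fails because it isn't an iD at all.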

dustymc, Apr 12 '24

arctosprod@arctos>> delete from agent_attribute where agent_id=21335658 and  attribute_value='https://www.wikidata.org/wiki/Q111609140';
DELETE 1

dustymc, Apr 12 '24

freshier:

I just added the correct wikidata to https://arctos.database.museum/edit_agent.cfm?agent_id=21335658 so probably needs to be even more freshier?

Jegelewicz, Apr 12 '24

added the correct wikidata

Just verify while you're in there; my pull ignores those. Done.

And see https://github.com/ArctosDB/arctos/issues/7550#issuecomment-2048418496; that needs to be added to docs-er-sumthin.

dustymc, Apr 12 '24

looking at the freshest! https://docs.google.com/spreadsheets/d/1_SfRLzzgLx9I6SGvmOUJlgnpwZWDbjUIuAtU6Qjx8Bo/edit#gid=671511126

Many are MVZ or Archives agents, since we've been cleaning them up and adding identifiers (along with bios).

Can I get the same dump but with bios/profiles, please? THX!

mkoo, Apr 22 '24

P.S. If you are on an Arctos committee and have identifiers and relationships in your agent profile, you seem to be eligible for verified status...

mkoo, Apr 22 '24

profiles

temp_agent_with_identifier(3).csv.zip

eligible

FYI there's only one verified agent who doesn't have some extra info, and there's a good reason (sorting) for it. Yay us! (And IDK why people like torturing themselves by complicating NULL, but whatever....)

-- Agents that have a status attribute but no address, identifier, or
-- relationship attribute and no 'event' attribute.
select 
    agent.agent_id bare_id,
    'https://arctos.database.museum/agent/'||agent.agent_id agent_id,
    agent.preferred_agent_name,
    vs.attribute_value,
    vs.attribute_type,
    getPreferredAgentName(vs.attribute_determiner_id) determiner,
    getPreferredAgentName(vs.created_by_agent_id) creator
from
    agent
    -- vs: the agent's status attribute (any status value)
    inner join agent_attribute vs on agent.agent_id=vs.agent_id and vs.attribute_type in ('status')
    -- any supporting attribute; 'event' is matched by type rather than by purpose
    left outer join agent_attribute on agent.agent_id=agent_attribute.agent_id and agent_attribute.attribute_type in (
        select attribute_type from ctagent_attribute_type where purpose in ('address','identifier','relationship')
        union select 'event' attribute_type
    ) 
where 
    -- anti-join: keep only agents for which no supporting attribute matched
    agent_attribute.attribute_id is null;

 bare_id |                agent_id                | preferred_agent_name | attribute_value | attribute_type |        determiner        |         creator          
---------+----------------------------------------+----------------------+-----------------+----------------+--------------------------+--------------------------
       0 | https://arctos.database.museum/agent/0 | unknown              | verified        | status         | Teresa J. Mayfield-Meyer | Teresa J. Mayfield-Meyer

dustymc, Apr 22 '24


 bare_id |                   agent_id                   | preferred_agent_name | attribute_value | attribute_type |        determiner        |         creator          
---------+----------------------------------------------+----------------------+-----------------+----------------+--------------------------+--------------------------
 1009981 | https://arctos.database.museum/agent/1009981 | Texas A&M University | verified        | status         | Teresa J. Mayfield-Meyer | Teresa J. Mayfield-Meyer
       0 | https://arctos.database.museum/agent/0       | unknown              | verified        | status         | Teresa J. Mayfield-Meyer | Teresa J. Mayfield-Meyer
(2 rows)

https://arctos.database.museum/agent/1009981 is below where I think the threshold for verification ought to be (it has no data at all), so now what? I could ignore it, generate some sort of report (does anyone watch notifications, or is that just 'ignore it with complications'?), auto-de-verify, ???????
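Of those options, a report keeps a human in the loop. A minimal Python sketch of the check behind one, over hypothetical exported rows: it follows the query above (address, identifier, and relationship purposes, plus the 'event' type, count as support) but, unlike that query, only looks at agents whose status is 'verified'. The function and argument names are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

# Mirrors the query above: attribute types whose purpose is address,
# identifier, or relationship count as support, and so does the 'event' type.
SUPPORTING_PURPOSES = {"address", "identifier", "relationship"}
SUPPORTING_TYPES = {"event"}


def unsupported_verified(
    statuses: Dict[int, str],
    attributes: Iterable[Tuple[int, str]],
    purpose_of: Dict[str, str],
) -> List[int]:
    """Return sorted IDs of agents whose status is 'verified' but which carry
    no supporting attribute.

    statuses:   agent_id -> status value (hypothetical export)
    attributes: (agent_id, attribute_type) pairs for non-status attributes
    purpose_of: attribute_type -> purpose, as in ctagent_attribute_type
    """
    supported = {
        agent_id
        for agent_id, attr_type in attributes
        if attr_type in SUPPORTING_TYPES or purpose_of.get(attr_type) in SUPPORTING_PURPOSES
    }
    return sorted(
        agent_id
        for agent_id, status in statuses.items()
        if status == "verified" and agent_id not in supported
    )
```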

dustymc, May 03 '24

My bad: you can see WHY I verified it in the method. It has a Wikidata entry, which I have now added.

Jegelewicz, May 03 '24