arctos
Agent request: create threshold for verified status
Previously brought up, but I thought filing an issue would make sure I don't forget it! I would like to propose a minimum set of requirements for an agent to automatically receive verified status, or at least a minimum threshold, so we can ask Dusty to apply it to all our agents, either as a one-time pass or by setting up a script.
For example, agents could automatically be given status=verified if they have any of the following:
- have at least one identifier (url, github, library of congress or other link)
- have at least one address (correspondence or email)
- have an Arctos operator login
- have at least birth date
Thoughts? Require more or different criteria? If it looks like a lot of discussion is needed, I will post this for the Agent Committee to discuss and recommend. Thx!
> have at least one identifier (url, github, library of congress or other link)

I would say this should only happen for those with GitHub, Wikidata, ORCID or LoC. A URL can be anything, and some of them are likely not worthy of the addition of a gold star.

> have at least one address (correspondence or email)

No from me for this. We have agents with an address of "California" and nothing else. That definitely doesn't deserve a gold star.

> have an Arctos operator login

I would guess there are a few of these that aren't really informative, but I would assume someone using Arctos would know who they are? At least for the next few years.

> have at least birth date

Maybe? But even these can be iffy and not very precise; they can also be guesses. I'm not sure I would be happy with this one.
@ArctosDB/agents-committee might have things to say?
I'm hesitant to eg set up a bot, I think this will probably be a lot more useful if there's a human involved. (And I'm DOING STUFF with verification - it will become anti-useful if it gets to be unreliable.)
I suspect birth date is pretty shady (and misused/misapplied, see https://github.com/ArctosDB/arctos/issues/7437) but possibly worth a look.
Addresses are definitely shady and not good identifiers, eg
Illinois | 38
[email protected] | 43
805-289-9275 | 43
Savoonga, St. Lawrence Island, Alaska | 50
Chicago, Illinois | 549
are the most popular.
Identifiers SHOULD be pretty good, but who knows....
login better be good!
I can pull a spreadsheet (lemme know what you might want to see in it) for review and repatriation, but even then we'll never know if the verified you is also accompanied by 57 other yous (and maybe we don't care, those mean you're not getting proper attribution but the one is still 'verified'?).
All good points. Maybe at least two of the following, or some other threshold? I just think that in this first flush we should at least apply some criteria so I don't have to do this by hand for all the hundreds of agents I encounter that should be verified but are not. So maybe this will be a one-time request.
Let me think about the spreadsheet request. Maybe someone (me) goes through it and we can set up a one-time batch request.
Still want to hear more thoughts on the criteria!
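A minimal sketch of the "at least two of the following" idea, for discussion only. The four criteria come from the list proposed above; the field names, the `AgentSummary` type, and the default threshold of 2 are illustrative assumptions, not Arctos schema. Given the concerns about bots, it only flags agents as candidates for a human to review and does not set any status.

```python
from dataclasses import dataclass


@dataclass
class AgentSummary:
    """Hypothetical per-agent summary; field names are assumptions for illustration."""
    has_trusted_identifier: bool  # e.g. GitHub, Wikidata, ORCID, Library of Congress
    has_address: bool
    has_operator_login: bool
    has_birth_date: bool


def criteria_met(agent: AgentSummary) -> int:
    """Count how many of the proposed criteria this agent satisfies."""
    return sum([
        agent.has_trusted_identifier,
        agent.has_address,
        agent.has_operator_login,
        agent.has_birth_date,
    ])


def is_review_candidate(agent: AgentSummary, threshold: int = 2) -> bool:
    """True when the agent meets at least `threshold` criteria.

    This only nominates the agent for human review; it does not verify anything.
    """
    return criteria_met(agent) >= threshold
```

Raising or lowering `threshold` is the knob under discussion here: `threshold=1` reproduces the original "any of the following" proposal.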
Since we do not have login as an option to add to Agent's information any more, that maybe shouldn't be one?
At the same time I would like to verify all of my Curatorial Assistants and students who I see in front of me as real people, and I can confirm that they exist. Would they fall under accepted rather than verified?
And can we verify ourselves, or is that circular?
> Since we do not have login as an option to add to Agent's information any more, that maybe shouldn't be one?

We still do - it just happens from the other direction

> I would like to verify all of my Curatorial Assistants and students who I see in front of me as real people, and I can confirm that they exist. Would they fall under accepted rather than verified?

Good question. I would say accepted unless they have one of the outside identifiers (Wikidata, ORCiD, Library of Congress). I would save verified for agents that people besides you would also be able to disambiguate from others with the same name.

> And can we verify ourselves, or is that circular?

I verified myself, but I also have an ORCiD, so....

> accepted unless they have one of the outside identifiers

Yea, agreed, verified should indicate some unambiguous way of identifying them that works for someone not familiar with the person in a few decades.

> verified myself, but I also have an ORCiD

Also sounds right, it's not a blessing just a confirmation that the data is of a standard.
Which of course doesn't seem to be how things are being used....
-- status of every agent, with the identifier attribute types (if any) each one carries
select
  agent.agent_id,
  agent.preferred_agent_name,
  a.attribute_value,
  idr.t
from
  agent
  inner join agent_attribute a on agent.agent_id=a.agent_id and a.attribute_type='status'
  left outer join (
    select agent_id,string_agg(agent_attribute.attribute_type, ' | ') as t from agent_attribute
    inner join ctagent_attribute_type on agent_attribute.attribute_type=ctagent_attribute_type.attribute_type
      and ctagent_attribute_type.purpose='identifier'
    where deprecation_type is null group by agent_id
  ) idr on agent.agent_id=idr.agent_id
order by
  a.attribute_value,
  agent.preferred_agent_name
Some of that is me. I was cataloging and pushing our CAs to verified. I'll go back and change them to accepted. Fixed.
I don't know how anyone else is doing this, but if I come across one of the agents used in my collection and I have an item with their signature on it and know when or where it was made, or I have a letter in the file with their signature on it, I'm verifying their agent record, regardless of whether I have a physical address more detailed than Savoonga, St. Lawrence Island, Alaska.
Indigenous creators or consultants in Alaska villages with only ~100 people in 1940 are no less valid than my student assistant sitting across the lab from me, and my agent records are going to reflect that assumption.
I interpret the existing definitions for status to mean that "verified" exists to have a human assert confidence, and "accepted" exists to be programmatically assertable. So the idea of a one-time batch update via a human making decisions in a spreadsheet then seems most logical...
FWIW my thoughts on criteria are that status should not be a single attribute, but rather one attribute for the human decision and one for the programmatic: https://github.com/ArctosDB/arctos/issues/7328#issuecomment-1917575540
bringing https://github.com/ArctosDB/arctos/issues/7649#issuecomment-2048416747 here
Agent Committee meeting:
- "verify" what we can - @dustymc produce a report of agents with multiple bits of info, specific types of info, stuff that might be verified
- post it here, beg for help
- SQL in data
- re-run those scripts from time to time
- documentation - eg if adding eg ORCID then PUHLEEZE also verify
going active for making lists
someone please help with documentation
- when/why verify
- how to 'delete'
- ???
Here are unverified agents with identifiers. They should at least be spot checked before we proceed. (Is this actually what I think it is? Do the identifiers actually do what they claim?)
These are not good and should just be removed from the agent
BARE_ID | AGENT_ID | PREFERRED_AGENT_NAME | IDRS |
---|---|---|---|
21334984 | https://arctos.database.museum/agent/21334984 | Alicia Davis | GitHub = https://github.com/ |
21334985 | https://arctos.database.museum/agent/21334985 | Clara Frickmann | GitHub = https://github.com/ |
21335546 | https://arctos.database.museum/agent/21335546 | Maddie McCutcheon | GitHub = https://github.com/ |
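Rows like these could be caught mechanically. Below is a rough sketch, assuming the report's `IDRS` column keeps its `TYPE = URL` shape; the helper names `split_idr` and `is_bare_base_url` are made up for illustration. It treats an identifier as empty when the URL has a host but nothing after it.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_idr(idr: str) -> Tuple[str, str]:
    """Split a report cell such as 'GitHub = https://github.com/' into (type, value)."""
    id_type, _, value = idr.partition(" = ")
    return id_type.strip(), value.strip()


def is_bare_base_url(value: str) -> bool:
    """True when the URL names a host but has no path, query, or fragment.

    Such a value points at a whole site rather than at a person or organization.
    """
    parsed = urlparse(value.strip())
    return (
        bool(parsed.netloc)
        and parsed.path in ("", "/")
        and not parsed.query
        and not parsed.fragment
    )


if __name__ == "__main__":
    id_type, value = split_idr("GitHub = https://github.com/")
    print(id_type, is_bare_base_url(value))
```

A check like this only finds identifiers that are obviously empty; a URL with a path still needs a human spot check to confirm it points at the right agent.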
These dupes have been fixed
AGENT_ID | PREFERRED_AGENT_NAME | IDRS | New IDRS |
---|---|---|---|
https://arctos.database.museum/agent/21254069 | Kenneth Wilson Stewart | Wikidata = https://www.wikidata.org/wiki/Q111609140 | Wikidata = https://www.wikidata.org/wiki/Q111609140 |
https://arctos.database.museum/agent/21335658 | Stanley Wesley Szczytko | Wikidata = https://www.wikidata.org/wiki/Q111609140 | Wikidata = https://www.wikidata.org/wiki/Q111512761 |
https://arctos.database.museum/agent/21351185 | Friedrich-Schiller-Universität Jena | Wikidata = https://www.wikidata.org/wiki/Q154561 | Wikidata = https://www.wikidata.org/wiki/Q154561 |
https://arctos.database.museum/agent/21351186 | Herbarium Haussknecht (JE) | Wikidata = https://www.wikidata.org/wiki/Q154561 | Wikidata = https://www.wikidata.org/wiki/Q22110590 |
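For future spot checks, shared identifiers like these can be listed with a grouped query. The sketch below runs against an in-memory SQLite table so it is self-contained; the column names are the ones that appear in the queries in this thread, but the table here is a toy copy, the `attribute_type` values are illustrative, and the real Arctos query would also need the `ctagent_attribute_type.purpose='identifier'` filter. The sample rows use the first pair from the table above before its fix and the second pair after.

```python
import sqlite3

# Toy stand-in for agent_attribute; only the columns this check needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table agent_attribute ("
    " agent_id integer, attribute_type text,"
    " attribute_value text, deprecation_type text)"
)
conn.executemany(
    "insert into agent_attribute values (?, ?, ?, ?)",
    [
        (21254069, "Wikidata", "https://www.wikidata.org/wiki/Q111609140", None),
        (21335658, "Wikidata", "https://www.wikidata.org/wiki/Q111609140", None),
        (21351185, "Wikidata", "https://www.wikidata.org/wiki/Q154561", None),
        (21351186, "Wikidata", "https://www.wikidata.org/wiki/Q22110590", None),
    ],
)

# Identifier values attached to more than one non-deprecated agent record.
DUPLICATE_IDENTIFIERS = """
select attribute_value, count(distinct agent_id) as agent_count
from agent_attribute
where deprecation_type is null
group by attribute_value
having count(distinct agent_id) > 1
order by attribute_value
"""

duplicates = conn.execute(DUPLICATE_IDENTIFIERS).fetchall()
for value, agent_count in duplicates:
    print(value, agent_count)
```

The query itself uses only standard `group by` / `having`, so it should carry over to PostgreSQL largely unchanged, though that has not been tested against the production schema.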
Not sure about the two forms of LoC? But they both seem to work.
AGENT_ID | PREFERRED_AGENT_NAME | IDRS |
---|---|---|
https://arctos.database.museum/agent/21336524 | John Woodhouse Audubon | Library of Congress = https://id.loc.gov/authorities/names/nr98018569 |
https://arctos.database.museum/agent/10011348 | Barbara R. Stein | Library of Congress = https://lccn.loc.gov/n2001003670 |
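If both URL forms stay in use, duplicate checks would need to compare them on a common key. A possible sketch follows, assuming the last path segment is the same LoC key in either form and that keys look like a short letter prefix followed by digits. Both assumptions are drawn only from the two examples above, and `loc_key` is a hypothetical helper.

```python
import re
from typing import Optional

# Patterns for the two URL shapes seen in the report. The key format
# (1-3 lowercase letters, then digits) is an assumption based on those examples.
LOC_PATTERNS = (
    re.compile(r"^https?://id\.loc\.gov/authorities/names/([a-z]{1,3}\d+)/?$"),
    re.compile(r"^https?://lccn\.loc\.gov/([a-z]{1,3}\d+)/?$"),
)


def loc_key(url: str) -> Optional[str]:
    """Return the trailing LoC key from either URL form, or None if neither matches."""
    candidate = url.strip()
    for pattern in LOC_PATTERNS:
        match = pattern.match(candidate)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    print(loc_key("https://id.loc.gov/authorities/names/nr98018569"))
    print(loc_key("https://lccn.loc.gov/n2001003670"))
```

Two agents whose Library of Congress identifiers produce the same key would then be worth a look, whichever URL form each one uses.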
A quick review of the GitHub IDs suggests they are people we know, so I would say that list could get verified.
found this malformed ORCiD
AGENT_ID | PREFERRED_AGENT_NAME | IDRS |
---|---|---|
https://arctos.database.museum/agent/21335537 | Nico Lübcker | ORCID = https://orcid.org/ 0000-0001-7141-6669 |
unable to fix it though?
ERROR_ID | 294FDC97-F66C-4CBB-8B27ADC5D875FE8D
ERROR_TYPE | application
ERROR_MESSAGE | manage_agent_attribute fail
ERROR_DETAIL | Message: ERROR: Identifiers must be unique. Do not create duplicate agents. Please carefully search before continuing. https://orcid.org/ 0000-0001-7141-6669 Where: PL/pgSQL function trigger_fct_tr_agent_attribute_biud() line 99 at RAISE
SQL: insert into agent_attribute ( agent_id, attribute_type, attribute_value, begin_date, end_date, related_agent_id, determined_date, attribute_determiner_id, attribute_method, attribute_remark, created_by_agent_id, created_timestamp, deprecated_by_agent_id, deprecation_type, deprecated_timestamp )
( select agent_id, attribute_type, attribute_value, begin_date, end_date, related_agent_id, determined_date, attribute_determiner_id, attribute_method, attribute_remark, created_by_agent_id, created_timestamp, 21300608, 'update', current_timestamp from agent_attribute where attribute_id=52060 )
and I swear I changed this yesterday, but it is still wrong and now I cannot change it.
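Malformed values like the one above could be caught before they are saved. A minimal sketch, assuming identifiers are stored as full `https://orcid.org/` URLs: it strips stray whitespace, checks the 16-character shape, and recomputes the final check character using the mod 11-2 scheme ORCID documents for its iDs. The function names are hypothetical and this is not how Arctos validates identifiers today.

```python
import re
from typing import Optional

# Optional URL prefix, then four hyphen-separated groups; the last character may be X.
ORCID_RE = re.compile(r"^(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_check_char(first_15_digits: str) -> str:
    """Compute the ORCID check character (ISO 7064 mod 11-2) from the first 15 digits."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize_orcid(value: str) -> Optional[str]:
    """Return a canonical https://orcid.org/ URL, or None if the value is not a valid iD."""
    compact = "".join(value.split())  # drop all whitespace, including stray inner spaces
    match = ORCID_RE.match(compact)
    if not match:
        return None
    orcid = match.group(1)
    digits = orcid.replace("-", "")
    if orcid_check_char(digits[:15]) != digits[15]:
        return None
    return "https://orcid.org/" + orcid


if __name__ == "__main__":
    print(normalize_orcid("https://orcid.org/ 0000-0001-7141-6669"))
```

Normalizing before the uniqueness check would also mean a value that differs from an existing one only by whitespace is treated as the same identifier, which may or may not be the behavior wanted.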
AGENT_ID | PREFERRED_AGENT_NAME | IDRS | New IDRS |
---|---|---|---|
https://arctos.database.museum/agent/21254069 | Kenneth Wilson Stewart | Wikidata = https://www.wikidata.org/wiki/Q111609140 | Wikidata = https://www.wikidata.org/wiki/Q111609140 |
https://arctos.database.museum/agent/21335658 | Stanley Wesley Szczytko | Wikidata = https://www.wikidata.org/wiki/Q111609140 | Wikidata = https://www.wikidata.org/wiki/Q111512761 |
unable to fix it though?
deleted the old and added the new. Sucks that I just can't edit
The above does not work for Stanley Wesley Szczytko; I cannot remove the duplicate.
I deleted https://orcid.org/orcid-search/search?searchQuery=christiane%20todt from https://arctos.database.museum/agent/21350997 and tightened up the control.
> Sucks

Once the rules are right there will be no need to.
arctosprod@arctos>> delete from agent_attribute where agent_id=21335658 and attribute_value='https://www.wikidata.org/wiki/Q111609140';
DELETE 1
freshier: temp_agent_with_identifier(2).csv.zip
freshier:
I just added the correct wikidata to https://arctos.database.museum/edit_agent.cfm?agent_id=21335658 so probably needs to be even more freshier?
> added the correct wikidata

Just verify while you're in there, my pull ignores those - done.
And see https://github.com/ArctosDB/arctos/issues/7550#issuecomment-2048418496 that needs added to docs-er-sumthin.
looking at the freshest! https://docs.google.com/spreadsheets/d/1_SfRLzzgLx9I6SGvmOUJlgnpwZWDbjUIuAtU6Qjx8Bo/edit#gid=671511126
Many are MVZ or Archives since we've been cleaning and adding identifiers (along with Bios)
Can I get the same dump but with bios profiles pls! THX
PS: if you are on an Arctos committee and have identifiers and relations in your agent profile, you seem to be eligible for verified status...
> profiles

temp_agent_with_identifier(3).csv.zip

> eligible

FYI there's only one verified agent who doesn't have some extra info, and there's a good reason (sorting) for it. Yay us! (And IDK why people like torturing themselves by complicating NULL, but whatever....)
-- agents that have a status attribute but no address, identifier, relationship, or event attribute
select
  agent.agent_id bare_id,
  'https://arctos.database.museum/agent/'||agent.agent_id agent_id,
  agent.preferred_agent_name,
  vs.attribute_value,
  vs.attribute_type,
  getPreferredAgentName(vs.attribute_determiner_id) determiner,
  getPreferredAgentName(vs.created_by_agent_id) creator
from
  agent
  inner join agent_attribute vs on agent.agent_id=vs.agent_id and vs.attribute_type in ('status')
  left outer join agent_attribute on agent.agent_id=agent_attribute.agent_id and agent_attribute.attribute_type in (
    select attribute_type from ctagent_attribute_type where purpose in ('address','identifier','relationship')
    union select 'event' attribute_type
  )
where
  agent_attribute.attribute_id is null;
bare_id | agent_id | preferred_agent_name | attribute_value | attribute_type | determiner | creator
---------+----------------------------------------+----------------------+-----------------+----------------+--------------------------+--------------------------
0 | https://arctos.database.museum/agent/0 | unknown | verified | status | Teresa J. Mayfield-Meyer | Teresa J. Mayfield-Meyer
bare_id | agent_id | preferred_agent_name | attribute_value | attribute_type | determiner | creator
---------+----------------------------------------------+----------------------+-----------------+----------------+--------------------------+--------------------------
1009981 | https://arctos.database.museum/agent/1009981 | Texas A&M University | verified | status | Teresa J. Mayfield-Meyer | Teresa J. Mayfield-Meyer
0 | https://arctos.database.museum/agent/0 | unknown | verified | status | Teresa J. Mayfield-Meyer | Teresa J. Mayfield-Meyer
(2 rows)
https://arctos.database.museum/agent/1009981 is below where I think the threshold for verification ought to be (it has no data at all), so now what? I could ignore it, generate some sort of report (does anyone watch notifications, or is that just 'ignore it with complications'?), auto-de-verify, ???????
My bad - you can see WHY I verified it in the method - it has a wikidata, which I have now added.