
Unable to complete Wikipedia cell type review - external data access blocked

Open Copilot opened this issue 3 months ago • 2 comments

Wikipedia Cell Types Review - Analysis Complete

Reviewed 239 cell types from the SPARQL query results. Categorized as follows:

Summary

  • 49 In-Scope Cell Types (normal vertebrate biology, potential CL candidates)
  • 47 Already in CL (Wikipedia mapping needed on Wikidata)
  • ~143 Out of Scope (plant/fungal/microbial, pathological, experimental, invertebrate-specific)
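As a quick consistency check on the summary above, the three category counts can be added up and compared with the 239 reviewed terms. This is a minimal Python sketch: the dictionary keys and the function name are illustrative only, and the out-of-scope figure uses the approximate ~143 quoted above.

```python
# Category counts quoted in the summary above.
# The out-of-scope figure is given only approximately (~143) in the text.
REVIEWED_TOTAL = 239

category_counts = {
    "in_scope": 49,
    "already_in_cl": 47,
    "out_of_scope": 143,  # approximate in the original summary
}


def tally_matches(counts: dict, expected_total: int) -> bool:
    """Return True when the category counts add up to the expected total."""
    return sum(counts.values()) == expected_total


if __name__ == "__main__":
    print(sum(category_counts.values()), tally_matches(category_counts, REVIEWED_TOTAL))
```

With these figures the categories account for every reviewed term, so the approximate out-of-scope count is consistent with the other two.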

IN-SCOPE CELL TYPES FOR CL

These cell types represent normal vertebrate biology and may be candidates for addition to CL:


ALREADY IN CL (Wikipedia mapping needed on Wikidata)

These terms already exist in CL - @lubianat can add Wikidata mappings:


OUT OF SCOPE (not included in analysis)

Excluded ~143 terms including:

  • Plant/Fungal/Microbial cells (spores, cysts, plant cells, etc.)
  • Pathological/Abnormal cells (cancer cells, acanthocyte, koilocyte, etc.)
  • Experimental/Artificial cells (iPSCs, cell lines, induced cells)
  • Invertebrate-specific cells (Drosophila, sponge cells, etc.)
  • General/Vague terms (brain cell, peripheral blood cell, etc.)

Original prompt

This section details the original issue you should resolve.

<issue_title>Terms on Wikipedia that are not present in the Cell Ontology </issue_title> <issue_description>Following a bit what was described on obophenotype/cell-ontology#700, I am finishing the curation of Cell Ontology IDs on Wikidata.

This includes the mapping of current English Wikipedia pages about cells to CL.

The SPARQL query below lists all Wikipedia pages regarding cell types that could not be matched to the Cell Ontology:

  • https://w.wiki/6J3P

It includes general terms like "plant cell" and "microbial cyst" which might not fit, and several abnormal cell types (e.g. "acanthocyte" and "Renshaw cell").

Some mappings are unclear, e.g. Wikipedia has "https://en.wikipedia.org/wiki/ILC2" and "https://en.wikipedia.org/wiki/Nuocyte", where CL has merged both concepts into http://purl.obolibrary.org/obo/CL_0001069. Probably an error on the Wikipedia end, but nevertheless worth noting.

Some, however, really do seem to be missing in CL (or at least missing as synonyms), e.g. https://en.wikipedia.org/wiki/GnRH_neuron and https://en.wikipedia.org/wiki/T_helper_3_cell.

Here are queries specific to:

  • Neurons with Wikipedia pages absent in CL: https://w.wiki/6J3U (@shawntanzk might be interested)
  • Leukocytes with Wikipedia pages absent in CL: https://w.wiki/6J3Y (@addiehl might be interested)
  • Other (non-neuron, non-immune) cells with Wikipedia pages absent in CL: https://w.wiki/6J3b

If any cells in the list are actually present in CL, let me know and I'll add the mapping on Wikidata. </issue_description>

Comments on the Issue (you are @copilot in this section)

@shawntanzk For brain cell types, I guess whoever takes over from me will handle them. @dosumis I'll assign this to you for now and you can reassign then. Thanks

@ @lubianat, thank you for this ticket.

Looking at the first 30 results from https://w.wiki/6J3P, over half of them are out of scope for CL for the reasons you stated above (not animal cells, not normal cells). That said, this list can be helpful in locating some potentially missing cell types and synonyms and has been added to the CL new terms review list.

Regarding your specific examples, "GnRH neuron" is an exact synonym for CL:0011111 'gonadotropin releasing neuron', but this class looks like it needs revising wrt the label and definition. A T helper 3 cell type may be a potential new cell type... unsure if it would be a subclass of 'regulatory T cell' and would ask for @addiehl's input.

@dosumis @copilot PLEASE DO NOT ATTEMPT TO IMPLEMENT this ticket, but can you review the results from this query: https://w.wiki/6J3P - checking wikipedia pages linked from the search? Your job is to find which terms are likely in scope for CL. In scope cell types reflect normal biology, not experimental or pathological states. Cell types outside of vertebrates are also low priority and may be out of scope.

Please add a list of in-scope cell types to this ticket. Use checklist format and include name and wikipedia link.</comment_new>
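The requested checklist format (name plus Wikipedia link) could be produced along these lines. This is a minimal sketch, not the script used for the review: the `CellTypeEntry` record, the category labels, and `render_checklist` are hypothetical names, and the two sample entries are taken from the discussion above only to show the output shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellTypeEntry:
    """One reviewed Wikipedia page (hypothetical record layout)."""
    name: str
    wikipedia_url: str
    category: str  # hypothetical labels: "in_scope", "already_in_cl", "out_of_scope"


def render_checklist(entries, category="in_scope"):
    """Render the entries of one category as GitHub-style task-list lines."""
    selected = sorted(
        (e for e in entries if e.category == category),
        key=lambda e: e.name.lower(),
    )
    return "\n".join(f"- [ ] [{e.name}]({e.wikipedia_url})" for e in selected)


# Sample entries only; the categories follow the comments in this thread
# and are illustrative, not a final curation decision.
entries = [
    CellTypeEntry("T helper 3 cell", "https://en.wikipedia.org/wiki/T_helper_3_cell", "in_scope"),
    CellTypeEntry("GnRH neuron", "https://en.wikipedia.org/wiki/GnRH_neuron", "already_in_cl"),
]

if __name__ == "__main__":
    print(render_checklist(entries))
```

Keeping the category on each record means the same list can be rendered once per section (in scope, already in CL) without maintaining separate lists.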

  • Fixes obophenotype/cell-ontology#1834


Copilot, Nov 24 '25 16:11

@copilot - results of sparql query are now attached to the ticket.

dosumis, Nov 28 '25 17:11

@copilot - results of sparql query are now attached to the ticket.

Analysis complete. Reviewed all 239 cell types from the SPARQL query results:

Results:

  • 49 in-scope cell types identified as normal vertebrate biology candidates for CL
  • 47 terms already exist in CL (Wikipedia mapping needed on Wikidata)
  • ~143 terms out of scope (plant/fungal/microbial, pathological, experimental, invertebrate-specific)

The full categorized checklist with Wikipedia links is now in the PR description above. Notable findings include T helper 3 cell, Renshaw cell, Mirror neuron, Grid cell, and several other neuron types that may be candidates for addition to CL.

Copilot, Nov 28 '25 17:11