🧹 Help us clean up duplicate (or unnecessary) libraries in Context7

Open enesgules opened this issue 5 months ago • 53 comments

Hey everyone,

We need your help to clean up libraries in Context7.

While searching for a library, if you come across multiple instances of the same one, please report them under this issue. Include a screenshot or a short note about what you searched for and what you found, and we’ll clean up the duplicates.

Besides duplicates, you can also suggest libraries you think should be deleted—just let us know which ones and why.

Thanks in advance!

enesgules avatar Jun 26 '25 08:06 enesgules

A few questions about the request:

Garbage Collection: Has the project considered automated removal based on usage metrics? Libraries unused/not-queried for 60+ days could be auto-removed since adding them back is trivial. This avoids manual bias and prevents "popularity contest" dynamics where newer libraries get unfairly targeted or someone just doesn't like an approach.
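For illustration, a minimal sketch of the usage-based cleanup idea, assuming each library record carries a last-queried timestamp. The `Library` record, its field names, and the example IDs are hypothetical and not Context7's actual schema; only the 60-day window comes from the comment above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Library:
    # Hypothetical record shape; not Context7's real schema.
    library_id: str
    last_queried: datetime


def find_stale(libraries, now, max_idle_days=60):
    """Return the IDs of libraries that were not queried within max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    return [lib.library_id for lib in libraries if lib.last_queried < cutoff]


now = datetime(2025, 6, 27, tzinfo=timezone.utc)
libs = [
    Library("/example/active", now - timedelta(days=3)),   # made-up ID
    Library("/example/stale", now - timedelta(days=90)),   # made-up ID
]
stale = find_stale(libs, now)
print(stale)
```

Because the cutoff depends only on query activity, the same rule applies to new and established libraries alike.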

Duplicate Detection: I've tried submitting duplicates before and the system blocked me, so some detection exists. If not already implemented, hashing submissions and checking them against existing GitHub URLs would be a straightforward, inexpensive way to prevent duplicates upfront.

The "suggest libraries you think should be deleted" approach is a perfect example of the popularity contest issue - manual curation risks favoring established libraries over promising newcomers. Usage-based cleanup seems more objective and scalable.

mk11232 avatar Jun 27 '25 20:06 mk11232

Yep, we already check for duplicates. We will think about usage-based cleanup. Thanks for the feedback!

enesgules avatar Jul 04 '25 11:07 enesgules

https://context7.com/context7/docs_astral_sh-uv, https://context7.com/astral-sh/setup-uv and https://context7.com/llmstxt/docs_astral_sh-uv-llms.txt are duplicates of https://context7.com/astral-sh/uv

cobrabr avatar Jul 13 '25 19:07 cobrabr

https://context7.com/llmstxt/staging_bryntum-products-calendar-llms.txt is a duplicate and should be deleted; it's based on a staging server.

Actual library: https://context7.com/llmstxt/bryntum-products-calendar-llms.txt

kronaemmanuel avatar Jul 14 '25 09:07 kronaemmanuel

Hey @cobrabr, are you sure that setup-uv is the same as uv?

enesgules avatar Jul 22 '25 05:07 enesgules

This is a somewhat adjacent issue: I use TailwindCSS extensively, and the references often jump between v3 and v4 links. Even within v4, the LLM pulls the wrong topic. Today's example was perfect. We wanted to test the TailwindCSS typography plugin, but the topic "typography plugin @plugin directive v4", pulled from "context7/tailwindcss", landed on "Load Legacy Tailwind Plugin with @plugin directive" instead. There are clearly more relevant items in the library: https://context7.com/context7/tailwindcss. But perhaps a library with 500k tokens carries a higher risk of misdirection? I can't even call this hallucination. It clearly searched and landed on the wrong topic.

Tenelia avatar Jul 22 '25 09:07 Tenelia

Hey Tenelia. The snippet about the legacy Tailwind plugin exists in all docs resources, including the docs repo. I don't know how to filter it out. I hope the AI agent would be smart enough not to use code flagged as legacy.

https://github.com/tailwindlabs/tailwindcss.com/blob/main/src/docs/functions-and-directives.mdx#_snippet_11

enesakar avatar Jul 22 '25 10:07 enesakar

Relying on the agent doesn't seem feasible, because the snippet still gets picked up in the wrong way. A misdirection causes the AI to thread the needle wrong, leading to a cascading series of missing imports, syntax issues, etc. Are you able to grep and remove sections? As things scale, you may need to have batch jobs set up for each lib.
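As a rough sketch of the "grep and remove sections" idea: a batch job could drop snippets whose titles look legacy before they are served. The snippet shape (a dict with a `title` key), the regex, and the second title are assumptions for illustration; nothing here reflects how Context7 actually stores or filters snippets.

```python
import re

# Titles matching this pattern are treated as outdated material.
# The pattern is an assumption; a real job would likely need per-library rules.
LEGACY_PATTERN = re.compile(r"\b(legacy|deprecated)\b", re.IGNORECASE)


def drop_legacy(snippets):
    """Keep only the snippets whose title does not look legacy or deprecated."""
    return [s for s in snippets if not LEGACY_PATTERN.search(s["title"])]


snippets = [
    {"title": "Load Legacy Tailwind Plugin with @plugin directive"},
    {"title": "Example snippet about a current feature"},  # made-up title
]
kept = drop_legacy(snippets)
print([s["title"] for s in kept])
```

A title match is a blunt filter: it would also drop migration guides that mention legacy APIs on purpose, so a per-library allowlist may be needed.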

Tenelia avatar Jul 22 '25 10:07 Tenelia

Hey @cobrabr, are you sure that setup-uv is the same as uv?

No, it's not, you're right. My bad, good catch.

cobrabr avatar Jul 23 '25 04:07 cobrabr

You really need to add a voting system on the site. That way we can downvote and flag stuff quickly; once an entry gets enough votes and flags, it triggers a review.
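A minimal sketch of how such a flag threshold could work, assuming flags arrive as (user, library ID) pairs and each user counts once per library. The threshold value, the data shape, and the example IDs are all hypothetical.

```python
from collections import Counter

REVIEW_THRESHOLD = 3  # assumed value; the right number would need tuning


def libraries_needing_review(flags, threshold=REVIEW_THRESHOLD):
    """Return library IDs flagged by at least `threshold` distinct users."""
    unique_flags = set(flags)  # repeated flags from the same user count once
    counts = Counter(library_id for _user, library_id in unique_flags)
    return sorted(lib for lib, n in counts.items() if n >= threshold)


flags = [
    ("u1", "/example/dup"),
    ("u2", "/example/dup"),
    ("u3", "/example/dup"),
    ("u1", "/example/dup"),  # same user flagging again, ignored
    ("u1", "/example/ok"),
]
to_review = libraries_needing_review(flags)
print(to_review)
```

Counting distinct users rather than raw flags keeps a single account from forcing a review on its own.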

designgears avatar Aug 04 '25 23:08 designgears

In relation to this: if the repo for a docs site was made public, would that be a preferable method for adding docs to this tool?

And if so, how would they go about switching Context7 over from the website to the markdown repo?

jakewatson-bristol avatar Aug 08 '25 11:08 jakewatson-bristol

You can add both versions and make the LLM pick the one you like, but repos are the recommended method.

enesgules avatar Aug 08 '25 13:08 enesgules

Remove /bevyengine/bevy in favor of docs.rs/bevy/latest

FizzWizZleDazzle avatar Aug 15 '25 15:08 FizzWizZleDazzle

Why, does it have anything missing?

enesgules avatar Aug 15 '25 16:08 enesgules

https://context7.com/mui/material-ui-docs is a duplicate of https://context7.com/mui/material-ui.

I think we can argue that https://context7.com/mui/material-ui-docs is the one to remove. It looks like https://context7.com/mui/material-ui correctly identifies releases (it doesn't blindly take HEAD as the most recent version).

oliviertassinari avatar Aug 17 '25 22:08 oliviertassinari

@enesgules

Why, does it have anything missing?

If you were asking me, yes it does, because a lot of bevy documentation is generated using cargo doc which uses comments from code which context7 doesn't read.

FizzWizZleDazzle avatar Aug 17 '25 23:08 FizzWizZleDazzle

context7.com/mui/material-ui-docs is a duplicate of context7.com/mui/material-ui.

I think we can argue that context7.com/mui/material-ui-docs is the one to remove. It looks like context7.com/mui/material-ui correctly identifies releases (it doesn't blindly take HEAD as the most recent version).

I have redirected the library to the other one

enesgules avatar Aug 21 '25 06:08 enesgules

@enesgules

Why, does it have anything missing?

If you were asking me, yes it does, because a lot of bevy documentation is generated using cargo doc which uses comments from code which context7 doesn't read.

I have redirected the libraries

enesgules avatar Aug 21 '25 10:08 enesgules

Hi there,

First of all, thank you for your stunning work with context7.

I regularly use llamaindex (TypeScript in this case), which has two entries that differ only in the .git ending of the GitHub link; both resolve to the same repo: https://context7.com/run-llama/llamaindexts.git https://context7.com/run-llama/llamaindexts

What do you think, could a check for a trailing .git string on the https://context7.com/add-library?tab=github page help reduce duplicates? Assuming links should not be submitted with a trailing .git, the page could strip the string, check for an already existing lib, and guide the user to that entry.
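The check described above could be sketched roughly like this. The function names and the `known` lookup table are hypothetical; this is not Context7's actual submission code, just one way to normalize a GitHub URL before looking for an existing entry.

```python
from urllib.parse import urlsplit


def normalize_github_url(url):
    """Reduce a GitHub URL to a lowercase 'host/owner/repo' key without '.git'."""
    parts = urlsplit(url.strip())
    path = parts.path.strip("/")
    if path.lower().endswith(".git"):
        path = path[: -len(".git")]
    return f"{parts.netloc.lower()}/{path.lower()}"


def find_existing(url, known):
    """Return the existing library ID for this repo, or None if it is new."""
    return known.get(normalize_github_url(url))


# Hypothetical index of already-added libraries, keyed by normalized URL.
known = {
    normalize_github_url("https://github.com/run-llama/LlamaIndexTS"): "/run-llama/llamaindexts",
}
match = find_existing("https://github.com/run-llama/LlamaIndexTS.git", known)
print(match)
```

Normalizing on both write and lookup means older entries stored with a `.git` suffix would need a one-off migration to the same key format.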

Cordobo avatar Sep 02 '25 16:09 Cordobo

Yes, we already do this, but these entries might have been added before our check existed. Will remove the one with the .git extension, thank you!

enesgules avatar Sep 04 '25 11:09 enesgules

Multiple instances of microsoft_learn

Image

hansmbakker avatar Sep 19 '25 14:09 hansmbakker

Multiple instances of .NET

Image

hansmbakker avatar Sep 19 '25 14:09 hansmbakker

https://context7.com/mui/base-ui is called "MUI Base UI"

Image

But the correct name is "Base UI". Can this be updated? We might add a context7.json file https://github.com/mui/mui-public/issues/555 but for now, it seems that it's better solved in the database directly.

This is also a duplicate of https://context7.com/llmstxt/base-ui_llms_txt. Since Context7 seems to recommend GitHub over llms.txt sources, we can keep the GitHub one.

oliviertassinari avatar Sep 19 '25 23:09 oliviertassinari

I have changed the name of the library. The llmstxt version seems to have more content than the GitHub library, but would you still prefer it to be removed? @oliviertassinari

enesgules avatar Sep 22 '25 06:09 enesgules

Hey there! It seems like duplicates: https://context7.com/websites/airflow_apache and https://context7.com/websites/airflow_apache-apache-airflow-stable

Siddha911 avatar Sep 24 '25 08:09 Siddha911

Removed the smaller one @Siddha911

enesgules avatar Sep 24 '25 16:09 enesgules

I have recently been using Arrow-kt a lot, and I noticed that context7 had only this GitHub repo configured for the library: https://context7.com/arrow-kt/arrow

Unfortunately, that GitHub repo does not have a lot of documentation, and most queries for practical use cases fail.

I added two improved sources of documentation

  1. https://context7.com/websites/arrow-kt_io_learn (the main docs website for arrow-kt)
  2. https://context7.com/websites/apidocs_arrow-kt_io (the API docs for arrow-kt)

However, I've noticed that agents like Codex and Claude always go for the first GitHub library, which always results in useless context being given to them.

I believe the GitHub lib should be removed, and either the docs site or the API docs should be kept. I don't know what the benefit of keeping one or both is. It would be nice if their context could be merged into one somehow 🤷. The API docs are linked from the main site, but because the domain is different, I think they don't get picked up during indexing.

AdiRishi avatar Sep 30 '25 06:09 AdiRishi

We can redirect the GitHub one to the main docs website, so that anyone who clicks it is automatically redirected to the website version. It will also no longer be visible in MCP, so your agents can't choose it. As an alternative to merging the libraries, you can tell your agent to make two queries with smaller token limits. But yeah, we index based on domain, and subdomains are not added to the main domain.
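To illustrate why the two Arrow sources end up as separate libraries under domain-based indexing, here is a small sketch that groups pages by exact hostname. The grouping function is hypothetical and not Context7's indexer, and the two URLs are inferred from the library IDs above rather than quoted from the thread.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls):
    """Group URLs by exact hostname, so a subdomain forms its own group."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).hostname].append(url)
    return dict(groups)


# URLs inferred from the IDs arrow-kt_io_learn and apidocs_arrow-kt_io.
urls = [
    "https://arrow-kt.io/learn/",
    "https://apidocs.arrow-kt.io/",
]
groups = group_by_host(urls)
print(sorted(groups))
```

Under exact-hostname grouping the API docs subdomain never joins the main site's group, which matches the behavior described above.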

enesgules avatar Sep 30 '25 08:09 enesgules

I think the redirect is a good idea. Basically, it seems like agents will go for the simpler and more straightforward library ID (arrow-kt/arrow) over the other ones, which makes sense. I think that should fix the "default" experience, and from there, of course, if we need to query the API docs, specific instructions can be given.

AdiRishi avatar Sep 30 '25 11:09 AdiRishi

Redirected, thanks for reporting

enesgules avatar Sep 30 '25 12:09 enesgules