Support Maven Indexer to build a cache for repository content

Open Saxintosh opened this issue 2 years ago • 19 comments

Hi :) Indexing Service support seems to be missing.

Without this, IDEs such as IntelliJ fail to index the repository. A typical problem is the inability to suggest the presence of a new version of a library.

For further information: https://maven.apache.org/maven-indexer/

Many thanks ;-)

Saxintosh avatar Sep 08 '21 10:09 Saxintosh

I've done some research on this topic and it looks like quite a heavy task. The indexer not only processes the repository to build a tree of available artifacts and their versions, it also processes all archives and their classes to build references for IDEs. For repositories with thousands of artifacts, such a task could literally block the process for quite a long time (and I'm talking about minutes if not hours, depending on the repository size and how the indexer utilizes the CPU).

I'll take a look at that when Reposilite 3.x is stable :)

dzikoysk avatar Sep 08 '21 13:09 dzikoysk

Totally agree with you here. This feature is one of the main ones that seems to bloat many of the artifact products into hardware-chewing disappointments if I'm honest, but it is obviously nice to have. I am familiar with indexing techniques and it may be useful to use one of the very fast, lightweight Java or Go key-value stores out there when this comes into play. There are so many good options these days and @dzikoysk may have a preference, but I thought I would share the architectural thought / concern here.

Happy to help when the time comes.

codesplode avatar May 06 '22 16:05 codesplode

I feel like this could be implemented as a plugin, I haven't had time to take a look at the indexer yet tho. With a proper API from maven-indexer, maybe we could dynamically generate the cache for each deployed JAR instead of just blindly iterating over the whole repository.
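A minimal sketch of the "update per deployed JAR" idea, as opposed to rescanning everything. This is not the maven-indexer API and not Reposilite's plugin API; `ArtifactIndex` and `on_deploy` are hypothetical names, and the only assumption about the input is the standard Maven directory layout.

```python
from collections import defaultdict


class ArtifactIndex:
    """Toy in-memory index: (group_id, artifact_id) -> set of versions.

    Hypothetical illustration only; a real Maven index stores far more.
    """

    def __init__(self):
        self._versions = defaultdict(set)

    def on_deploy(self, path):
        """Update the index from one deployed file path.

        Assumes the standard Maven layout:
        group/parts/artifact/version/artifact-version.jar
        Returns True if the path was indexed, False if it was skipped.
        """
        parts = path.strip("/").split("/")
        if len(parts) < 4 or not parts[-1].endswith(".jar"):
            return False  # metadata, checksums, POMs etc. are ignored here
        *group_parts, artifact_id, version, _filename = parts
        self._versions[(".".join(group_parts), artifact_id)].add(version)
        return True

    def versions(self, group_id, artifact_id):
        # Plain string sort, not Maven's version ordering.
        return sorted(self._versions.get((group_id, artifact_id), set()))
```

Each deploy touches one entry, so the cost per upload stays constant no matter how large the repository already is.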

dzikoysk avatar May 06 '22 17:05 dzikoysk

@dzikoysk I don't know much about maven indexing, but... For the short term, is it possible to just serve the index, but allow it to be built by another process?

kadaan avatar Aug 20 '22 16:08 kadaan

For instance, when I look in the repo folders for reposilite I see mvn indices which are huge, ~2 GB. If I run rm -rf /tmp/reposilite_releases_index && sudo java -jar ~/Downloads/indexer-cli-6.2.2.jar -t full -d /var/reposilited/repositories/releases/.index -c -s -i /tmp/reposilite_releases_index -r /var/reposilited/repositories/releases, then I get a nice, small index that IntelliJ can import and work with.
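A rough sketch of wrapping that command so another process (cron, a systemd timer, etc.) can rerun it. The flags are copied verbatim from the command above and are not explained or verified here; `indexer_command` and `rebuild_index` are hypothetical helper names, the paths are parameters rather than fixed locations, and `sudo` is left out on the assumption that the script already runs as a user with access to the repository directory.

```python
import shutil
import subprocess


def indexer_command(cli_jar, repository_dir, work_dir):
    """Build the argv for the indexer CLI, mirroring the command in the thread."""
    return [
        "java", "-jar", cli_jar,
        "-t", "full",
        "-d", f"{repository_dir}/.index",
        "-c", "-s",
        "-i", work_dir,
        "-r", repository_dir,
    ]


def rebuild_index(cli_jar, repository_dir, work_dir):
    """Remove the old working index (the `rm -rf` step), then run the CLI."""
    shutil.rmtree(work_dir, ignore_errors=True)
    # check=True raises CalledProcessError if the indexer exits non-zero.
    subprocess.run(indexer_command(cli_jar, repository_dir, work_dir), check=True)
```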

kadaan avatar Aug 20 '22 17:08 kadaan

Sorry for the late response, I'm currently on vacation :)

As far as I remember, the indexer builds an archive and that's pretty much all, so as soon as you find a tool to run the indexer, it should work fine. There was something like Maven Indexer CLI, so it could be an option I guess, then just put the generated file in the root directory of each repository. Take a look at this, Strongbox provides quite a nice description of how it works:

  • https://strongbox.github.io/developer-guide/maven-indexer.html

dzikoysk avatar Aug 22 '22 23:08 dzikoysk

No worries, hope you are having a great vacation. I did the above and it works great, but I think that there will be two issues.

  1. reposilite will overwrite the index
  2. the index will become stale

kadaan avatar Aug 22 '22 23:08 kadaan

Reposilite won't modify your index, because there's nothing that really checks for the index file, but yes, it'll most likely become stale. In theory it could be handled with a Reposilite plugin that rebuilds the index on each deploy using DeployEvent:
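A language-neutral sketch of that idea, written in Python rather than against the real plugin API. Nothing here models the actual DeployEvent class or its fields; `IndexRefresher`, `on_deploy` and `flush` are hypothetical names. Because a full rebuild is expensive, the sketch only marks a repository as dirty on deploy and coalesces the rebuilds into a later `flush()` call, which is an assumption on my side and not something proposed above.

```python
class IndexRefresher:
    """Collects deploy notifications and rebuilds each dirty repository once."""

    def __init__(self, rebuild):
        # `rebuild` is any callable taking a repository name,
        # e.g. a function that shells out to an external indexer.
        self._rebuild = rebuild
        self._dirty = set()

    def on_deploy(self, repository):
        """Called for every deployment; cheap, does no indexing itself."""
        self._dirty.add(repository)

    def flush(self):
        """Rebuild every repository that received a deploy since the last flush."""
        rebuilt = sorted(self._dirty)
        self._dirty.clear()
        for repository in rebuilt:
            self._rebuild(repository)
        return rebuilt
```

Calling `flush()` from a timer means a burst of uploads triggers a single rebuild per repository instead of one per file.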

  • https://github.com/dzikoysk/reposilite/blob/994a5cdef40307f660940c15e7f3edc4c41ec7fa/reposilite-backend/src/main/kotlin/com/reposilite/maven/api/DeploymentApi.kt#L31-L38

dzikoysk avatar Aug 22 '22 23:08 dzikoysk

I might try to do some work on this, because it seems like this could benefit reposilite a decent amount (and I finally got around to setting up a reposilite instance for myself, so I now actually have a reason to work on stuff for reposilite lol)

solonovamax avatar Dec 21 '22 21:12 solonovamax

Sure, I think that #1606 might be a good start. As far as I remember it was blocked due to a bug in Maven Indexer:

  • https://issues.apache.org/jira/browse/MINDEXER-171
  • https://github.com/apache/maven-indexer/pull/253

But it's merged now, so I guess you could try to continue @GrzegorzSmardzewskiAllegro's work :)

dzikoysk avatar Dec 21 '22 21:12 dzikoysk

Yeah, I checked it out, but it doesn't look very complete at all

has literally zero functionality lol

I'll probably look into it more after I've eaten smth (it's 4:40 PM and I barely had breakfast)

solonovamax avatar Dec 21 '22 21:12 solonovamax

Also, with the implementation of the indexer plugin, it would make things like implementing a search API extremely simple (it literally has it built in, however that means that it bundles Apache Lucene, which is pretty bulky)

If I make some decent progress with the indexer stuff, I might also look at implementing a plugin that adds routes for searching. However, I'd run into the same issue as what happened in #1510 if I wanted to add a "search bar"

solonovamax avatar Dec 21 '22 21:12 solonovamax

Well, we're still considering this as a standalone plugin, so I'd focus on its main purpose for now to see if it'll even work. Speaking of the search functionality, it should be built-in and we can use the statistics module to query through already recorded entries or extend the db with a standalone index of GAV identifiers.
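To make the "standalone index of GAV identifiers in the db" option concrete, here is a small sketch using SQLite from Python's standard library. The table name, column names and helper functions are all made up for illustration and do not reflect Reposilite's actual schema.

```python
import sqlite3


def create_gav_index(connection):
    """Create a hypothetical table holding one row per group:artifact:version."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS gav ("
        " group_id TEXT NOT NULL,"
        " artifact_id TEXT NOT NULL,"
        " version TEXT NOT NULL,"
        " PRIMARY KEY (group_id, artifact_id, version))"
    )


def record_gav(connection, group_id, artifact_id, version):
    """Insert an identifier; redeploying the same version is a no-op."""
    connection.execute(
        "INSERT OR IGNORE INTO gav VALUES (?, ?, ?)",
        (group_id, artifact_id, version),
    )


def search_gav(connection, term):
    """Substring match on group or artifact id.

    Note: '%' and '_' inside `term` act as LIKE wildcards in this sketch.
    """
    pattern = f"%{term}%"
    rows = connection.execute(
        "SELECT group_id, artifact_id, version FROM gav"
        " WHERE group_id LIKE ? OR artifact_id LIKE ?"
        " ORDER BY group_id, artifact_id, version",
        (pattern, pattern),
    )
    return [":".join(row) for row in rows]
```

This covers coordinate search only; anything like class-name search would still need a heavier index.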

dzikoysk avatar Dec 21 '22 21:12 dzikoysk

Well, we're still considering this as standalone plugin, so I'd focus on its main purpose for now to see if it'll even work. Speaking of the search functionality, it should be built-in and we can use statistics module to query through entries or extend db with standalone index of gav identifiers.

yeah, I was going to do it as a standalone plugin. I just mentioned the issue about javadocs, because it's related to allowing plugins to extend the UI

We wouldn't need to have any of the gav identifiers stored in the db, because the maven indexer api already provides a way to search artifacts using apache lucene

solonovamax avatar Dec 21 '22 22:12 solonovamax

allowing plugins to extend the UI

I don't think it's possible by design, because the frontend is built as a standalone module. To make it possible we'd need some kind of extensible SSR and it's kinda beyond the currently used tech stack.

maven indexer api already provides a way to search artifacts using apache lucene

This solution is pretty heavy, so even though as far as I understand it'd be the easiest option, we can't do it. The thing we could... consider is to make the search engine implementation swappable, so by default it could run on top of the db and e.g. a maven indexer plugin could provide an enhanced impl.
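A sketch of what "swappable" could look like, shown in Python for brevity. `SearchProvider`, `SimpleSearchProvider` and `SearchFacade` are hypothetical names; the default provider here filters an in-memory list as a stand-in for a db query.

```python
from typing import Protocol


class SearchProvider(Protocol):
    """Anything with a search(term) method can act as the search backend."""

    def search(self, term: str) -> list[str]: ...


class SimpleSearchProvider:
    """Default backend: case-insensitive substring match over known identifiers."""

    def __init__(self, identifiers):
        self._identifiers = list(identifiers)

    def search(self, term):
        needle = term.lower()
        return [gav for gav in self._identifiers if needle in gav.lower()]


class SearchFacade:
    """Core-side entry point; a plugin may swap in an enhanced provider."""

    def __init__(self, default: SearchProvider):
        self._provider = default

    def replace_provider(self, provider: SearchProvider):
        self._provider = provider

    def search(self, term):
        return self._provider.search(term)
```

The routes would only ever talk to `SearchFacade`, so the heavy Lucene-based implementation stays optional.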

dzikoysk avatar Dec 21 '22 22:12 dzikoysk

here's something that sounds like it could work:

a "features" api can be implemented that returns what features are enabled. So, the frontend will query that api and say "yo, is there a search feature enabled here" and the backend can respond with what features are enabled (eg. an endpoint that responds with a json list of enabled features)

By default, reposilite won't have any search feature. But, users can choose to include either the "simple search provider" or the "maven indexer search provider" as a plugin. This keeps reposilite as lightweight to run as possible. If either of those exist, then when the frontend queries a list of features, it will see the search feature is enabled, and can then appropriately enable the UI to provide a search bar. And, all search apis will conform to the same standard set of routes.

However, if a search api wishes to provide additional features, such as maven indexer search providing the ability to search for specific class names (afaik it indexes that stuff), then rather than the feature list containing only ["search"], it might contain something like ["search", "class_search"], and the frontend can decide what to do with that.
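Roughly what I have in mind, as a tiny sketch. `FeatureRegistry` and `frontend_should_show_search` are hypothetical names and no real Reposilite endpoint is implied; the point is only that plugins register feature names and the frontend checks the JSON list.

```python
import json


class FeatureRegistry:
    """Backend side: plugins register the feature names they provide."""

    def __init__(self):
        self._features = []

    def register(self, name):
        # Keep registration order and avoid duplicates.
        if name not in self._features:
            self._features.append(name)

    def to_json(self):
        """Body of the hypothetical 'features' endpoint: a JSON list of names."""
        return json.dumps(self._features)


def frontend_should_show_search(features_json):
    """Frontend side: show the search bar only if 'search' is advertised."""
    return "search" in json.loads(features_json)
```

With no plugin installed the endpoint returns `[]` and the search bar stays hidden.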

solonovamax avatar Dec 22 '22 18:12 solonovamax

It'd require implementing all plugins in the UI despite the fact that those features are in fact defined by 3rd party plugins. This is kinda against the idea of a plugin system and such plugins might as well be built into the main sources 🤔

dzikoysk avatar Dec 22 '22 19:12 dzikoysk

Hi all,

I'm new to reposilite, I landed here while searching for, well, the search functionality.

I think even a simple and lightweight repo manager should have this feature and, as said above, its computational challenge could be managed by both having incremental index updates (eg, as new deployments are made) and by having parallel threads or processes dealing with it.

As for the UI extensibility, a plug-in system for a web app should be able to extend the UI, by having a design where the plug-in should be able to declare a new URL pattern that it's going to cover (eg, /search, which could be automatically put under /plug-ins/search) and then the plug-in could have the UI files it needs under its own UI directory (eg, HTML, CSS, .js files under <plug-in-home>/search).

The UI would remain detached from its plug-ins, the mechanism to bring them up could be something like a dynamically-built tab of available plug-ins (which would build links from declared URL endpoints like /search and declared titles/names like "Artifact Search"). The plug-ins should be aware of core features, eg, how to send the user to the details about a found artifact (ie, which URL to build for that, or which core Js function to invoke to get that link).
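A minimal sketch of the declaration side of this, assuming each plug-in declares one endpoint and one title. `UiPlugin` and `build_plugin_tabs` are hypothetical names, and the `/plug-ins` prefix is taken from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UiPlugin:
    endpoint: str  # URL pattern the plug-in covers, e.g. "/search"
    title: str     # label shown in the UI, e.g. "Artifact Search"


def build_plugin_tabs(plugins, prefix="/plug-ins"):
    """Turn declared plug-ins into the link list for a dynamically built tab."""
    return [
        {"title": plugin.title, "href": prefix + "/" + plugin.endpoint.strip("/")}
        for plugin in sorted(plugins, key=lambda plugin: plugin.title)
    ]
```

The core UI only needs to render this list; whatever lives behind each link is served from the plug-in's own UI directory.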

Whatever, I very much support having a search feature.

marco-brandizi avatar Jun 13 '23 12:06 marco-brandizi