heritrix3

cannot resolve these dependencies

Open oldRabbitForz opened this issue 4 years ago • 10 comments

image

oldRabbitForz avatar Oct 20 '21 02:10 oldRabbitForz

    org.archive.heritrix:heritrix-commons:3.4.0-20210923
    org.archive.heritrix:heritrix-modules:3.4.0-20210923
    org.archive.heritrix:heritrix-engine:3.4.0-20210923

oldRabbitForz avatar Oct 20 '21 02:10 oldRabbitForz

@ato

oldRabbitForz avatar Oct 20 '21 02:10 oldRabbitForz

I assume that's a screenshot from some IDE but some more context would be helpful. :-) What are you trying to do? What software are you using?

That looks as if the extra repositories that Heritrix requires are not being searched. com.sleepycat:je is in https://download.oracle.com/maven and the other three are in http://builds.archive.org/maven2/ (which recent versions of Maven refuse to access by default, because it is served over plain HTTP, unless you apply a workaround).

ato avatar Oct 20 '21 02:10 ato

I'm using IntelliJ IDEA. The project is a Spring Boot program that uses Heritrix 3.

oldRabbitForz avatar Oct 20 '21 02:10 oldRabbitForz

The errors occurred when I tried to download these dependencies. How can I resolve them? Thanks!

oldRabbitForz avatar Oct 20 '21 02:10 oldRabbitForz

@ato

oldRabbitForz avatar Oct 20 '21 02:10 oldRabbitForz

If you're including it as a dependency in a Maven project maybe try adding these repositories to your pom.xml file, although I would normally expect them to be included automatically. You may also need Andy's ~/.m2/settings.xml workaround.

    <repositories>
        <repository>
            <id>builds.archive.org,maven2</id>
            <url>http://builds.archive.org/maven2</url>
        </repository>
        <repository>
            <id>oracleReleases</id>
            <name>Oracle Released Java Packages</name>
            <url>https://download.oracle.com/maven</url>
        </repository>
    </repositories>
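For the HTTP blocking, here is a minimal sketch of a `~/.m2/settings.xml` override. It is a commonly used approach and an assumption on my part, not necessarily identical to Andy's workaround. Maven 3.8.1 and later ship a built-in mirror with the id `maven-default-http-blocker` that blocks plain-HTTP repositories. Redefining a mirror with that same id in your user settings, pointed at a placeholder repository name (`dummy` here, a made-up name that matches nothing), should stop it from intercepting http://builds.archive.org/maven2.

```xml
<settings>
    <mirrors>
        <!-- Same id as Maven's built-in HTTP blocker, so this entry replaces it. -->
        <mirror>
            <id>maven-default-http-blocker</id>
            <!-- "dummy" is a placeholder that matches no real repository. -->
            <mirrorOf>dummy</mirrorOf>
            <name>Dummy mirror to override the default HTTP-blocking mirror</name>
            <url>http://0.0.0.0/</url>
        </mirror>
    </mirrors>
</settings>
```

Note that this disables the blocker for every project built with these settings, so artifacts fetched over plain HTTP are not protected against tampering in transit.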

If you're using Gradle, I can't help as I've never used it.

However, personally I don't recommend embedding Heritrix inside another Java application, as it has a lot of dependencies that may cause conflicts and it also does some surprising things like globally setting the JVM's timezone to UTC. I recommend controlling it via the REST API if possible.

ato avatar Oct 20 '21 02:10 ato

Thanks. When the crawl data comes back, I want to process it for other business purposes. If I use the REST API, I must build another data table to translate the data. It would be complicated.

oldRabbitForz avatar Oct 20 '21 04:10 oldRabbitForz

It would be complicated.

oldRabbitForz avatar Oct 20 '21 04:10 oldRabbitForz

@ato

oldRabbitForz avatar Oct 20 '21 04:10 oldRabbitForz