putRdf failing on Travis
@grechaw can you think of a reason that this function would be failing on travis but not elsewhere?
In https://github.com/laurelnaiad/marklogic-samplestack/blob/travis-debug/appserver/java-spring/buildSrc/src/main/groovy/MarkLogicSlurpTask.groovy#L27
```groovy
void putRdf(client, uri, rdftriples) {
    def params = [:]
    params.path = "/v1/graphs"
    params.queryString = "graph=" + uri
    params.contentType = "application/n-triples"
    params.body = new String(rdftriples.getBytes("UTF-8"))
    client.put(params)
}
```
Here is the relevant part of the stack trace:
```
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:712)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:517)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:1066)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:1044)
    at groovyx.net.http.HTTPBuilder.doRequest(HTTPBuilder.java:506)
    at groovyx.net.http.RESTClient.put(RESTClient.java:163)
    at groovyx.net.http.RESTClient$put.call(Unknown Source)
    at MarkLogicSlurpTask.putRdf(MarkLogicSlurpTask.groovy:33)
    at MarkLogicSlurpTask$_load_closure2.doCall(MarkLogicSlurpTask.groovy:51)
    at MarkLogicSlurpTask.load(MarkLogicSlurpTask.groovy:46)
    at org.gradle.internal.reflect.JavaMethod.invoke(JavaMethod.java:63)
```
Thoughts? Occasionally this seems to work (it has been working in the overnight builds, but this morning when I reran them it was erroring). To the extent we pursue this, let's do it in laurelnaiad/travis-debug.
The two triples files are pretty big -- could it be taking so long that it times out? I can't think why this would have changed recently, though. It is entirely possible to split the RDF into more files if that's indeed the problem... it would require new seed data.
Edited the title... questioning whether it's an Ubuntu issue was premature...
Right, I'm on Ubuntu, so that's unlikely to be the issue for Travis stuff.
Is it trying to load them the same way it was loading the tgz with no buffering?
The tgz contains all of the seed data, including the two rdf files. I'm just using a regular PUT body for these triples, and it takes a little while, but I've never hit a timeout scenario before.
```
-rw-rw-r-- 1 cgreer cgreer 4522849 Mar  2 14:40 categories.nt
-rw-rw-r-- 1 cgreer cgreer 7135317 Mar  2 14:40 resources.nt
```
Given these are uncompressed, and the latter at least is bigger than the original download tgz, there could be some issue with how big the files are.
Fortunately these are a line-based format. We can just split them up to make smaller ones. I don't want to do that speculatively, or to satisfy travis if we have a plan to move to jenkins, but it could be done. Heck, it doesn't have to be me to do it either.
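To make the split concrete, here's a minimal sketch (plain Java for illustration rather than the build's Groovy; the class, file names, and chunk size are made up). Because N-Triples is one triple per line, any split on a line boundary yields valid N-Triples documents:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

public class SplitNt {
    // Split a line-based N-Triples file into parts of at most maxLines
    // lines each. Each part is a valid N-Triples document on its own.
    static List<Path> split(Path source, int maxLines) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            BufferedWriter out = null;
            int count = 0;
            String line;
            while ((line = in.readLine()) != null) {
                if (out == null || count == maxLines) {
                    if (out != null) out.close();
                    Path part = source.resolveSibling(
                            source.getFileName() + ".part" + parts.size());
                    parts.add(part);
                    out = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                    count = 0;
                }
                out.write(line);
                out.newLine();
                count++;
            }
            if (out != null) out.close();
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("triples", ".nt");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++)
            sb.append("<s> <p> \"o").append(i).append("\" .\n");
        Files.write(src, sb.toString().getBytes(StandardCharsets.UTF_8));
        System.out.println(split(src, 4).size()); // 10 lines, 4 per part -> 3
    }
}
```

Each part could then be PUT to `/v1/graphs` separately, keeping individual request bodies small.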
I was thinking not so much of the PUT but of `it.text.getBytes()` as the culprit...
Ah, that's unlikely. That's an in-memory operation; I don't think it even copies the structure. However, it does mean that the whole 7 MB is in memory, and that could be another issue for Travis, although why that would lead to no response is unclear to me.
Where does the file get loaded into memory?
Wouldn't it be a fairly straight shot to give the PUT a stream rather than a giant string? I would think that function is overloaded to take a text stream, a writer, or whatever, no?
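For what it's worth, the chunked-copy idea behind "give it a stream" looks like this in plain Java (a hypothetical helper, not HTTPBuilder's actual API; the 8 KB buffer size is arbitrary):

```java
import java.io.*;
import java.nio.file.*;

public class StreamBody {
    // Copy the file to a request output stream in 8 KB chunks, so memory
    // use stays constant regardless of file size; returns bytes written.
    static long stream(Path file, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("demo", ".nt");
        Files.write(f, "one triple per line\n".getBytes());
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(stream(f, sink)); // 20
    }
}
```

Whether HTTPBuilder actually streams when handed a `File` body (as the alternative below tries) would need checking against its source.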
Yes, that probably would be straightforward, but we don't know that that's the issue... Is it possible to look at the state of the Travis MarkLogic server while the build is going on?
No, not unless we log something. But it's very easy to push a branch and see if it fixes it.
I'm testing this alternative:
```groovy
void putRdf(client, uri, rdftriples) {
    def params = [:]
    params.path = "/v1/graphs"
    params.queryString = "graph=" + uri
    params.contentType = "application/n-triples"
    params.body = rdftriples // just give it the file obj. instead of getBytes-ing it into a new String
    client.put(params)
}
```
It timed out the first time Travis tried it, but that is in and of itself different: in this case Travis recognized the timeout on its side and restarted the test automagically... talk about magic... I will post back when/if the retry cycle ends, successfully or unsuccessfully. (The change didn't break anything locally in my test.)
Travis' automatic retry worked. I'll put this in a PR.
I think that will break on windows.
Can you confirm what action needs to be performed to try this one out on Windows? I'll give it a try once I get past the errors I've been seeing lately with npm install on Windows, with the latest develop branch and ML 8.0-1.1.
I guess this does point to a memory issue -- getBytes copies the string to a new byte array.
The issue for Windows hearkens back to the encoding issues we had for EA-1 last year -- some or all of the triple load may fail on Windows. I think that this code (without getBytes) will assume that the Unicode in the RDF files is actually some Windows encoding.
There's a charset property on the file object IIRC. Will check when I get back to desk.
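If the default-charset theory is right, the fix is to force UTF-8 when the file is read, rather than letting the platform default (often windows-1252 on Windows) apply. A sketch in plain Java (the Groovy equivalent would, I believe, be `rdftriples.getText("UTF-8")`):

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class ReadUtf8 {
    // Read the triples file with an explicit charset so the UTF-8 bytes
    // in the RDF aren't decoded with the platform default encoding.
    static String read(Path file) throws Exception {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("triples", ".nt");
        String triple = "<s> <p> \"caf\u00e9\" .\n"; // non-ASCII literal
        Files.write(p, triple.getBytes(StandardCharsets.UTF_8));
        System.out.println(read(p).equals(triple)); // true on any platform
    }
}
```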