putRdf failing on Travis
@grechaw can you think of a reason that this function would be failing on travis but not elsewhere?
In https://github.com/laurelnaiad/marklogic-samplestack/blob/travis-debug/appserver/java-spring/buildSrc/src/main/groovy/MarkLogicSlurpTask.groovy#L27
```groovy
void putRdf(client, uri, rdftriples) {
    def params = [:]
    params.path = "/v1/graphs"
    params.queryString = "graph=" + uri
    params.contentType = "application/n-triples"
    params.body = new String(rdftriples.getBytes("UTF-8"))
    client.put(params)
}
```
Here is the relevant part of the stack trace:
```
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:712)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:517)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:1066)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:1044)
    at groovyx.net.http.HTTPBuilder.doRequest(HTTPBuilder.java:506)
    at groovyx.net.http.RESTClient.put(RESTClient.java:163)
    at groovyx.net.http.RESTClient$put.call(Unknown Source)
    at MarkLogicSlurpTask.putRdf(MarkLogicSlurpTask.groovy:33)
    at MarkLogicSlurpTask$_load_closure2.doCall(MarkLogicSlurpTask.groovy:51)
    at MarkLogicSlurpTask.load(MarkLogicSlurpTask.groovy:46)
    at org.gradle.internal.reflect.JavaMethod.invoke(JavaMethod.java:63)
```
Thoughts? Occasionally this seems to work (it has been working in the overnight builds, but this morning when I reran them it was erroring). To the extent we pursue this, let's do it in laurelnaiad/travis-debug.
The two triples files are pretty big -- could it be taking so long that it times out? I can't think why this would have changed recently, though. It is entirely possible to split the RDF into more files if that's indeed the problem... it would require new seed data.
Edited the title... questioning whether it's an Ubuntu issue was premature...
Right, I'm on Ubuntu, so that's unlikely to be the issue for Travis stuff.
Is it trying to load them the same way it was loading the tgz with no buffering?
The tgz contains all of the seed data, including the two rdf files. I'm just using a regular PUT body for these triples, and it takes a little while, but I've never hit a timeout scenario before.
```
-rw-rw-r-- 1 cgreer cgreer 4522849 Mar  2 14:40 categories.nt
-rw-rw-r-- 1 cgreer cgreer 7135317 Mar  2 14:40 resources.nt
```
Given these are uncompressed, and the latter at least is bigger than the original download tgz, there could be some issue with how big the files are.
Fortunately these are a line-based format. We can just split them up to make smaller ones. I don't want to do that speculatively, or to satisfy travis if we have a plan to move to jenkins, but it could be done. Heck, it doesn't have to be me to do it either.
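To make the split concrete, here's a minimal sketch (plain Java for illustration rather than the build's Groovy; the class, file names, and chunk size are made up). Because N-Triples is one triple per line, any split on a line boundary yields valid N-Triples documents:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

public class SplitNt {
    // Split a line-based N-Triples file into parts of at most maxLines
    // lines each. Each part is a valid N-Triples document on its own.
    static List<Path> split(Path source, int maxLines) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            BufferedWriter out = null;
            int count = 0;
            String line;
            while ((line = in.readLine()) != null) {
                if (out == null || count == maxLines) {
                    if (out != null) out.close();
                    Path part = source.resolveSibling(
                            source.getFileName() + ".part" + parts.size());
                    parts.add(part);
                    out = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                    count = 0;
                }
                out.write(line);
                out.newLine();
                count++;
            }
            if (out != null) out.close();
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("triples", ".nt");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++)
            sb.append("<s> <p> \"o").append(i).append("\" .\n");
        Files.write(src, sb.toString().getBytes(StandardCharsets.UTF_8));
        System.out.println(split(src, 4).size()); // 10 lines, 4 per part -> 3
    }
}
```

Each part could then be PUT to `/v1/graphs` separately, keeping individual request bodies small.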
I was thinking not so much of the PUT but of `it.text.getBytes()` as the culprit...
Ah, that's unlikely. That's an in-memory operation; I don't think it even copies the structure. However, it does mean that the whole 7 MB is in memory, and that could be another issue for Travis, although why that would lead to no response is unclear to me.
Where does the file get loaded into memory?
Wouldn't it be a fairly straight shot to give the PUT a stream rather than a giant string? I would think that function is overloaded to take a text stream, a writer, or whatever, no?
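For what it's worth, the chunked-copy idea behind "give it a stream" looks like this in plain Java (a hypothetical helper, not HTTPBuilder's actual API; the 8 KB buffer size is arbitrary):

```java
import java.io.*;
import java.nio.file.*;

public class StreamBody {
    // Copy the file to a request output stream in 8 KB chunks, so memory
    // use stays constant regardless of file size; returns bytes written.
    static long stream(Path file, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("demo", ".nt");
        Files.write(f, "one triple per line\n".getBytes());
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(stream(f, sink)); // 20
    }
}
```

Whether HTTPBuilder actually streams when handed a `File` body (as the alternative below tries) would need checking against its source.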
Yes, that probably would be straightforward, but we don't know that that's the issue... Is it possible to look at the state of the Travis MarkLogic server while the build is going on?
No, not unless we log something. But it's very easy to push a branch and see if it fixes it.
I'm testing this alternative:
```groovy
void putRdf(client, uri, rdftriples) {
    def params = [:]
    params.path = "/v1/graphs"
    params.queryString = "graph=" + uri
    params.contentType = "application/n-triples"
    params.body = rdftriples // just give it the file obj. instead of getBytes-ing it into a new String
    client.put(params)
}
```
It timed out the first time Travis tried it, but that is in and of itself different: in this case Travis recognized the timeout on its side and restarted the test automagically... talk about magic... I will post back when/if the retry cycle ends, successfully or unsuccessfully. (The change didn't break anything locally in my test.)
Travis' automatic retry worked. I'll put this in a PR.
I think that will break on windows.
Can you confirm what action needs to be performed to try this one out on Windows? I'll give it a try once I get past the errors I've been seeing lately with npm install on Windows, with the latest develop branch and ML 8.0-1.1.
I guess this does point to a memory issue -- getBytes copies the string to a new byte array.
The issue for Windows hearkens back to the encoding issues we had for EA-1 last year -- some or all of the triple load may fail on Windows. I think that this code (without getBytes) will assume that the Unicode in the RDF files is actually some Windows encoding.
There's a charset property on the file object IIRC. Will check when I get back to desk.
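If the default-charset theory is right, the fix is to force UTF-8 when the file is read, rather than letting the platform default (often windows-1252 on Windows) apply. A sketch in plain Java (the Groovy equivalent would, I believe, be `rdftriples.getText("UTF-8")`):

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class ReadUtf8 {
    // Read the triples file with an explicit charset so the UTF-8 bytes
    // in the RDF aren't decoded with the platform default encoding.
    static String read(Path file) throws Exception {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("triples", ".nt");
        String triple = "<s> <p> \"caf\u00e9\" .\n"; // non-ASCII literal
        Files.write(p, triple.getBytes(StandardCharsets.UTF_8));
        System.out.println(read(p).equals(triple)); // true on any platform
    }
}
```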