
NullPointerException on inclusion of '×' character in filename

Open exalate-issue-sync[bot] opened this issue 1 year ago • 2 comments

With the H2O version shipped in the R package 3.6.0.8, importing data in SVMlight format through the Web UI (not from R) throws this exception when I click "Parse", if any of the added files has the '×' character in its name.

With the same file renamed to omit that character, the import proceeds to the stage where I can preview the file contents.

exalate-issue-sync[bot] avatar May 13 '23 18:05 exalate-issue-sync[bot]

Kiyoshi Kamishima commented: I know this is an ancient one, but it still happens with the latest 3.22.0.2. I can reproduce this by using the following filename. {{/tmp/日本語.csv}}

The rest is my analysis of this matter. The following method tries to retrieve the file path encoded in the key name.
{noformat}
public final class PersistNFS extends Persist {
  private static File getFileForKey(Key k) {
{noformat}
It expects the path to be fully preserved, but it is not:
{noformat}
k = {Key@6514} "$05ff00000000000000006e66733a2f2f746d702fe52c9e$.csv"
off = 10
s = "/tmp/�,�.csv"
{noformat}

Therefore the path becomes invalid, and eventually causes NPE at the following location.
{noformat}
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: Caused by: java.lang.RuntimeException: This H2O node couldn't read data from '$6e66733a2f2f746d702fe52c9e$.csv'. Please make sure the file is available on all H2O nodes and/or check the working directories.
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at water.parser.ParseSetup$GuessSetupTsk.map(ParseSetup.java:399)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at water.MRTask.compute2(MRTask.java:601)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at water.H2O$H2OCountedCompleter.compute1(H2O.java:1313)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at water.parser.ParseSetup$GuessSetupTsk$Icer.compute1(ParseSetup$GuessSetupTsk$Icer.java)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at water.H2O$H2OCountedCompleter.compute(H2O.java:1309)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: Caused by: java.lang.NullPointerException
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at water.persist.PersistManager.load(PersistManager.java:184)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at water.Value.loadPersist(Value.java:241)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at water.Value.memOrLoad(Value.java:120)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at water.Value.get(Value.java:134)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at water.fvec.Vec.chunkForChunkIdx(Vec.java:1089)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at water.fvec.ByteVec.chunkForChunkIdx(ByteVec.java:19)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at water.fvec.ByteVec.getFirstBytes(ByteVec.java:30)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at water.parser.ZipUtil.getFirstUnzippedBytesChecked(ZipUtil.java:46)
12-21 16:30:20.712 172.27.45.1:54321 22266 #659-2346 ERRR: at water.parser.ParseSetup$GuessSetupTsk.map(ParseSetup.java:397)
{noformat}

This happens because, when a key is initialized from a string, each UTF-16 code unit in it is truncated to a single byte (the cast keeps only the low 8 bits). Therefore any filename including non-Latin (>=U+0100) characters would cause the issue. For {{日本語}} (U+65E5, U+672C, U+8A9E) the surviving bytes are e5, 2c and 9e, which is exactly the {{e52c9e}} seen in the key above.
{noformat}
final public class Key<T extends Keyed> extends Iced<Key<T>> implements Comparable {
  private static byte[] decodeKeyName(String what) {
    for( int i=0; i<res.length; i++ ) res[i] = (byte)what.charAt(i);
{noformat}
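To make the truncation concrete, here is a minimal standalone sketch that applies the same `(byte)` cast as the quoted `decodeKeyName` loop to the characters of the reproducing filename. The class and helper names (`KeyNameTruncationDemo`, `truncateToBytes`, `utf8Bytes`, `toHex`) are hypothetical and not part of H2O. The UTF-8 variant is only an illustration of a lossless encoding, not a proposed patch, and decoding the truncated bytes as UTF-8 is an assumption about how the path string ends up garbled.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use or modify any H2O code.
final class KeyNameTruncationDemo {

    // Same narrowing cast as the quoted loop: only the low 8 bits
    // of each UTF-16 code unit survive.
    static byte[] truncateToBytes(String s) {
        byte[] res = new byte[s.length()];
        for (int i = 0; i < res.length; i++) {
            res[i] = (byte) s.charAt(i);
        }
        return res;
    }

    // Lossless alternative for comparison (an assumption, not the project's fix).
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Lower-case hex dump, in the same style as the key names in the report.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "\u65e5\u672c\u8a9e"; // the three characters of the reproducing filename

        byte[] truncated = truncateToBytes(name);
        System.out.println(toHex(truncated)); // e52c9e, as in the key name

        // Assumption: reading those bytes back as UTF-8 yields replacement characters.
        String garbled = new String(truncated, StandardCharsets.UTF_8);
        System.out.println(garbled.equals(name)); // false

        byte[] utf8 = utf8Bytes(name);
        System.out.println(toHex(utf8)); // e697a5e69cace8aa9e
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(name)); // true
    }
}
```

The truncated form keeps one byte per character and cannot be reversed, while the UTF-8 form uses three bytes per character here and round-trips exactly.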

exalate-issue-sync[bot] avatar May 13 '23 18:05 exalate-issue-sync[bot]

JIRA Issue Migration Info

Jira Issue: PUBDEV-2508
Assignee: New H2O Bugs
Reporter: Sander Maijers
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A

DinukaH2O avatar May 15 '23 10:05 DinukaH2O