String.substring always returns null for corpus weighting

Open dfilimon opened this issue 12 years ago • 3 comments

At line 173 in Vectorize20NewsGroups.java [1], the substring call runs from startIndex 1 to endIndex 1, which always returns an empty string. So the CorpusWeighting cw is always going to be null.

Did you run it to see if it works? :)

[1] https://github.com/tdunning/knn/commit/c09d742febf5242899b1c187c802d3bbb5164f0d#L0R173
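For readers unfamiliar with the behavior described above, here is a minimal sketch of how `String.substring(beginIndex, endIndex)` treats equal indices. The class name `SubstringDemo`, its helper methods, and the two-character input `"tn"` are made up for illustration and are not taken from Vectorize20NewsGroups.java.

```java
public class SubstringDemo {
    // endIndex is exclusive, so equal begin and end indices select nothing.
    static String sameIndices(String s) {
        return s.substring(1, 1);
    }

    // To get the single character at index 1, endIndex must be 2.
    static String secondChar(String s) {
        return s.substring(1, 2);
    }

    public static void main(String[] args) {
        String code = "tn"; // hypothetical two-character input
        System.out.println("[" + sameIndices(code) + "]"); // prints "[]"
        System.out.println("[" + secondChar(code) + "]");  // prints "[n]"
    }
}
```

Any lookup keyed on the result of the first call receives an empty string, whatever the input was.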

dfilimon avatar Dec 27 '12 14:12 dfilimon

Ouch. Bit again by the index-or-length issue.

This is not a substring of the word... it is a substring of the code for controlling the word weighting. I have run the code and don't understand how it avoided an NPE here.
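A rough sketch of how an index-or-length mix-up can turn into a null weighting follows. Everything in it is hypothetical: the class `WeightingSketch`, the enum constants, the one-letter codes, and the assumption that the second character of a spec string selects the corpus weighting. It is not the code from the commit, only an illustration of the failure mode and of one way to fail fast instead of returning null.

```java
import java.util.HashMap;
import java.util.Map;

public class WeightingSketch {
    // Hypothetical corpus-weighting options; the codes are invented for this example.
    enum CorpusWeighting {
        NONE("n"), IDF("t");

        private final String code;

        CorpusWeighting(String code) {
            this.code = code;
        }

        private static final Map<String, CorpusWeighting> BY_CODE = new HashMap<>();
        static {
            for (CorpusWeighting w : values()) {
                BY_CODE.put(w.code, w);
            }
        }

        // Returns null for an unknown code, including the empty string.
        static CorpusWeighting lookup(String code) {
            return BY_CODE.get(code);
        }
    }

    // Buggy: (1, 1) would mean "one character at index 1" if the second
    // argument were a length, but it is an exclusive end index.
    static CorpusWeighting parseBuggy(String spec) {
        return CorpusWeighting.lookup(spec.substring(1, 1));
    }

    // Fixed: take exactly the character at index 1 and reject unknown codes.
    static CorpusWeighting parseFixed(String spec) {
        if (spec.length() < 2) {
            throw new IllegalArgumentException("Spec too short: " + spec);
        }
        String code = spec.substring(1, 2);
        CorpusWeighting w = CorpusWeighting.lookup(code);
        if (w == null) {
            throw new IllegalArgumentException("Unknown corpus weighting code: " + code);
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(parseBuggy("xt")); // prints "null"
        System.out.println(parseFixed("xt")); // prints "IDF"
    }
}
```

With the buggy variant, the null only surfaces later, when the caller dereferences the result. The fixed variant reports the bad code at the point where it is parsed.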

tdunning avatar Dec 27 '12 20:12 tdunning

Looks like I never ran this version:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.mahout.knn.Vectorize20NewsGroups$CorpusWeighting.parse(Vectorize20NewsGroups.java:175)
    at org.apache.mahout.knn.Vectorize20NewsGroups.main(Vectorize20NewsGroups.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)

tdunning avatar Dec 27 '12 20:12 tdunning

Yeah, no worries, I patched it up and ran it. Could you please look at the thread on the mailing list? :)

dfilimon avatar Dec 27 '12 20:12 dfilimon