Kaushik Acharya

Results 36 comments of Kaushik Acharya

In the `sample` function of org.apache.spark.rdd.RDD.scala, there's a check for fraction >= 0:
```
require(fraction >= 0, s"Fraction must be nonnegative, but got ${fraction}")
```
But corresponding check for fraction Exception in...

Hi, if you look at the example: https://github.com/saurfang/spark-knn/blob/master/spark-knn-examples/src/main/scala/com/github/saurfang/spark/ml/knn/examples/MNIST.scala the KNNClassifier object sets the two column names, i.e. features and prediction: `.setFeaturesCol("pcaFeatures")` `.setPredictionCol("predicted")` These seem to be missing in your case. On...

Which Spark version are you using? These might be helpful for resolving the ml vs mllib error: https://stackoverflow.com/questions/38901123/how-convert-ml-vectorudt-features-from-mllib-to-ml-type https://spark.apache.org/docs/2.1.0/ml-migration-guides.html "While most pipeline components support backward compatibility for loading, some existing...

Have a look at https://github.com/saurfang/spark-knn/blob/master/project/Dependencies.scala: `val sparktest = "org.apache.spark" %% "spark-core" % "2.1.0" % "test" classifier "tests"` Also, in [build.sbt](https://github.com/saurfang/spark-knn/blob/master/build.sbt) you can see `commonSettings`, which is defined in [Common.scala](https://github.com/saurfang/spark-knn/blob/master/project/Common.scala). This...

You are facing the same issue as: https://github.com/saurfang/spark-knn/issues/21 Your error says that: Sampling fraction (1.01) must be on interval [0, 1] sampling fraction needs to be Ok, i used MLUtils...
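The error message quoted above implies that the sampling fraction has to lie in the closed interval [0, 1]. As a minimal sketch of that kind of guard (the function name `validate_fraction` is hypothetical and is not part of Spark or spark-knn), a caller could check the value before passing it on:

```python
def validate_fraction(fraction):
    """Return `fraction` unchanged if it lies in [0, 1], else raise ValueError.

    Hypothetical helper for illustration only; it mirrors the wording of the
    error message quoted in the comment, not any library's actual code.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(
            f"Sampling fraction ({fraction}) must be on interval [0, 1]"
        )
    return fraction


# A value inside the interval passes through unchanged.
ok = validate_fraction(0.5)

# A value such as 1.01 is rejected.
try:
    validate_fraction(1.01)
    rejected = False
except ValueError:
    rejected = True
```

Failing early like this makes the out-of-range value visible at the call site instead of deep inside the sampling code.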

The [wiki page](https://github.com/pdfminer/pdfminer.six/wiki) of the repository could be another alternative.

@willjrogers For the above two sentences in input file, following is the output: ``` ConceptLiteMMI(index='1', mm='MMI', score='0.52', preferred_name='Attack behavior', cui='C1261512', semtypes='[socb]', trigger='"attack"-text-0-"Attack"-NNP-0', pos_info='7/6', tree_codes='') ConceptLiteMMI(index='1', mm='MMI', score='0.52', preferred_name='Observation of attack',...

One way of doing this is to use each bigram as a key, with the bool True as its value: features['fo'] = True features['oo'] = True features['od'] = True If you want to...
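The three assignments above look like the character bigrams of the word "food" (an assumption based on the keys shown). A minimal sketch that builds the same dictionary for any word follows; the function name `char_bigram_features` is hypothetical:

```python
def char_bigram_features(word):
    """Map every character bigram of `word` to True.

    Repeated bigrams collapse into a single key, since the dict only
    records presence, not counts.
    """
    return {word[i:i + 2]: True for i in range(len(word) - 1)}


features = char_bigram_features("food")
print(features)  # {'fo': True, 'oo': True, 'od': True}
```

A dictionary of this shape can be passed as a feature set to classifiers that accept feature-name to value mappings.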

@budiryan As per my understanding, `minCountLabel` is the min word count threshold. Have a look at its usage: https://github.com/epfml/sent2vec/blob/master/src/fasttext.cc#L473
```
if (uniform(model.rng) > dict_->getPDiscard(line[i]) || dict_->getTokenCount(line[i]) < args_->minCountLabel) continue;
```
...

@budiryan It seems you have raised a valid point. The sent2vec source code is built on top of the existing C++ code of [fastText](https://github.com/facebookresearch/fastText/). Hence it has several sections which are not used in...