asfimport
Read_CSV ParseOptions allows only a single-character delimiter. A single-character delimiter is highly likely to also occur within the data being loaded, negating its ability to serve...
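To illustrate the collision, here is a minimal plain-Python sketch. It is not Arrow's CSV reader: `split_fields` is a hypothetical helper that ignores quoting and escaping. It compares a single-character delimiter with a multi-character one on a record whose values contain commas:

```python
def split_fields(line: str, delimiter: str) -> list:
    """Split one record on a (possibly multi-character) delimiter.

    Hypothetical helper for illustration only: quoting and escaping are
    deliberately ignored to keep the sketch short.
    """
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    return line.split(delimiter)


record = "Smith, John||42||New York, NY"

# A single-character delimiter collides with commas inside the values.
print(split_fields(record, ","))   # ['Smith', ' John||42||New York', ' NY']

# A multi-character delimiter is far less likely to occur in the data.
print(split_fields(record, "||"))  # ['Smith, John', '42', 'New York, NY']
```

A real reader would also have to handle quoted fields that contain the delimiter, which this sketch does not attempt.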
The page https://arrow.apache.org/docs/format/Columnar.html#physical-memory-layout has no description of the Map type, even though it is defined in schema.fbs. This makes it difficult to implement and understand when it should be...
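As a rough illustration of what such a description could cover: schema.fbs describes `Map<K, V>` as logically equivalent to `List<entries: Struct<key: K, value: V>>`, with non-nullable keys. The sketch below assumes that equivalence and uses plain Python lists to stand in for Arrow buffers. `build_map_layout` and `read_row` are hypothetical helpers, not an Arrow API:

```python
from typing import Any, List, Optional, Sequence, Tuple

Pairs = Sequence[Tuple[Any, Any]]


def build_map_layout(rows: Sequence[Optional[Pairs]]):
    """Flatten map rows into list-style buffers: validity, offsets, keys, values."""
    validity: List[bool] = []
    offsets: List[int] = [0]
    keys: List[Any] = []
    values: List[Any] = []
    for row in rows:
        if row is None:
            validity.append(False)  # null map slot: contributes no entries
        else:
            validity.append(True)
            for k, v in row:  # pairs keep order and allow duplicate keys
                if k is None:
                    raise ValueError("map keys must not be null")
                keys.append(k)
                values.append(v)  # values may be null
        offsets.append(len(keys))  # row i spans keys[offsets[i]:offsets[i + 1]]
    return validity, offsets, keys, values


def read_row(i: int, validity, offsets, keys, values) -> Optional[List[Tuple[Any, Any]]]:
    """Reassemble the key/value pairs of row i from the flattened buffers."""
    if not validity[i]:
        return None
    start, end = offsets[i], offsets[i + 1]
    return list(zip(keys[start:end], values[start:end]))


layout = build_map_layout([[("a", 1), ("b", 2)], None, [], [("c", None)]])
print(layout)  # ([True, False, True, True], [0, 2, 2, 2, 3], ['a', 'b', 'c'], [1, 2, None])
```

Representing each row as a sequence of pairs rather than a `dict` keeps entry order and duplicate keys visible, which a dictionary would hide.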
In Apache Spark, [explode](https://spark.apache.org/docs/latest/api/sql/index.html#explode) separates the elements of an array column (or expression) into multiple rows. Note that each explode works at the top level only (not recursively). This would also...
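The row-multiplying behavior can be sketched in a few lines of Python over rows represented as dictionaries. This is an illustration, not a Spark or Arrow API. `explode` here is a hypothetical function that follows Spark's documented `explode` semantics: a null or empty array produces no output rows, and nested arrays are left intact because only the top level is expanded.

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def explode(rows: Iterable[Row], column: str) -> Iterator[Row]:
    """Emit one output row per element of row[column], copying the other columns.

    Null or empty arrays yield no rows, and elements that are themselves
    lists are emitted as-is (top level only, not recursive).
    """
    for row in rows:
        for element in row.get(column) or []:
            out = dict(row)  # shallow copy so the input row is not mutated
            out[column] = element
            yield out


rows = [
    {"id": 1, "xs": [10, [20, 30]]},
    {"id": 2, "xs": []},
    {"id": 3, "xs": None},
]
print(list(explode(rows, "xs")))  # [{'id': 1, 'xs': 10}, {'id': 1, 'xs': [20, 30]}]
```

For simplicity the sketch keeps the original column name for the exploded values. A variant that emits a null row for null or empty arrays would correspond to Spark's `explode_outer`.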
Follow-up for https://github.com/apache/arrow/pull/10717 Certain codecs are optional, so we should have global pytest fixtures that automatically apply the right pytest markers. **Reporter**: [Krisztian Szucs](https://issues.apache.org/jira/browse/ARROW-13380) / @kszucs **Note**: *This issue was originally created...
Now that @jtibshirani improved merging via #11411, nightly benchmarks report the following two top CPU consumers: ``` 18.45% 548638 org.apache.lucene.util.VectorUtil#dotProduct() at org.apache.lucene.index.VectorSimilarityFunction$2#compare() at org.apache.lucene.util.hnsw.HnswGraph#search() at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode() 13.23% 393609 org.apache.lucene.util.LongHeap#upHeap() at...
Extend org.apache.lucene.monitor.MonitorConfiguration with the ability to configure how the cache purge task is scheduled by org.apache.lucene.monitor.Monitor (aka Luwak). In particular, allow using an external ScheduledExecutor. Currently each new instance of ...
Apache OpenNLP 2.0.0 has been released. This [version](https://opennlp.apache.org/news/release-200.html) contains new implementations of TokenNameFinder and DocumentCategorizer that support models in the ONNX format. (TokenNameFinder is in NLPNERTaggerOp; DocumentCategorizer is not currently...
Standard relevance-ranked searches for the top X results use the HitQueue class to keep track of the highest-scoring documents. The HitQueue is a binary heap of ScoreDocs and is pre-filled...
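The idea of a pre-filled top-X heap can be sketched in Python with `heapq`. This is not Lucene's HitQueue. The `-inf` sentinel score and the lower-doc-id tie-break are assumptions chosen to show why pre-filling helps: the heap is always full, so the weakest hit is always at `heap[0]`, and collection never needs a size check.

```python
import heapq
import math
from typing import Iterable, List, Tuple


def top_k(scored_docs: Iterable[Tuple[int, float]], k: int) -> List[Tuple[int, float]]:
    """Return the k highest-scoring (doc, score) pairs using a pre-filled min-heap."""
    if k <= 0:
        return []
    # Entries are (score, -doc): the heap minimum is the weakest hit, and on
    # equal scores the larger doc id counts as weaker (evicted first).
    sentinel = (-math.inf, 0)
    heap = [sentinel] * k  # pre-filled; a list of equal items is already a valid heap
    for doc, score in scored_docs:
        entry = (score, -doc)
        if entry > heap[0]:  # beats the current weakest hit (or a sentinel)?
            heapq.heapreplace(heap, entry)  # pop weakest and push new entry, O(log k)
    ranked = sorted(heap, reverse=True)  # best score first, lower doc id first on ties
    # Drop any sentinels that were never replaced (fewer than k hits).
    return [(-neg_doc, score) for score, neg_doc in ranked if score != -math.inf]


hits = [(0, 1.5), (1, 3.0), (2, 0.5), (3, 3.0), (4, 2.0)]
print(top_k(hits, 3))  # [(1, 3.0), (3, 3.0), (4, 2.0)]
```

The trade-off is that sentinels must be filtered out at the end when fewer than k documents match, in exchange for a simpler, branch-free insertion loop.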
Important: This Lucene Directory wrapper approach should be considered only if OS-level encryption is not possible. OS-level encryption better fits Lucene's usage of the OS cache, and...
We would like to contribute a codec, developed as part of a customer engagement, that enables encryption of sensitive data in the index. We...