asfimport

Results 328 issues of asfimport

Read_CSV ParseOptions allows only a single-character delimiter. Single-character delimiters are highly likely to collide with values in the data to be loaded, negating the ability to serve...

Type: enhancement
Component: C++

The page https://arrow.apache.org/docs/format/Columnar.html#physical-memory-layout has no description of the Map type, even though it is part of the schema.fbs. This makes it difficult to implement and understand when it should be...

Component: Documentation
Component: Format

In Apache Spark, [explode](https://spark.apache.org/docs/latest/api/sql/index.html#explode) separates the elements of an array column (or expression) into multiple rows. Note that each explode works at the top level only (not recursively). This would also...

Type: enhancement
Component: Python
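
To make the explode semantics described above concrete, here is a minimal pure-Python sketch. It is not Spark's or Arrow's implementation; the function name `explode_rows` and the list-of-dicts row representation are assumptions chosen for illustration. Rows whose array is empty or `None` produce no output, and nested arrays are left intact because only the top level is expanded.

```python
from typing import Any


def explode_rows(rows: list[dict[str, Any]], column: str) -> list[dict[str, Any]]:
    """Emit one output row per element of the array stored in `column`.

    Hypothetical helper for illustration only: works on plain dicts,
    not on Spark DataFrames or Arrow tables.
    """
    exploded = []
    for row in rows:
        # Treat a missing/None array like an empty one: it yields no rows.
        values = row.get(column) or []
        for value in values:
            new_row = dict(row)      # shallow copy of the other columns
            new_row[column] = value  # replace the array by a single element
            exploded.append(new_row)
    return exploded


rows = [
    {"id": 1, "xs": [[1, 2], [3]]},  # nested array: inner lists are kept as-is
    {"id": 2, "xs": []},             # empty array: contributes no rows
]
result = explode_rows(rows, "xs")
```

Applying the function a second time to the result would expand the inner lists, which is how a non-recursive explode is typically chained.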

Follow-up for https://github.com/apache/arrow/pull/10717. Certain codecs are optional, so we should have global pytest fixtures that automatically apply the right pytest markers. **Reporter**: [Krisztian Szucs](https://issues.apache.org/jira/browse/ARROW-13380) / @kszucs **Note**: *This issue was originally created...

Type: enhancement
Component: Python
good-first-issue

Now that @jtibshirani improved merging via #11411, nightly benchmarks report the following two top CPU consumers: ``` 18.45% 548638 org.apache.lucene.util.VectorUtil#dotProduct() at org.apache.lucene.index.VectorSimilarityFunction$2#compare() at org.apache.lucene.util.hnsw.HnswGraph#search() at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode() 13.23% 393609 org.apache.lucene.util.LongHeap#upHeap() at...

type:enhancement
type:task
legacy-jira-priority:Minor
module:core/hnsw

Extend org.apache.lucene.monitor.MonitorConfiguration with the ability to configure how the cache purge task is scheduled by org.apache.lucene.monitor.Monitor (aka Luwak). In particular, allow the use of an external ScheduledExecutor. Currently each new instance of ...

type:enhancement
legacy-jira-priority:Major
module:core/other
affects-version:8.8
affects-version:8.9
module:monitor

Apache OpenNLP 2.0.0 has been released. This [version](https://opennlp.apache.org/news/release-200.html) contains new implementations of TokenNameFinder and DocumentCategorizer that support models in the ONNX format. (TokenNameFinder is in NLPNERTaggerOp, DocumentCategorizer is not currently...

type:task
legacy-jira-priority:Major
module:analysis

Standard relevance-ranked searches for top-X results use the HitQueue class to keep track of the highest-scoring documents. The HitQueue is a binary heap of ScoreDocs and is pre-filled...

type:enhancement
legacy-jira-priority:Minor
module:core/search
legacy-jira-label:memory
legacy-jira-label:performance
affects-version:4.10.4
affects-version:5.3
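
The general technique behind the HitQueue issue above, keeping the top-X scores in a bounded binary heap, can be sketched in Python as follows. This is not Lucene's HitQueue: the function name `top_k` and the `(doc_id, score)` tuples are assumptions for illustration, the pre-filling mentioned in the issue is not reproduced, and tie-breaking between equal scores is arbitrary here.

```python
import heapq


def top_k(scored_docs, k):
    """Return the k highest-scoring (doc_id, score) pairs, best first.

    Hypothetical helper: a min-heap of size at most k whose root is the
    weakest hit currently retained.
    """
    if k <= 0:
        return []
    heap = []  # entries are (score, doc_id); heap[0] is the lowest score kept
    for doc_id, score in scored_docs:
        entry = (score, doc_id)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # New hit beats the weakest retained hit: replace it in O(log k).
            heapq.heapreplace(heap, entry)
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]


hits = top_k([(0, 1.0), (1, 3.0), (2, 2.0), (3, 0.5)], k=2)
```

Memory use is proportional to `k` rather than to the number of matching documents, which is why the heap size matters when very large result windows are requested.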

Important: This Lucene Directory wrapper approach is to be considered only if OS-level encryption is not possible. OS-level encryption better fits Lucene's use of the OS cache, and...

type:enhancement
legacy-jira-priority:Major

We would like to contribute a codec, developed as part of an engagement with a customer, that enables the encryption of sensitive data in the index. We...

type:enhancement
legacy-jira-priority:Major
legacy-jira-label:contrib
legacy-jira-label:codec