asfimport issues

Results: 328 issues of asfimport

[C++] Enable multiple character delimiters in read_csv

Read_CSV ParseOptions allows only a single-character delimiter. Single-character delimiters are highly susceptible to the candidate value existing within the data to be loaded, negating the ability to serve...

Type: enhancement

Component: C++

[Documentation] No description of Map in-memory layout

The page https://arrow.apache.org/docs/format/Columnar.html#physical-memory-layout has no description of the Map type, even though it is part of the schema.fbs. This makes it difficult to implement and understand when it should be...

Component: Documentation

Component: Format

[Python] Explode array column

In Apache Spark, [explode](https://spark.apache.org/docs/latest/api/sql/index.html#explode) separates the elements of an array column (or expression) into multiple rows. Note that each explode works at the top level only (not recursively). This would also...

Type: enhancement

Component: Python
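
To make the explode semantics described in this issue concrete, here is a minimal pure-Python sketch. It is not the Spark or Arrow implementation: the `explode` helper and the row-as-dict representation are assumptions made for illustration. The sketch also assumes that rows whose array is empty or null produce no output rows, which matches Spark's `explode` (as opposed to `explode_outer`).

```python
from typing import Any, Dict, List


def explode(rows: List[Dict[str, Any]], column: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: emit one output row per top-level element of `column`."""
    out: List[Dict[str, Any]] = []
    for row in rows:
        # Treat a null (None) array like an empty one: no output rows.
        for element in row[column] or []:
            new_row = dict(row)        # copy the other columns unchanged
            new_row[column] = element  # nested lists are kept as-is (not recursive)
            out.append(new_row)
    return out


rows = [
    {"id": 1, "values": [[1, 2], [3]]},
    {"id": 2, "values": []},
    {"id": 3, "values": None},
]
exploded = explode(rows, "values")
# exploded holds two rows for id 1; the inner lists [1, 2] and [3] stay intact.
```

Because only the top level is unpacked, flattening the inner lists would take a second call on the result.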

[Python] Better pytest parametrization for different compression codecs

Follow-up for https://github.com/apache/arrow/pull/10717. Certain codecs are optional, so we should have global pytest fixtures that automatically apply the right pytest markers. **Reporter**: [Krisztian Szucs](https://issues.apache.org/jira/browse/ARROW-13380) / @kszucs **Note**: *This issue was originally created...

Type: enhancement

Component: Python

good-first-issue

Explore moving HNSW's NeighborQueue to a radix heap [LUCENE-10383]

Now that @jtibshirani improved merging via #11411, nightly benchmarks report the following two top CPU consumers: ``` 18.45% 548638 org.apache.lucene.util.VectorUtil#dotProduct() at org.apache.lucene.index.VectorSimilarityFunction$2#compare() at org.apache.lucene.util.hnsw.HnswGraph#search() at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode() 13.23% 393609 org.apache.lucene.util.LongHeap#upHeap() at...

type:enhancement

type:task

legacy-jira-priority:Minor

module:core/hnsw

Allow to configure purge executor in org.apache.lucene.monitor.Monitor [LUCENE-9869]

Extend org.apache.lucene.monitor.MonitorConfiguration with the ability to configure how the cache purge task is scheduled by org.apache.lucene.monitor.Monitor (aka Luwak). In particular, allow the use of an external ScheduledExecutor. Currently each new instance of ...

type:enhancement

legacy-jira-priority:Major

module:core/other

affects-version:8.8

affects-version:8.9

module:monitor

Upgrade to OpenNLP 2.x and add [LUCENE-10621]

Apache OpenNLP 2.0.0 has been released. This [version](https://opennlp.apache.org/news/release-200.html) contains new implementations of TokenNameFinder and DocumentCategorizer that support models in the ONNX format. (TokenNameFinder is in NLPNERTaggerOp, DocumentCategorizer is not currently...

type:task

legacy-jira-priority:Major

module:analysis

Speed up requests for many rows [LUCENE-6828]

Standard relevance-ranked searches for top-X results use the HitQueue class to keep track of the highest-scoring documents. The HitQueue is a binary heap of ScoreDocs and is pre-filled...

type:enhancement

legacy-jira-priority:Minor

module:core/search

legacy-jira-label:memory

legacy-jira-label:performance

affects-version:4.10.4

affects-version:5.3
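
The general technique this issue refers to, tracking the top-X hits with a bounded binary min-heap, can be sketched in Python as follows. This is an illustrative sketch, not Lucene's HitQueue: the `top_k` helper and the `(score, doc_id)` tuples are assumptions, the heap grows lazily instead of being pre-filled, and ties are broken by plain tuple comparison.

```python
import heapq
from typing import Iterable, List, Tuple

Hit = Tuple[float, int]  # (score, doc_id); a stand-in for a ScoreDoc


def top_k(hits: Iterable[Hit], k: int) -> List[Hit]:
    """Hypothetical helper: keep the k best hits in a min-heap of size <= k."""
    heap: List[Hit] = []
    for hit in hits:
        if len(heap) < k:
            heapq.heappush(heap, hit)
        elif hit > heap[0]:
            # heap[0] is the weakest hit kept so far; swap it for the better one.
            heapq.heapreplace(heap, hit)
    # Best hit first.
    return sorted(heap, reverse=True)


best = top_k([(0.5, 1), (0.9, 2), (0.1, 3), (0.7, 4)], k=2)
```

Each insertion costs O(log k), so the per-hit cost and the memory held by the heap both grow with the number of rows requested.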

Directory based approach for index encryption [LUCENE-9379]

Important: This Lucene Directory wrapper approach is to be considered only if OS-level encryption is not possible. OS-level encryption better fits Lucene's use of the OS cache, and...

type:enhancement

legacy-jira-priority:Major

Contribution: Codec for index-level encryption [LUCENE-6966]

We would like to contribute a codec that enables the encryption of sensitive data in the index that has been developed as part of an engagement with a customer. We...

type:enhancement

legacy-jira-priority:Major

legacy-jira-label:contrib

legacy-jira-label:codec