Ken Krugler

Results: 26 comments by Ken Krugler

Hi @arky - thanks for the PR! Would it be possible to add `my` to the list of languages being tested in `LanguageIdentifierTest`? You'd have to add a `tika-core/src/test/resources/org/apache/tika/language/my.test` file...

Hi @arky you also need to edit the `LanguageIdentifierTest.java` file, to add `my` to the list of languages, like this: ``` java private static final String[] languages = new String[]...

@arky - re using UDHR text...that's fine, but as per the **Permissions** section on https://www.ohchr.org/EN/UDHR/Pages/Introduction.aspx, you would need to add attribution to the end of the Tika top-level `LICENSE.txt` file...

Hi @dstevenson - the last activity on this (via [TIKA-1841](https://issues.apache.org/jira/browse/TIKA-1841)) was @chrismattmann requesting an update based on his input (this was back in Aug 2016). I think that's why it...

I believe the `runCmd()` method passes arguments to `fasttext`. The last two arguments you're trying to use (`"

We should think about how best to configure the analysis chain using any available analyzers/filters/etc. E.g. we have fields with Japanese, and want to use the Kuromoji analyzer. See [my...

Hi @DDeena007 - this is a good question for the #troubleshooting channel on the Pinot Slack workspace. We try to use GitHub issues for things which have first been discussed...

Hi @siddharthteotia - yes, one example segment has 2,637,935 rows, and `metadata.properties` for the column of interest (`creativeText_terms`) shows a cardinality of 48,591 (though that's lower than what I was expecting)....

As part of the design, it would be great to see details on how different analysis chains can be specified (e.g. based on target language, for a column). It would...

We've been using https://github.com/seancfoley/IPAddress for exactly this type of address parsing. It's very well tested, and seems robust. Wondering why you wouldn't want to use it for this functionality (and...