kellen

42 issues by kellen

https://beam.apache.org/releases/javadoc/2.47.0/org/apache/beam/sdk/extensions/python/PythonExternalTransform.html

In theory, this will reduce shuffle/streaming data-processing cost. Needs a dictionary of common symbols to be provided.
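A minimal sketch of why a shared dictionary helps with small records. The issue does not show its codec here, so this uses the JDK's DEFLATE preset dictionary (`java.util.zip.Deflater.setDictionary`) as a stdlib stand-in for Zstd dictionaries. The `deflate`/`inflate` helpers and the sample dictionary and record are hypothetical illustrations.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{Deflater, Inflater}

// Hypothetical helper: DEFLATE `input`, optionally priming the compressor with
// a preset dictionary of byte sequences expected to recur across records.
def deflate(input: Array[Byte], dict: Option[Array[Byte]]): Array[Byte] = {
  val deflater = new Deflater(Deflater.BEST_COMPRESSION)
  dict.foreach(d => deflater.setDictionary(d))
  deflater.setInput(input)
  deflater.finish()
  val out = new ByteArrayOutputStream()
  val buf = new Array[Byte](1024)
  while (!deflater.finished()) {
    val n = deflater.deflate(buf)
    out.write(buf, 0, n)
  }
  deflater.end()
  out.toByteArray
}

// Hypothetical helper: the reader must supply the same dictionary when asked.
def inflate(compressed: Array[Byte], dict: Option[Array[Byte]]): Array[Byte] = {
  val inflater = new Inflater()
  inflater.setInput(compressed)
  val out = new ByteArrayOutputStream()
  val buf = new Array[Byte](1024)
  while (!inflater.finished()) {
    val n = inflater.inflate(buf)
    if (n == 0 && inflater.needsDictionary()) {
      inflater.setDictionary(dict.getOrElse(sys.error("dictionary required")))
    } else if (n == 0 && inflater.needsInput()) {
      sys.error("truncated input")
    }
    out.write(buf, 0, n)
  }
  inflater.end()
  out.toByteArray
}

// Example: a dictionary of common field names and values.
val dict   = """{"symbol":"","exchange":"NASDAQ","price":""".getBytes(UTF_8)
val record = """{"symbol":"AAPL","exchange":"NASDAQ","price":189.5}""".getBytes(UTF_8)
val plain  = deflate(record, None)
val primed = deflate(record, Some(dict))
println(s"${record.length} bytes raw, ${plain.length} without dict, ${primed.length} with dict")
```

The trade-off is the one the issue notes: writer and reader must agree on the same dictionary, so it has to be produced and distributed ahead of time.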

Add an API that gives users a convenient dead-letter queue, e.g. combined with the `safeMap` or `safeFlatMap` methods.
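A sketch of the dead-letter shape such an API might return, shown on a plain `Seq` rather than an `SCollection`. The `safeMap` below is a hypothetical helper, not the existing Scio method's signature. Each failing input is kept together with its exception instead of failing the whole job.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical helper: apply `f` to every element; successes go to the first
// output, and (input, exception) pairs go to the dead-letter output.
def safeMap[T, U](input: Seq[T])(f: T => U): (Seq[U], Seq[(T, Throwable)]) = {
  val results = input.map(t => t -> Try(f(t)))
  val ok      = results.collect { case (_, Success(u)) => u }
  val dead    = results.collect { case (t, Failure(e)) => (t, e) }
  (ok, dead)
}

// "x" cannot be parsed, so it lands in the dead-letter output.
val (parsed, deadLetters) = safeMap(Seq("1", "2", "x", "4"))(_.toInt)
```

`Try` only captures non-fatal exceptions, so errors such as `OutOfMemoryError` would still propagate.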

from the deprecated ApproximateUnique

The current default is `SimpleJsonpMapper`, which is practically useless; perhaps the value used in the example would be more appropriate? ```scala () => { // Use jackson for user json serialization...

Would need to be carefully tested to ensure it wouldn't kill DB instances

... is currently missing

Works:

```
* [`foo bar`](http://github.com)
* [_foo bar_](http://github.com)
```

Fails:

```
* @scaladoc[`foo bar`](foo.bar)
* @scaladoc[_foo bar_](foo.bar)
```

Adds:

* `saveAsZstdDictionary` to train a Zstd dictionary on an arbitrary `SCollection[T]`. It estimates the average size of elements `T`, collects `n` elements based on a target training-set size, then...
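The sizing step can be sketched as simple arithmetic: `n ≈ targetTrainingBytes / averageElementSize`. The helper name, the use of a pre-measured sample of encoded sizes, and the 10 MiB target are hypothetical; the actual implementation's estimation method is not shown here.

```scala
// Hypothetical helper: given encoded byte sizes of a sample of elements and a
// target training-set size, estimate how many elements to collect.
def elementsToCollect(sampleSizes: Seq[Long], targetTrainingBytes: Long): Long = {
  require(sampleSizes.nonEmpty, "need at least one sampled element")
  val avg = sampleSizes.sum.toDouble / sampleSizes.size
  // Round up so the collected set reaches the target size.
  math.max(1L, math.ceil(targetTrainingBytes / avg).toLong)
}

// Elements averaging 250 bytes, 10 MiB training set.
val n = elementsToCollect(Seq(200L, 250L, 300L), 10L * 1024 * 1024)
```

Here 10,485,760 / 250 = 41,943.04, which rounds up to 41,944 elements.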

When using `saveAsSparkey`, if any shard is larger than ~2 GB, you will get a coder exception and something like ``` Error message from worker: org.apache.beam.sdk.util.UserCodeException: java.lang.OutOfMemoryError: Required array length 2147483639...
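The length in that error, 2147483639, equals `Int.MaxValue - 8`. JVM arrays are `Int`-indexed, so a shard that ends up serialized into a single byte array cannot grow much past ~2 GiB. Assuming that is the failure mode and that keys hash evenly across shards, a rough lower bound on the shard count is a ceiling division. `minShards` and the 50 GiB figure are hypothetical; real shards are uneven, so leave headroom.

```scala
// Largest array length in the error above: Int.MaxValue - 8 = 2147483639.
val MaxArrayLength: Long = Int.MaxValue - 8L

// Hypothetical helper: smallest shard count such that, if data splits evenly,
// each shard's serialized size stays at or below `maxShardBytes`.
def minShards(totalBytes: Long, maxShardBytes: Long = MaxArrayLength): Int = {
  require(totalBytes >= 0 && maxShardBytes > 0)
  // Ceiling division without floating point.
  math.max(1L, (totalBytes + maxShardBytes - 1) / maxShardBytes).toInt
}

// 50 GiB of key/value data.
val shards = minShards(50L * 1024 * 1024 * 1024)
```

50 GiB is exactly 25 × 2^31 bytes, slightly more than 25 × 2147483639, so this yields 26 shards.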