53 comments by Kevin Kuo

@edgararuiz is this in scope for dbplyr?

Just learned of this on the Databricks blog. If the Scala API is at parity with PySpark, it should be something we can support without too much trouble.

Currently the user doesn't have control over subsampling for VGM training, since it's hardcoded in https://github.com/DAI-Lab/CTGAN/blob/fd507166f132381bc62b60f6457028f5bcaa904c/ctgan/synthesizer.py#L114-L115. To clarify, the non-scalable piece is the `BayesianGaussianMixture`, so I'm proposing to subsample the...
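Here is a minimal sketch of what a user-controlled subsampling step might look like. It assumes that the values of a single continuous column are subsampled before the mixture model is fit. `subsample_for_vgm`, `max_samples`, and `seed` are hypothetical names and are not part of CTGAN's API. Plain `random.Random.sample` stands in for whatever sampling CTGAN actually uses.

```python
import random
from typing import List, Optional, Sequence


def subsample_for_vgm(
    column: Sequence[float],
    max_samples: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[float]:
    """Return at most ``max_samples`` values of ``column`` for fitting the VGM.

    Hypothetical helper: ``max_samples`` and ``seed`` are illustrative
    user-facing knobs, not CTGAN parameters. ``None`` means "use every row".
    """
    if max_samples is not None and max_samples <= 0:
        raise ValueError("max_samples must be positive")
    values = list(column)
    if max_samples is None or max_samples >= len(values):
        return values
    # Seeded RNG so the same subset is drawn on every run with the same seed.
    rng = random.Random(seed)
    # Draw without replacement so no row is counted twice.
    return rng.sample(values, max_samples)
```

The returned subset would then be reshaped into a single column of shape `(n, 1)` and passed to scikit-learn's `BayesianGaussianMixture.fit`. Leaving `max_samples=None` fits on every row.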

Got it, sounds like there are a couple of ways to proceed, dictated by what you think a "model" represents, i.e., whether it should be identified with the metadata of a dataset...

What's the status on supporting overwriting for local? https://github.com/tensorflow/ecosystem/blob/b344507821fb1c1731412d814a73d60b1f5c3aa3/spark/spark-tensorflow-connector/src/main/scala/org/tensorflow/spark/datasources/tfrecords/DefaultSource.scala#L128-L131

Have we thought about what the CLI would look like? E.g., maybe `tgan fit` takes a path to a real-data CSV and writes a pkl, and `tgan sample` takes...
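Here is a minimal `argparse` sketch of the `fit` half of such a CLI. It assumes that `fit` takes two positional arguments: the CSV path and an output path for the pickle. The argument names are hypothetical. `sample` is left out because its arguments aren't specified.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for a ``tgan`` command with a ``fit`` subcommand."""
    parser = argparse.ArgumentParser(prog="tgan")
    subcommands = parser.add_subparsers(dest="command", required=True)

    # `tgan fit <data_csv> <model_pkl>`: train on a real-data CSV, write a pickle.
    fit = subcommands.add_parser("fit", help="train on a real-data CSV")
    fit.add_argument("data_csv", help="path to the real-data CSV")
    fit.add_argument("model_pkl", help="where to write the fitted model (.pkl)")
    return parser


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # A real implementation would load args.data_csv, fit the model, and
    # pickle it to args.model_pkl; this sketch only returns the parsed args.
    return args
```

Using subcommands keeps `fit` and `sample` as separate verbs with their own arguments. A `sample` subparser could be added alongside `fit` once its inputs are settled.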

Being able to "map to NA on the F# side of things" would be great, and I agree that using options makes sense. I'm relatively new to F# (coming from...

I'm thinking we start by requiring a file.

Could also look for them during manifest processing.

Example in the wild of doing somewhat complex rearrangements declaratively: https://github.com/lucidrains/mlp-mixer-pytorch/blob/324121d19b9425b2eefe586d6e02de9f263da1e8/mlp_mixer_pytorch/permutator.py#L29