
The effect of random seeds on results?

Open shenzgang opened this issue 2 years ago • 5 comments

When I run the Titanic tests, I get very different estimates each time. Does the random seed have that much of an impact?

shenzgang avatar Aug 10 '21 08:08 shenzgang

Yes, indeed. To get predictable behavior, you can set a fixed random seed in your tests. Where you set the seed depends on how your tests are structured. For example: https://github.com/salesforce/TransmogrifAI/blob/master/helloworld/src/main/scala/com/salesforce/hw/titanic/OpTitanic.scala#L50
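To illustrate why a fixed seed matters, here is a minimal sketch in plain Scala. It uses only scala.util.Random and is not TransmogrifAI API; the object and function names are hypothetical. It shows that a shuffled train split is identical across runs when the seed is fixed, which is the property you want from any seeded splitter or model selector in a real workflow.

```scala
import scala.util.Random

// Hypothetical demo object; not part of TransmogrifAI.
object SeededSplitDemo {
  // Shuffle the ids with a generator built from `seed`, then keep the
  // first `trainFraction` of them as the training split.
  def trainIds(ids: Seq[Int], trainFraction: Double, seed: Long): Seq[Int] = {
    val rng = new Random(seed)
    val shuffled = rng.shuffle(ids)
    shuffled.take((ids.size * trainFraction).toInt)
  }

  def main(args: Array[String]): Unit = {
    val ids = 1 to 20
    // Same seed twice: the two training splits are identical.
    val a = trainIds(ids, 0.75, seed = 42L)
    val b = trainIds(ids, 0.75, seed = 42L)
    // A time-based seed: the split will usually differ from run to run.
    val c = trainIds(ids, 0.75, seed = System.nanoTime())
    println(s"same seed gives same split: ${a == b}")
    println(s"time-seeded split: $c")
  }
}
```

A model trained on a different split each run will report different metrics, so fixing the seed wherever the workflow accepts one is what makes the estimates comparable between runs.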

tovbinm avatar Sep 09 '21 05:09 tovbinm

Thanks for your reply! I also have a question about how to use the trained model to predict on unlabeled test data. Are there any examples of model prediction?

shenzgang avatar Sep 09 '21 05:09 shenzgang

You can save a trained model, then load it later, set a new scoring reader / a new input dataset, and finally compute scores by invoking score().

You can also use transmogrifai-local for online serving of your model (e.g., over an HTTP API).

tovbinm avatar Sep 09 '21 06:09 tovbinm

The dataset used for model training has a label column, while the test data does not. When I call score(), an exception is thrown.

shenzgang avatar Sep 09 '21 07:09 shenzgang

You will need to create an empty label column in the test data.
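One way to do this, sketched with the plain Spark DataFrame API, is to add a null-valued column under the label's name before scoring. The column name "survived", the sample rows, and the helper name are hypothetical. Whether a null is accepted, or a placeholder value such as 0.0 is needed, depends on how your label feature is extracted, so treat that as an assumption to verify.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.DoubleType

// Hypothetical helper; requires Spark SQL on the classpath.
object EmptyLabelColumn {
  // Add a null-valued Double column named `labelName` if the
  // DataFrame does not already have one; otherwise return it unchanged.
  def withEmptyLabel(df: DataFrame, labelName: String): DataFrame =
    if (df.columns.contains(labelName)) df
    else df.withColumn(labelName, lit(null).cast(DoubleType))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("empty-label")
      .getOrCreate()
    import spark.implicits._

    // Unlabeled test data: same feature columns as training, no label.
    val unlabeled = Seq((1, "male", 22.0), (2, "female", 38.0))
      .toDF("id", "sex", "age")

    // The schema now includes a nullable Double "survived" column.
    val scoringInput = withEmptyLabel(unlabeled, "survived")
    scoringInput.printSchema()

    spark.stop()
  }
}
```

The resulting DataFrame has the same columns as the training data, so it can be supplied as the new input dataset before calling score().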

leahmcguire avatar Sep 09 '21 14:09 leahmcguire