tribuo
What's the best route to save the predictions into a CSV file (using Tribuo classes)?
Ask the question
What's the best route to save the predictions into a CSV file (using Tribuo classes)? Say I have a List<Prediction<Regressor>>.
One way could be to iterate through the list of items and write it to disk via some FileXxxx() class.
Is your question about a specific Tribuo class?
List<Prediction<Regressor>> and Dataset (one of its concrete subclasses)
There isn't a helper to write out a csv file of predictions. You can save the dataset back out using CSVSaver, but that won't have the predicted values in it.
It's roughly two lines by converting the list into a stream.
First write out the dimension headers from the output info inside the model, and then predictions.stream().map(Prediction::getOutput).map(Regressor::getValues).map(Arrays::toString).map(s -> s.substring(1,s.length()-1)).forEach(writer::println). Admittedly that's a little ugly as it has to strip off the [ and ] that Arrays.toString() puts on, so there is a cleaner way with a slightly more complex lambda that combines those two operations.
Alternatively there is Regressor.getSerializableForm() which produces an output string DIM-0=<value>,...,DIM-N=<value> depending on how exactly you want the output to look. This format is the one that's easily consumed by RegressionFactory.generateOutput.
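As a rough sketch of the "cleaner way" mentioned above, the join can be done in one lambda so there are no brackets to strip. To keep it runnable without Tribuo on the classpath, this version works on plain double[] rows; PredictionCsv, row and toCsv are hypothetical names, not Tribuo API. With Tribuo, the rows would presumably come from predictions.stream().map(p -> p.getOutput().getValues()), and the header from the model's output info or from Regressor.getNames() (an assumption worth checking against the Javadoc for your version).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Tribuo.
final class PredictionCsv {
    private PredictionCsv() {}

    // Joins one prediction's values with commas, e.g. {1.5, 2.0} -> "1.5,2.0".
    // This replaces Arrays.toString() plus substring() with a single step.
    static String row(double[] values) {
        return Arrays.stream(values)
                .mapToObj(Double::toString)
                .collect(Collectors.joining(","));
    }

    // Builds the whole CSV text: one header line, then one line per prediction.
    // Assumes the dimension names contain no commas or quotes (no escaping is done).
    static String toCsv(String[] header, List<double[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.join(",", header)).append('\n');
        for (double[] values : rows) {
            sb.append(row(values)).append('\n');
        }
        return sb.toString();
    }
}
```

The resulting string can then be written out with java.nio.file.Files.writeString(path, csv), or the loop can be adapted to print each row to a PrintWriter if the list is large.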
It would be nice to have a method that allows this, because it's something we all probably want to do as part of a pipeline. I can think of many use cases; I'm already in the middle of one.
Ok. I'm not sure where such a method should live. We have done this in the past when writing out classification outputs for comparison against other systems, but it lives in the main method - https://github.com/oracle/tribuo/blob/main/Classification/Experiments/src/main/java/org/tribuo/classification/experiments/ConfigurableTrainTest.java#L169.
Any suggestions on where it should go? It needs to be specialised to each Output type, so I guess it could be a method on the OutputFactory?
Let me try to work through a workflow from a user perspective. I think some of the low-level (granular) calls could be wrapped in higher-level functions so we don't have to chain a lot of x.y.z() calls to get to the results. There is also a bit of cognitive overload in getting from one part of the flow to the next.
Also, another question sort of related to this one, say I have this block of code:
var mutableValidationDataset = new MutableDataset(wineSource);
for (var i: mutableValidationDataset.getData()) {
System.out.println(i);
}
I'm not able to get hold of each example in mutableValidationDataset. I tried mutableValidationDataset.getData().get(0), but this does not give me any method I can make use of. I'm referring to https://tribuo.org/learn/4.0/javadoc/org/tribuo/impl/ArrayExample.html. It would be nice to be able to iterate through the features and target fields.
Assuming that's the complete snippet, it's because you forgot the type parameter on MutableDataset (it probably should be MutableDataset<Regressor>, but it might also infer it properly from the source, so MutableDataset<> could work). Because the type is missing, the compiler treats it as a raw type and erases all the generics, so the Dataset implements Iterable rather than Iterable<Example<T>>, and the type inference infers Object as the type for i.
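The raw-type effect described above can be reproduced without Tribuo. In this small sketch, Holder is a hypothetical stand-in for a generic container such as Dataset<T>, and RawTypeDemo is likewise made up for illustration; only the Java language behaviour is being shown.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a generic container; not a Tribuo class.
final class Holder<T> implements Iterable<T> {
    private final List<T> items;

    Holder(List<T> items) {
        this.items = items;
    }

    @Override
    public Iterator<T> iterator() {
        return items.iterator();
    }
}

final class RawTypeDemo {
    private RawTypeDemo() {}

    // Raw type: the type argument is omitted, so the loop variable is an Object.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int countStringsViaRaw(List<String> data) {
        Holder raw = new Holder(data);
        int count = 0;
        for (var item : raw) {
            // item.length() would not compile here, because item is an Object.
            if (item instanceof String) {
                count++;
            }
        }
        return count;
    }

    // Diamond operator: the type argument is inferred, so item is a String.
    static int totalLength(List<String> data) {
        Holder<String> typed = new Holder<>(data);
        int total = 0;
        for (var item : typed) {
            total += item.length();
        }
        return total;
    }
}
```

The same reasoning suggests that declaring the dataset with its type argument (or the diamond) should make the loop variable an Example rather than an Object.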
You won't get ArrayExample back; the contract is for Example, but there aren't many methods that exist only on ArrayExample.
I used your tips and some workarounds to get to my solutions, but ideally it would be good to have cleaner methods (flows), i.e. class- or instance-level methods to get to the stuff we need from the input data as well as the prediction classes.
What else did you need apart from the regression outputs? The features and ground truth outputs should be simple to access.