Clojush
Add Additional CSV Functionality
Add functionality to CSV printing that makes it possible to optionally print the following info for each individual:
:parent-indices :push-program :plush-genome :push-program-size :plush-genome-size :total-error :test-case-errors
So this is a good example of what I was talking about in #104: I'd see this as a configuration that
- affects both JSON and CSV file generation, when set up
- uses the appropriate Clojure idiom (would that be metadata?) to record
  - the desired header strings, if they're different from the symbol,
  - the conversion function for that column,
  - the print format string for that column
For example, if :size-in-points is specified, the report-writers can know as a matter of course to call the field "size-in-pts", and also have access to the function which calculates the number of points in a given individual's program. Then the general structure for writing a report becomes something more like this, I guess?
(defn csv-stream-open
  "Creates and returns a new CSV stream for a given data store; raises an error if it already exists"
  [filename field-list]
  ;; open the stream
  ;; write the header strings based on the field-list
  ;; record the field-list in the returned object
  )
(defn csv-stream-append
  "Adds a line to a CSV stream for a given individual"
  [stream dude]
  ;; get the field list from the stream structure
  (spit csv-stream-filename
        ;; format taken from field list defined for this stream
        ;; list of field values
        :append true))
(defn csv-dump
  "Writes a pile of individuals to a given CSV stream"
  [stream collection]
  ;; obvious code here to call csv-stream-append for each individual in the collection
  )
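To make the outline above concrete, here is a minimal runnable sketch of one way it could be filled in. Everything beyond the three function names is an assumption for illustration: the `field-specs` map, the `csv-line` helper, the keys on the individual (`:generation`, `:program`, `:total-error`), and the use of `flatten` as a stand-in for whatever function really counts points. It also does no quoting or escaping of commas, so it is not a complete CSV writer.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

;; Hypothetical field configuration: one entry per reportable field, holding
;; the header string, the conversion function, and the print format string.
(def field-specs
  {:generation     {:header "generation"
                    :value-fn :generation
                    :fmt "%d"}
   :size-in-points {:header "size-in-pts"
                    ;; stand-in for a real point-counting function
                    :value-fn #(count (flatten (:program %)))
                    :fmt "%d"}
   :total-error    {:header "total-error"
                    :value-fn #(double (:total-error %))
                    :fmt "%.3f"}})

(defn csv-stream-open
  "Creates a new CSV file with a header row and returns a map describing the
  stream. Throws if the file already exists."
  [filename field-list]
  (when (.exists (io/file filename))
    (throw (ex-info "CSV file already exists" {:filename filename})))
  (spit filename
        (str (string/join "," (map #(get-in field-specs [% :header]) field-list))
             "\n"))
  ;; the field list travels with the stream so later appends stay consistent
  {:filename filename :field-list field-list})

(defn csv-line
  "Formats one individual as a comma-separated line (no quoting or escaping)."
  [field-list dude]
  (string/join ","
               (for [field field-list
                     :let [{:keys [value-fn fmt]} (field-specs field)]]
                 ;; Locale/ROOT keeps the decimal separator a period
                 (String/format java.util.Locale/ROOT fmt
                                (object-array [(value-fn dude)])))))

(defn csv-stream-append
  "Adds a line to a CSV stream for a given individual."
  [{:keys [filename field-list]} dude]
  (spit filename (str (csv-line field-list dude) "\n") :append true))

(defn csv-dump
  "Writes a collection of individuals to a given CSV stream."
  [stream collection]
  (doseq [dude collection]
    (csv-stream-append stream dude)))
```

With this shape, changing which columns get recorded only means passing a different `field-list` to `csv-stream-open`; the writer functions themselves don't change.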
If the generation is treated as a field in these reports rather than a higher-scale subdivision, then you also don't need to stage an entire "generation" at once, but can simply call something like csv-dump periodically, for instance whenever a new individual is born.
Does that make sense? It will help make this code more maintainable and flexible in future when you decide you want different fields recorded, whether it's fewer or more.
Aside: One of the main reasons people prefer JSON over CSV is that it doesn't impose this obligatory columnar structure you're wrestling with here. If you use the JSON format (or any key:value pair) for this sort of thing, it's much easier to change your mind, or to store heterogeneous "records" for each case. In a CSV framework, you've got to know before you start working what the columns are, and are forced to fill them all no matter what. Backwards-compatibility with earlier results is harder to cope with algorithmically as a result.
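A small sketch of that difference, under some loud assumptions: it uses EDN (one map per line) as a stand-in for JSON, only because `clojure.edn` ships with Clojure and needs no extra dependency, and the function names `append-record!` and `read-records` are made up for this example. The point is just that records with different keys can sit in the same file without any up-front column list.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as string])

(defn append-record!
  "Appends one record (any map) as a single EDN line."
  [filename record]
  (spit filename (str (pr-str record) "\n") :append true))

(defn read-records
  "Reads back every non-blank line as an EDN map."
  [filename]
  (->> (string/split-lines (slurp filename))
       (remove string/blank?)
       (mapv edn/read-string)))

;; Usage: the second record carries a key the first one lacks, and nothing
;; written earlier has to be rewritten or padded to accommodate it.
(comment
  (append-record! "run.edn" {:generation 0 :total-error 12})
  (append-record! "run.edn" {:generation 1 :total-error 7 :parent-indices [3 5]})
  (read-records "run.edn"))
```

In the CSV version, adding `:parent-indices` partway through would mean either a new file or a column that is empty for every earlier row.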