Clojush Add Additional CSV Functionality

Add functionality to CSV printing that makes it possible to optionally print the following info for each individual:

:parent-indices :push-program :plush-genome :push-program-size :plush-genome-size :total-error :test-case-errors

Feb 21 '15 15:02 thelmuth

So this is a good example of what I was talking about in #104: I'd see this as a configuration that

- affects both JSON and CSV file generation, when set up
- uses the appropriate Clojure idiom (would that be metadata?) to record
  - the desired header strings, if they're different from the symbol,
  - the conversion function for that column,
  - the print format string for that column
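
To make the list above concrete, here is a minimal sketch of what such a per-field configuration might look like. It uses a plain map rather than metadata, and every name in it (field-specs, :header, :value-fn, :format-string, header-row, data-row) is an assumption for illustration, not something that exists in Clojush. The point-counting function is only a stand-in.

```clojure
(require '[clojure.string :as string])

;; Hypothetical field registry: all keys and names here are illustrative
;; assumptions, not part of Clojush.
(def field-specs
  {:total-error    {:header "total-error"
                    :value-fn :total-error
                    :format-string "%s"}
   :size-in-points {:header "size-in-pts"
                    ;; stand-in for a real point-counting function
                    :value-fn (fn [individual]
                                (count (flatten (:program individual))))
                    :format-string "%d"}})

(defn header-row
  "Builds the CSV header line for the requested fields."
  [fields]
  (string/join "," (map #(get-in field-specs [% :header]) fields)))

(defn data-row
  "Builds one CSV line for an individual from each field's value-fn and format string."
  [fields individual]
  (string/join ","
               (for [field fields
                     :let [{:keys [value-fn format-string]} (field-specs field)]]
                 (format format-string (value-fn individual)))))
```

With something like this, a JSON writer and a CSV writer could both read the same field-specs map, and adding a column would mean adding one entry.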

For example, if :size-in-points is specified, the report-writers can know as a matter of course to call the field "size-in-pts", and also have access to the function which calculates the number of points in a given individual's program. Then the general structure for writing a report becomes something more like this, I guess?

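(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn csv-stream-open
  "Creates and returns a new CSV stream for a given data store; raises an error if the file already exists"
  [filename field-list]
  (when (.exists (io/file filename))
    (throw (ex-info "CSV file already exists" {:filename filename})))
  ;; write the header strings based on the field-list
  (spit filename (str (string/join "," (map name field-list)) "\n"))
  ;; record the field-list in the returned object
  {:filename filename :field-list field-list})

(defn csv-stream-append
  "Adds a line to a CSV stream for a given individual"
  [stream dude]
  ;; get the field list from the stream structure
  (let [{:keys [filename field-list]} stream]
    (spit filename
          ;; list of field values, in the order defined for this stream
          ;; (per-field format strings and quoting are left out of this sketch)
          (str (string/join "," (map #(get dude %) field-list)) "\n")
          :append true)))

(defn csv-dump
  "Writes a pile of individuals to a given CSV stream"
  [stream collection]
  ;; call csv-stream-append for each individual in the collection
  (doseq [dude collection]
    (csv-stream-append stream dude)))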

If the generation is treated as a field in these reports rather than a higher-scale subdivision, then you also don't need to stage an entire "generation" at once, but can simply call something like csv-dump periodically, for instance whenever a new individual is born.
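
A rough sketch of that idea, with generation as just another column. An atom of lines stands in for the CSV file so the example is self-contained, and all names here (report-lines, report-fields, record-individual!) are hypothetical.

```clojure
(require '[clojure.string :as string])

;; Hypothetical names throughout; an atom of lines stands in for the CSV file.
(def report-lines (atom []))

(def report-fields [:generation :total-error])

(defn record-individual!
  "Appends one row as soon as an individual exists; :generation is just another column."
  [generation individual]
  (let [row (assoc individual :generation generation)]
    (swap! report-lines conj
           (string/join "," (map #(get row %) report-fields)))))

;; Rows can be written one at a time, without waiting for a full generation.
(record-individual! 0 {:total-error 17})
(record-individual! 0 {:total-error 9})
(record-individual! 1 {:total-error 4})
```

Because each row carries its own generation number, nothing downstream depends on rows being written in generation-sized batches.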

Does that make sense? It will help make this code more maintainable and flexible in future when you decide you want different fields recorded, whether it's fewer or more.

Feb 22 '15 10:02 Vaguery

Aside: One of the main reasons people prefer JSON over CSV is that it doesn't impose this obligatory columnar structure you're wrestling with here. If you use the JSON format (or any key:value pair) for this sort of thing, it's much easier to change your mind, or to store heterogeneous "records" for each case. In a CSV framework, you've got to know before you start working what the columns are, and are forced to fill them all no matter what. Backwards-compatibility with earlier results is harder to cope with algorithmically as a result.
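
A small sketch of the difference, using two made-up records with different keys. The key-value side prints EDN with pr-str only because that needs no extra dependency; a JSON library would play the same role. The function names are hypothetical.

```clojure
(require '[clojure.string :as string])

;; Two hypothetical records with different keys.
(def records
  [{:generation 0 :total-error 17}
   {:generation 1 :total-error 4 :parent-indices [3 8]}])

(defn csv-lines
  "CSV needs the full column list up front and pads missing values with empty cells."
  [columns records]
  (for [record records]
    (string/join "," (map #(str (get record % "")) columns))))

(defn record-lines
  "Key-value output: each record is written with only the keys it actually has."
  [records]
  (map pr-str records))
```

Calling csv-lines with the columns :generation, :total-error and :parent-indices leaves an empty trailing cell for the first record, while record-lines writes each record as-is and can be read back without knowing the columns in advance.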

Feb 22 '15 10:02 Vaguery
