Update one .csv from another

Open spmundi opened this issue 3 years ago • 3 comments

This is a question, not a request for enhancement nor bug report.

I have a system in place which relies heavily on a durable .csv database. I frequently need to update this database with a subset of records and a subset of fields from a download. I am sure I can cobble something together with Miller scripting, but in my mind's eye it seems complex. Is there any simple way to do this using primitives in Miller (whether in 5.10 or 6.x)? Emphasis: all fields in the download are in the durable database, but not the converse. Not all records are present in a new download. I have no need to create entirely new records.

Thx Rob Stevens

spmundi, Jan 02 '22 17:01

Sorry, John, I had not learnt how to use GitHub properly. I discovered, and had never realised before, that non-key fields from any other files on the command line overwrite those in the so-called LHS file. But if the update file has null fields (i.e. two consecutive delimiters), then that null value also becomes the new value. I don't want that. The script I have tried to use (with debugging statements) is the following, but it doesn't work and I am not sure why. I fear I have not read the documentation with sufficient care.

for(k,v in $*) {
    print k;
    print v;
    if (k =~ "^r_.*$") { continue; }
    rf="r_" . k;
    print rf;
    print $r_fa;
    print $[rf];
    if (is_null($[rf])) { continue; }
    $k=$[rf];
}

Edits since original posting:

Addendum: I should have said that before running this script I had joined the file containing the updates to the original, using --rp 'r_'. What I am trying to do is find the field in the updater file corresponding to a field in the original file and, if that field is non-null, perform the update. But I cannot seem to figure out how to reference the field in the update. I have tried $["rf"] and $[rf]; neither seems to work.
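
The rule described in the addendum can be pinned down outside Miller. What follows is a minimal sketch in Python, not Miller code and not anything from the Miller documentation. It assumes a joined record is a flat mapping of field names to strings, that update-side fields carry the `r_` prefix, and that "null" means the empty string. The function name `apply_updates` and the sample fields `id`, `fa` and `fb` are hypothetical. Unlike the script above, the sketch also drops the `r_` fields from its output.

```python
def apply_updates(record, prefix="r_"):
    """Return a copy of a joined record with non-empty prefixed values applied.

    Assumption: update-side fields are named prefix + original field name,
    and an empty string means "no update supplied".
    """
    updated = {}
    for key, value in record.items():
        if key.startswith(prefix):
            # Update-side field: used only as a source, not copied to the output.
            continue
        candidate = record.get(prefix + key, "")
        # Overwrite only when the update actually carries a value.
        updated[key] = candidate if candidate != "" else value
    return updated


# Hypothetical joined record: r_fa carries a new value, r_fb is empty.
joined = {"id": "7", "fa": "old-a", "fb": "old-b", "r_fa": "new-a", "r_fb": ""}
result = apply_updates(joined)
print(result)  # {'id': '7', 'fa': 'new-a', 'fb': 'old-b'}
```

For comparison with the script above: in the Miller DSL, `$[k]` refers to the field whose name is held in the variable `k`, whereas `$k` refers to a field literally named `k`. The analogue of `updated[key] = ...` would therefore be an assignment to `$[k]`.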

spmundi, Jan 02 '22 20:01

Well, I have now "solved" my problem. It was not with the script. My understanding of the join verb was that if I have two files that I wish to join, and the field on which I wish to join them is named differently in each, say keyleft and keyright, then the command line would look something like:

mlr join -l keyleft -r keyright -f leftfile rightfile

This is apparently not so, and in fact mlr reports an error on the command line.

I am confused by the description of join. I think some clarification in the documentation might help.

spmundi, Jan 03 '22 18:01

@spmundi sorry for the delay in getting back to you on this!

I (think I) totally get the update-in-place use-case but I don't see how to directly support it in Miller.

Regarding the join docs, let me take a closer look, go through your code examples (thank you!!) and try to make this more clear.

johnkerl, Jan 16 '22 21:01