Update one .csv from another
This is a question, not a request for an enhancement or a bug report.
I have a system in place which relies heavily on a .csv durable database. I frequently need to update this database with a subset of records and a subset of fields from a download. I am sure I could cobble something together with Miller scripting, but in my mind's eye it seems complex. Is there any simple way to do this using primitives in Miller (whether in 5.10 or 6.x)? To emphasize: every field in the download is also in the durable database, but not the converse. Not every record is present in a new download, and I have no need to create entirely new records.
Thx Rob Stevens
Sorry, John, I had not learnt how to use GitHub properly. I discovered something I had never realised before: non-key fields from the other files on the command line overwrite same-named fields in the so-called LHS file. But if the update file has empty (null) fields, i.e. two consecutive delimiters, that empty value also becomes the new value. I don't want that. The script I have tried to use (with debugging statements) is the following, but it doesn't work and I am not sure why. I fear I have not read the documentation with sufficient care.
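```
for (k, v in $*) {
  print k;                           # debugging
  print v;                           # debugging
  if (k =~ "^r_.*$") { continue; }   # skip the r_ fields added by join --rp 'r_'
  rf = "r_" . k;                     # name of the matching update field
  print rf;                          # debugging
  print $r_fa;                       # debugging
  print $[rf];                       # debugging
  if (is_null($[rf])) { continue; }  # empty or absent update: keep the old value
  $[k] = $[rf];                      # $[k], not $k: $k is a field literally named k
}
```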
Edits since original posting:
Addendum: I should have said that, before running this script, I had joined the file containing the updates to the original and used `--rp 'r_'`. What I am trying to do is find the field in the update file that corresponds to each field in the original file and, if that field is non-null, perform the update. But I cannot figure out how to reference the field in the update. I have tried `$["rf"]` and `$[rf]`; neither seems to work.
Well, I have now "solved" my problem. The problem was not with the script. My understanding of the join verb was that if I have two files I wish to join, and the join field is named differently in each, say keyleft and keyright, then the command line would look something like `mlr join -l keyleft -r keyright -f leftfile rightfile`. This is apparently not so; in fact, mlr reports an error on that command line.
I am confused by the description of join. I think some clarification in the documentation might help.
@spmundi sorry for the delay in getting back to you on this!
I (think I) totally get the update-in-place use-case but I don't see how to directly support it in Miller.
Regarding the join docs, let me take a closer look, go through your code examples (thank you!!) and try to make this more clear.