jq icon indicating copy to clipboard operation
jq copied to clipboard

Merging three JSON files with two different keys

Open gvp-devops opened this issue 2 years ago • 6 comments

Have been perplexed to getting my head around merging three JSON files by two different keys in order to provide one complete JSON file without removing duplicate keys. My thought process is this needs to be done by two different jq commands.

  1. The initial run processes successfully because the ID is unique.
jq -s 'flatten | group_by(.id) | map(reduce .[] as $x ({}; . * $x))' File1 File2 > /tmp/File_temp.json

However when I run the same command but change id to project_id, that almost works but what happens is when there are multiple tenants owners that own multiple ID instances, the final result remove all ID and project_id key duplicates and only provides one result per tenant owner. I know it has to do with the map(reduce... filter. The end result is not to remove any duplicate values since a single tenant can own multiple instance IDs.

Have scrubbed multiple JQ docs and StackOverflow but no reference to merge three JSON files, and what options are preferable to use map filter.

  1. Second run to merge tenant owner names into complete master JSON file3.
jq -s 'flatten | group_by(.project_id) | map(reduce .[] as $x ({}; . * $x))' File_temp File3 > /tmp/master_inventory.json

NOTE: In reference to JSON format, this is the JSON output format when using Openstack CLI

Instance File 1

{
  "ID"	 "74a7cb34-1510-4cb6-aaa5-3e97ef13d89e", <<<<<<<<< ID Key to reference File2 ID 
  "Name"	 "bgf01_vnf_stack-0",
  "Status"	 "ACTIVE",
  "Task State"	 null,
  "Power State"	 "Running",
  "Networks"	 "bgf01_vnf_stack_internal=xxx.xxx.xxx.xxx; vbgf-v4-green-dynamic-1=xxx.xxx.xxx;
  "Image Name"	 "vbgf01_28",
  "Image ID"	 "54c9fc86-3331-4e45-972b-67fe3e0fba3c",
  "Flavor Name"	 "d1.vbgf01_flavor",
  "Flavor ID"	 "1aaed288-7867-4880-b4b2-7c0f8038ad86",
  "Availability Zone"	 "nova",
  "Host"	 "ospcomputed2-ctc-a610d",
  "Properties"	 "ha-policy='ha-offline'"
}

Project ID File 2

  {
    "id": "74a7cb34-1510-4cb6-aaa5-3e97ef13d89e", <<<<<<<<< ID Key to reference Project_ID Key to File3
    "project_id": "116408287003490c87817885841753c3"
  }

Tenant Owner File 3

  {
    "project_id": "116408287003490c87817885841753c3", <<<<<<<< Project_ID Key reference to obtain tenant/instance Owner
    "tenant": "ims-vbgf-dev"
  }

Desired final merged JSON

{
  "ID"	 "74a7cb34-1510-4cb6-aaa5-3e97ef13d89e",
  "Name"	 "bgf01_vnf_stack-0",
  "Status"	 "ACTIVE",
  "Task State"	 null,
  "Power State"	 "Running",
  "Networks"	 "bgf01_vnf_stack_internal=xxx.xxx.xxx.xxx; vbgf-v4-green-dynamic-1=xxx.xxx.xxx.xxx
  "Image Name"	 "vbgf01_28",
  "Image ID"	 "54c9fc86-3331-4e45-972b-67fe3e0fba3c",
  "Flavor Name"	 "d1.vbgf01_flavor",
  "Flavor ID"	 "1aaed288-7867-4880-b4b2-7c0f8038ad86",
  "Availability Zone"	 "nova",
  "Host"	 "ospcomputed2-ctc-a610d",
  "Properties"	 "ha-policy='ha-offline'"
  "project_id": "116408287003490c87817885841753c3", <<<<< Merged from File 3 into final JSON
   "tenant": "ims-vbgf-dev".                                                    <<<<< Merged from File 3 into final JSON
}

gvp-devops avatar Jun 21 '22 17:06 gvp-devops

Presuming you have multiple objects per file. Otherwise you could just smash everything together:

jq -n 'reduce inputs as $i ({}; . += $i)' File[123]

To cross-reference multiple objects across all your files, you might use an index object where the index is "renamed" (copied and deleted) during the second migration. This disadvantage here is memory footprint.

jq --null-input '
	reduce inputs as $i
	( {}
	; if $i.ID then                     # form 1
		.[$i.ID] = $i
	elif $i.id and $i.project_id then   # form 2
		.[$i.project_id] = .[$i.id] + ($i | {project_id})
		| del(.[$i.id])
	elif $i.tenant then                 # form 3
		.[$i.project_id] += ($i | {tenant})
	else . end
	)
	| .[]   # break down index object into stream of desired objects
' File[123]

vintnes avatar Jun 21 '22 20:06 vintnes

@vintnes Thank you for this and so much appreciated. I think I can see where you're going with these. In testing the first to just merge it all together I receive the following error (which has been common through out all jq runs). One thing I did not think to ask is the format of each JSON file. Each one is formatted as an array with brackets at the beginning and end of each file.

$ jq -n 'reduce inputs as $i ({}; . += $i)' ctc_osp_vmmaster_lc.json (file 1) ctc_osp_projectIDs.json (file 2) ctc_osp_nameid.json (file 3)
jq: error (at ctc_osp_vmmaster_lc.json:7202): object ({}) and array ([{"id":"9a1...) cannot be added

Test run after second migration

$ jq --null-input 'reduce inputs as $i ({}; if $i.id then .[$i.id] = $i elif $i.id and $i.project_id then .[$i.project_id] = .[$i.id] + ($i | {project_id}) | del(.[$i.id]) elif $i.tenant then .[$i.project_id]
+= ($i | {tenant}) else . end) | .[]' master1.json (temp file) ctc_osp_nameid.json (file 3)
jq: error (at master1.json:886): Cannot index array with string "id"

Second Migration Run

Result of cross-referencing using second jq code sample almost produced result. Rather it's merging two files together but not merging the tenant key with the actual identity record, as $ jq -s 'flatten | group_by(.project_id) | map(reduce .[] as $x ({}; . * $x))' File_temp File3 has done.

I believe what really need to know are the map options to not remove duplicate key values in order to maintain the inventory integrity.

(...)
{
  "id": "ffb801a4-93d9-4df7-82e6-5341f995e113",
  "name": "echoserver-workflow-test-b-4",
  "status": "active",
  "task state": null,
  "power state": "running",
  "networks": "green-dynamic=2001:558:1047:9d8:f816:3eff:fe0f:515b, xx.xxx.xxx.xxx",
  "image name": "centos-7-mvp_echoserver_v1.0.7_2022-04-21 03:08:35",
  "image id": "bb27e36a-3461-450f-af83-2bbe4531930d",
  "flavor name": "m1.small",
  "flavor id": "f874ca8b-c4fa-4dde-9fb6-3b22b3a2b0b2",
  "availability zone": "nova",
  "host": "ospcomputem-ctc-a311d",
  "properties": "group='echoservice-workflow-test'",
  "project_id": "21d74337771a43e2b0ac5a91db828898"
}
{
  "tenant": "wsv-gcs-qa"
}
(...)

gvp-devops avatar Jun 22 '22 13:06 gvp-devops

inputs[]

vintnes avatar Jun 22 '22 17:06 vintnes

@vintnes Spent some time to work on this and unable for jq to merge three JSON file arrays without having to use unique feature and remove all duplicate project_id records. As a result removes several inventory records. Each iteration of the suggested commands is peculiar when working with them as they produce two different results.

  • [ ] Uncertain why only produced one result versus loop through all records. Additionally this peculiar record is not at beginning or end of files. So trying to understand why it would do that.
$ jq -n 'reduce inputs[] as $i ({}; . += $i)' ctc_osp_vmmaster_lc.json (file 1) ctc_osp_projectIDs.json (file 2) ctc_osp_nameid.json (file 3)
{
  "id": "74a7cb34-1510-4cb6-aaa5-3e97ef13d89e",
  "name": "bgf01_vnf_stack-0",
  "status": "active",
  "task state": null,
  "power state": "running",
  "networks": "bgf01_vnf_stack_internal=xxx.xxx.0.10,
  "image name": "vbgf01_28",
  "image id": "54c9fc86-3331-4e45-972b-67fe3e0fba3c",
  "flavor name": "d1.vbgf01_flavor",
  "flavor id": "1aaed288-7867-4880-b4b2-7c0f8038ad86",
  "availability zone": "nova",
  "host": "ospcomputed2-ctc-a610d",
  "properties": "ha-policy='ha-offline'",
  "project_id": "116408287003490c87817885841753c3",
  "tenant": "ims-vbgf-dev"
}
  • [ ] The second run omitted all other keys from previous jq run. Although it did "merge" the latter two files it did not merge as a unique record index.
$ jq --null-input 'reduce inputs[] as $i( {}; if $i.id then .[$i.id] = $i elif $i.id and $i.project_id then .[$i.project_id] = .[$i.id] + ($i | {project_
id}) elif $i.tenant then .[$i.project_id] += ($i | {tenant}) else . end) | .[]' ctc_osp_vmmaster_lc.json ctc_osp_projectIDs.json ctc_osp_nameid.json
(...)
{
  "id": "9a120e32-b001-432a-a1ae-59a297969126",
  "project_id": "ed1a52f2e5664c4f90806214bcb98f58"
}
{
  "id": "1eebdef1-8fd8-424b-a15f-f98499a27fff",
  "project_id": "923d8bca286d470592d467ca8a130fba"
}
{
  "id": "9a9ea330-4af9-4d38-8c5e-7c505957b962",
  "project_id": "923d8bca286d470592d467ca8a130fba"
{
  "tenant": "wsv-gcs-qa"
}
{
  "tenant": "SDWAN-DSE-VIVA-CTC-C1"
}
{
  "tenant": "SDWAN-DSE-VIVA-A1"
(...)

gvp-devops avatar Jun 23 '22 18:06 gvp-devops

Wow.

Your initial description of your source data was inadequate. The first solution (the simple reduce) was, as I explicitly stated, only for a situation with one object per file.

Your demo of the second solution (the conditional index object) is not the code I proposed. It looks like you tried to rewrite my solution by hand, but you lost some case sensitivity (.ID vs .id) and the del() statement.

vintnes avatar Jun 23 '22 19:06 vintnes

Wow.

Your initial description of your source data was inadequate. The first solution (the simple reduce) was, as I explicitly stated, only for a situation with one object per file.

Your demo of the second solution (the conditional index object) is not the code I proposed. It looks like you tried to rewrite my solution by hand, but you lost some case sensitivity (.ID vs .id) and the del() statement.

You're correct and apologize for the rewrite. I did use the code as originally provided however, there was one file I used where the keys were all lowercase thinking that may have been an issue based on one "comment" I was provided elsewhere.

gvp-devops avatar Jun 23 '22 19:06 gvp-devops