libredwg
Issue with encoding of text in DICTIONARY from JSON to DWG
Example: first.dwg.gz
first.dwg:
codepage: 28 [RS 0]
ANSI_1250
...
Next object: 23 Handleoff: 0x1 [UMC] Offset: -133 [MC] @21132
==========================================
Object number: 23/17, Size: 74 [MS], Type: 42 [BS], Address: 21134
Add object DICTIONARY [23] Decode object DICTIONARY
handle: 0.1.1A [H 5]
EED[0] size: 20 [BS]
EED[0] handle: 5.1.12
EED[0] raw: 20
EED[0] code: 0 [RC], string: len=13 [RC] cp=28 [RS_BE] "TREAT_AS_HARD" [TF]
EED[1] code: 70 [RC], short: 0 [RS]
EED[2] size: 0 (end)
num_eed: 2
bitsize: 496 [RL]
Hdlsize: 0x60, hdl_dat: @62.0 - @74.0 (74)
num_reactors: 1 [BL 0]
ownerhandle: (4.1.C) abs:12 [H 330]
reactors[0]: (4.1.C) abs:12 [H* 330] => DICTIONARY
xdicobjhandle: (3.1.A2) abs:162 [H 360]
--common_size: 246
numitems: 3 [BL 0]
texts[0]: "MODEL" [TV 3]
texts[1]: "ROZVR.EN.1" [TV 3]
texts[2]: "ROZVR.EN.2" [TV 3]
itemhandles[0]: (2.1.22) abs:34 [H* 0]
itemhandles[1]: (2.1.1E) abs:30 [H* 0]
itemhandles[2]: (2.1.26) abs:38 [H* 0]
object_map{1A} = 23
crc: 7CCD [RSx]
check_CRC 21132-21208 = 76: 7CCD == 7CCD
After conversion from DWG to JSON, the JSON contains:
{
"object": "DICTIONARY",
"index": 23,
"type": 42,
"handle": [0, 1, 26],
"size": 74,
"bitsize": 496,
"eed": [
{
"size": 20,
"handle": [5, 1, 18],
"code": 0,
"value": "TREAT_AS_HARD"
},
{
"code": 70,
"value": 0
}
],
"ownerhandle": [4, 1, 12, 12],
"reactors": [
[4, 1, 12, 12]
],
"xdicobjhandle": [3, 1, 162, 162],
"numitems": 3,
"items": {
"MODEL": [2, 1, 34, 34],
"ROZVRŽENÍ1": [2, 1, 30, 30],
"ROZVRŽENÍ2": [2, 1, 38, 38]
}
},
After conversion from JSON back to DWG, the DWG log shows:
codepage: 28 [RS 0]
ANSI_1250
...
Next object: 23 Handleoff: 0x1 [UMC] Offset: 42 [MC] @18647
==========================================
Object number: 23/17, Size: 78 [MS], Type: 42 [BS], Address: 18649
Add object DICTIONARY [23] Decode object DICTIONARY
handle: 0.1.1A [H 5]
EED[0] size: 20 [BS]
EED[0] handle: 5.1.12
EED[0] raw: 20
EED[0] code: 0 [RC], string: len=13 [RC] cp=28 [RS_BE] "TREAT_AS_HARD" [TF]
EED[1] code: 70 [RC], short: 0 [RS]
EED[2] size: 0 (end)
num_eed: 2
bitsize: 528 [RL]
Hdlsize: 0x60, hdl_dat: @66.0 - @78.0 (78)
num_reactors: 1 [BL 0]
ownerhandle: (4.1.C) abs:12 [H 330]
reactors[0]: (4.1.C) abs:12 [H* 330] => DICTIONARY
xdicobjhandle: (3.1.A2) abs:162 [H 360]
--common_size: 246
numitems: 3 [BL 0]
texts[0]: "MODEL" [TV 3]
texts[1]: "ROZVRŽENÍ1" [TV 3]
texts[2]: "ROZVRŽENÍ2" [TV 3]
itemhandles[0]: (2.1.22) abs:34 [H* 0]
itemhandles[1]: (2.1.1E) abs:30 [H* 0]
itemhandles[2]: (2.1.26) abs:38 [H* 0]
object_map{1A} = 23
crc: B9A4 [RSx]
check_CRC 18647-18727 = 80: B9A4 == B9A4
It seems the texts in the final DWG file are UTF-8, not ANSI_1250.
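The size fields in the two logs support this. A quick check in Python (not part of libredwg; it only assumes that nothing but the text encoding changed between the two files) compares the encoded lengths of the three dictionary names with the growth in Size and bitsize:

```python
# Dictionary item names as they appear in the JSON above.
names = ["MODEL", "ROZVRŽENÍ1", "ROZVRŽENÍ2"]

# Total bytes needed for the names in each encoding.
cp1250_len = sum(len(n.encode("cp1250")) for n in names)
utf8_len = sum(len(n.encode("utf-8")) for n in names)
extra_bytes = utf8_len - cp1250_len

# Growth reported by the logs: Size 74 -> 78 bytes, bitsize 496 -> 528 bits.
size_growth = 78 - 74
bitsize_growth_bytes = (528 - 496) // 8

print(cp1250_len, utf8_len, extra_bytes)
print(size_growth, bitsize_growth_bytes)
```

Ž and Í each take one byte in cp1250 and two bytes in UTF-8, so two names with two such characters each account for the 4 extra bytes.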
not repro:
- out_json converts ROZVRŽENÍ1 (cp 1250) to ROZVRŽENÍ1 (utf-8)
- in_json converts ROZVRŽENÍ1 (utf-8) back to ROZVRŽENÍ1 (cp 1250), encode keeps it as ROZVRŽENÍ1 (cp 1250)
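The expected round trip described in the two points above, and the behaviour reported in this issue, can be sketched in Python. This is only an illustration of the encodings involved, not libredwg code; the byte string is an assumption about how the name is stored in first.dwg, based on the ANSI_1250 codepage in the log.

```python
# Name as stored in the original DWG (assumed cp1250 bytes: 0x8E = Ž, 0xCD = Í).
original = b"ROZVR\x8eEN\xcd1"

# DWG -> JSON: the codepage text is decoded and written out as UTF-8.
text = original.decode("cp1250")
json_bytes = text.encode("utf-8")

# JSON -> DWG, expected: UTF-8 is converted back to the target codepage.
expected = json_bytes.decode("utf-8").encode("cp1250")

# JSON -> DWG, as reported: the UTF-8 bytes are written unchanged.
reported = json_bytes

# A reader that honours the codepage would then misinterpret the reported bytes.
misread = reported.decode("cp1250")

print(expected == original)   # round trip preserves the bytes
print(reported == original)   # the reported output does not
print(text, misread)
```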
Interesting. I cleaned the repository and compiled everything again. I also tested with locales disabled. The result is the same.
Artefacts: Makefile.gz
Results: first.json.gz second.dwg.gz second.json.gz
The Makefile runs the whole conversion process.
@rurban Still not repro?
I tested on the latest master and the results are the same. After conversion from JSON to DWG the string is UTF-8, not ANSI_1250.
Now repro. Orig:
first_gh873.log:texts[1]: "ROZVR�EN�1" [TV 3]
first_gh873.log:texts[2]: "ROZVR�EN�2" [TV 3]
first_gh873.log:xdata[3]: "ROZVR�EN�1" [TV 1]
first_gh873.log:xdata[5]: "ROZVR�EN�2" [TV 1]
Recoded via JSON:
first_gh873.dwg.log:texts[1]: "ROZVRŽENÍ1" [TV 3]
first_gh873.dwg.log:texts[2]: "ROZVRŽENÍ2" [TV 3]
first_gh873.dwg.log:xdata[3]: "ROZVRŽENÍ1" [TV 1]
first_gh873.dwg.log:xdata[5]: "ROZVRŽENÍ2" [TV 1]
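The two log excerpts above are easy to tell apart once you know how the bytes render on a UTF-8 terminal: cp1250 bytes are not valid UTF-8 and show up as replacement characters, while a readable "Ž" and "Í" mean the file already holds UTF-8. A small Python illustration, assuming the same cp1250 byte values as in the codepage table (0x8E for Ž, 0xCD for Í):

```python
# Correctly stored cp1250 name, viewed as if it were UTF-8.
cp1250_bytes = "ROZVRŽENÍ1".encode("cp1250")
shown_for_cp1250 = cp1250_bytes.decode("utf-8", errors="replace")

# Wrongly stored UTF-8 name, viewed as UTF-8.
utf8_bytes = "ROZVRŽENÍ1".encode("utf-8")
shown_for_utf8 = utf8_bytes.decode("utf-8", errors="replace")

print(shown_for_cp1250)  # contains U+FFFD replacement characters
print(shown_for_utf8)    # readable, which here signals the wrong encoding
```

So in these logs the garbled lines are the correct ones for a codepage 28 file, and the readable lines indicate the bug.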
Great :-)
bit_write_TV ignored DWG_OPTS_INJSON and the target codepage. Now:
texts[1]: "ROZVR�EN�1" [TV 3]
texts[2]: "ROZVR�EN�2" [TV 3]
xdata[3]: "ROZVRŽENÍ1" [TV 1]
xdata[5]: "ROZVRŽENÍ2" [TV 1]
i.e. the xdata strings are still not converted.
And now:
$ grep -a ROZ first_gh873.log
texts[1]: "ROZVR�EN�1" [TV 3]
texts[2]: "ROZVR�EN�2" [TV 3]
xdata[3]: "ROZVR�EN�1" [TV 1]
xdata[5]: "ROZVR�EN�2" [TV 1]
It's working, great, thank you.