libredwg

Issue with encoding of text in DICTIONARY from JSON to DWG

Open michal-josef-spacek opened this issue 1 year ago • 9 comments

Example: first.dwg.gz

first.dwg:

codepage: 28 [RS 0]
ANSI_1250
...
Next object: 23 Handleoff: 0x1 [UMC] Offset: -133 [MC] @21132
==========================================
Object number: 23/17, Size: 74 [MS], Type: 42 [BS], Address: 21134
Add object DICTIONARY [23] Decode object DICTIONARY
handle: 0.1.1A [H 5]
EED[0] size: 20 [BS]
EED[0] handle: 5.1.12
EED[0] raw: 20
EED[0] code: 0 [RC], string: len=13 [RC] cp=28 [RS_BE] "TREAT_AS_HARD" [TF]
EED[1] code: 70 [RC], short: 0 [RS]

EED[2] size: 0 (end)
num_eed: 2
bitsize: 496 [RL]
 Hdlsize: 0x60, hdl_dat: @62.0 - @74.0 (74)
num_reactors: 1 [BL 0]
ownerhandle: (4.1.C) abs:12 [H 330]
reactors[0]: (4.1.C) abs:12 [H* 330] => DICTIONARY
xdicobjhandle: (3.1.A2) abs:162 [H 360]
--common_size: 246
numitems: 3 [BL 0]
texts[0]: "MODEL" [TV 3]
texts[1]: "ROZVR.EN.1" [TV 3]
texts[2]: "ROZVR.EN.2" [TV 3]
itemhandles[0]: (2.1.22) abs:34 [H* 0]
itemhandles[1]: (2.1.1E) abs:30 [H* 0]
itemhandles[2]: (2.1.26) abs:38 [H* 0]
 object_map{1A} = 23
crc: 7CCD [RSx]
 check_CRC 21132-21208 = 76: 7CCD == 7CCD

After conversion from DWG to JSON, the JSON contains:

    {
      "object": "DICTIONARY",
      "index": 23,
      "type": 42,
      "handle": [0, 1, 26],
      "size": 74,
      "bitsize": 496,
      "eed": [
        {
          "size": 20,
          "handle": [5, 1, 18],
          "code": 0,
          "value": "TREAT_AS_HARD"
        },
        {
          "code": 70,
          "value": 0
        }
      ],
      "ownerhandle": [4, 1, 12, 12],
      "reactors": [
        [4, 1, 12, 12]
      ],
      "xdicobjhandle": [3, 1, 162, 162],
      "numitems": 3,
      "items": {
        "MODEL": [2, 1, 34, 34],
        "ROZVRŽENÍ1": [2, 1, 30, 30],
        "ROZVRŽENÍ2": [2, 1, 38, 38]
      }
    },

After conversion from JSON back to DWG, the DWG contains:

codepage: 28 [RS 0]
ANSI_1250
...
Next object: 23 Handleoff: 0x1 [UMC] Offset: 42 [MC] @18647
==========================================
Object number: 23/17, Size: 78 [MS], Type: 42 [BS], Address: 18649
Add object DICTIONARY [23] Decode object DICTIONARY
handle: 0.1.1A [H 5]
EED[0] size: 20 [BS]
EED[0] handle: 5.1.12
EED[0] raw: 20
EED[0] code: 0 [RC], string: len=13 [RC] cp=28 [RS_BE] "TREAT_AS_HARD" [TF]
EED[1] code: 70 [RC], short: 0 [RS]

EED[2] size: 0 (end)
num_eed: 2
bitsize: 528 [RL]
 Hdlsize: 0x60, hdl_dat: @66.0 - @78.0 (78)
num_reactors: 1 [BL 0]
ownerhandle: (4.1.C) abs:12 [H 330]
reactors[0]: (4.1.C) abs:12 [H* 330] => DICTIONARY
xdicobjhandle: (3.1.A2) abs:162 [H 360]
--common_size: 246
numitems: 3 [BL 0]
texts[0]: "MODEL" [TV 3]
texts[1]: "ROZVRŽENÍ1" [TV 3]
texts[2]: "ROZVRŽENÍ2" [TV 3]
itemhandles[0]: (2.1.22) abs:34 [H* 0]
itemhandles[1]: (2.1.1E) abs:30 [H* 0]
itemhandles[2]: (2.1.26) abs:38 [H* 0]
 object_map{1A} = 23
crc: B9A4 [RSx]
 check_CRC 18647-18727 = 80: B9A4 == B9A4

It seems the texts in the final DWG file are UTF-8, not ANSI_1250.
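The size fields in the two dumps are consistent with that reading. A minimal sketch, assuming nothing else in the object changed between the two files: the two non-ASCII letters in each layout name take one byte each in cp1250 but two bytes each in UTF-8, so the three dictionary keys together should grow by 4 bytes. The helper name `encoded_size` is only illustrative.

```python
# Dictionary keys as they appear in the JSON above.
names = ["MODEL", "ROZVRŽENÍ1", "ROZVRŽENÍ2"]


def encoded_size(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))


cp1250_total = sum(encoded_size(n, "cp1250") for n in names)
utf8_total = sum(encoded_size(n, "utf-8") for n in names)
growth = utf8_total - cp1250_total

# Values taken from the two dumps above.
size_growth = 78 - 74               # "Size: 74 [MS]" vs "Size: 78 [MS]"
bitsize_growth = (528 - 496) // 8   # "bitsize: 496" vs "bitsize: 528", in bytes

print(cp1250_total, utf8_total, growth, size_growth, bitsize_growth)
```

The 4-byte growth matches both the object size (74 to 78) and the bitsize (496 to 528 bits) reported in the logs.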

michal-josef-spacek avatar Nov 24 '23 19:11 michal-josef-spacek

not repro:

  • out_json converts ROZVRŽENÍ1 (cp 1250) to ROZVRŽENÍ1 (utf-8)
  • in_json converts ROZVRŽENÍ1 (utf-8) back to ROZVRŽENÍ1 (cp 1250), encode keeps it as ROZVRŽENÍ1 (cp 1250)
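The expected round trip described in the two bullets can be sketched outside of libredwg. This is only an illustration using Python's built-in `cp1250` and `utf-8` codecs, not the library's own conversion code; the variable names are made up for the example.

```python
# Bytes of "ROZVRŽENÍ1" as stored in a codepage 1250 DWG:
# Ž is 0x8E and Í is 0xCD in cp1250.
raw_cp1250 = b"ROZVR\x8eEN\xcd1"

# DWG to JSON direction: interpret the bytes as cp1250, emit UTF-8.
text = raw_cp1250.decode("cp1250")
json_bytes = text.encode("utf-8")

# JSON to DWG direction: read UTF-8, write bytes in the target codepage.
restored = json_bytes.decode("utf-8").encode("cp1250")

print(text, json_bytes, restored)
```

If both directions work as described, `restored` is byte-for-byte identical to `raw_cp1250`.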

rurban avatar Nov 27 '23 13:11 rurban

Interesting. I cleaned the repository and compiled everything again. I also tested with locales disabled. The result is the same.

Artefacts: Makefile.gz

Results: first.json.gz second.dwg.gz second.json.gz

The Makefile is used to run the process.

michal-josef-spacek avatar Nov 27 '23 16:11 michal-josef-spacek

@rurban Still not repro?

michal-josef-spacek avatar Dec 06 '23 19:12 michal-josef-spacek

I tested on the latest master and the results are the same. After conversion from JSON to DWG, the strings are UTF-8, not ANSI_1250.

michal-josef-spacek avatar Mar 18 '24 12:03 michal-josef-spacek

Now repro. Orig:

first_gh873.log:texts[1]: "ROZVR�EN�1" [TV 3]
first_gh873.log:texts[2]: "ROZVR�EN�2" [TV 3]
first_gh873.log:xdata[3]: "ROZVR�EN�1" [TV 1]
first_gh873.log:xdata[5]: "ROZVR�EN�2" [TV 1]

recoded via json:

first_gh873.dwg.log:texts[1]: "ROZVRŽENÍ1" [TV 3]
first_gh873.dwg.log:texts[2]: "ROZVRŽENÍ2" [TV 3]
first_gh873.dwg.log:xdata[3]: "ROZVRŽENÍ1" [TV 1]
first_gh873.dwg.log:xdata[5]: "ROZVRŽENÍ2" [TV 1]

rurban avatar Mar 19 '24 14:03 rurban

Great :-)

michal-josef-spacek avatar Mar 19 '24 18:03 michal-josef-spacek

bit_write_TV ignores DWG_OPTS_INJSON and the target codepage. Now:

texts[1]: "ROZVR�EN�1" [TV 3]
texts[2]: "ROZVR�EN�2" [TV 3]
xdata[3]: "ROZVRŽENÍ1" [TV 1]
xdata[5]: "ROZVRŽENÍ2" [TV 1]

i.e. the fix for xdata is still missing
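The behaviour being fixed can be illustrated with a small sketch. This is not libredwg's `bit_write_TV`; the function `write_tv`, its parameters, and the `CODEPAGE_TO_CODEC` table are hypothetical, and the mapping of codepage 28 to Python's `cp1250` codec is an assumption based on the "codepage: 28 ... ANSI_1250" lines in the dumps.

```python
# Hypothetical mapping from the DWG codepage number to a Python codec name.
# 28 is reported as ANSI_1250 in the dumps; treating it as "cp1250" is an assumption.
CODEPAGE_TO_CODEC = {28: "cp1250"}


def write_tv(data: bytes, codepage: int, from_json: bool) -> bytes:
    """Return the bytes to store for a text value (illustrative only).

    Strings imported from JSON arrive as UTF-8 and must be converted to the
    target codepage; strings decoded from a DWG are already in that codepage.
    """
    if not from_json:
        return data
    return data.decode("utf-8").encode(CODEPAGE_TO_CODEC[codepage])


utf8_key = "ROZVRŽENÍ1".encode("utf-8")
stored = write_tv(utf8_key, codepage=28, from_json=True)
untouched = write_tv(b"ROZVR\x8eEN\xcd1", codepage=28, from_json=False)
print(stored, untouched)
```

A writer that skips the `from_json` branch would store the 12-byte UTF-8 form unchanged, which is the symptom reported in this issue.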

rurban avatar Mar 19 '24 21:03 rurban

And now:

$ grep -a ROZ first_gh873.log
texts[1]: "ROZVR�EN�1" [TV 3]
texts[2]: "ROZVR�EN�2" [TV 3]
xdata[3]: "ROZVR�EN�1" [TV 1]
xdata[5]: "ROZVR�EN�2" [TV 1]
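The replacement characters in this output are the expected result here, assuming the log is viewed in a UTF-8 terminal: the cp1250 bytes for Ž and Í are not valid UTF-8, so they are shown as U+FFFD. A minimal sketch of that effect, with the helper name `as_utf8_terminal` invented for the example:

```python
def as_utf8_terminal(raw: bytes) -> str:
    """Decode bytes the way a UTF-8 viewer would, marking invalid bytes as U+FFFD."""
    return raw.decode("utf-8", errors="replace")


cp1250_bytes = "ROZVRŽENÍ1".encode("cp1250")  # what the DWG should contain
utf8_bytes = "ROZVRŽENÍ1".encode("utf-8")     # what the buggy round trip wrote

shown_cp1250 = as_utf8_terminal(cp1250_bytes)
shown_utf8 = as_utf8_terminal(utf8_bytes)
print(shown_cp1250)
print(shown_utf8)
```

Readable accented letters in the log therefore indicate UTF-8 bytes in the file, while the replacement characters indicate single-byte codepage data.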

rurban avatar Mar 19 '24 22:03 rurban

It's working, great, thank you.

michal-josef-spacek avatar Mar 19 '24 22:03 michal-josef-spacek