ifcplusplus

IFC 4X3 (Release 1.3) does NOT support unicode values anymore!

Open alas2 opened this issue 2 years ago • 8 comments

In IFC 4X1, strings were stored as std::wstring; in IFC 4X3 (and on the master branch) they were replaced by std::string. When reading the attached file, whose product name contains Unicode characters, the name value comes out incorrect.

#1253= IFCREINFORCINGBAR('3I3yJtn4rEqwqDeBuj90D4',#41,'\X2\92447B4B68D2\X0\:D22 : \X2\5F6272B6\X0\ \X2\92447B4B5F6272B6\X0\ 1:191888: 1',$,'\X2\92447B4B68D2\X0\:D22:93672',#1180,#1251,'191888',$,22.,0.000380132711084365,11700.,.NOTDEFINED.,$);

UniCodeContent.zip

alas2, Feb 08 '23 03:02

The thing is that std::wstring does not work on Linux, and IFC++ needs to run on some server-side Linux machines. On Windows, with the right project settings, std::string and Unicode should work fine.

ifcquery, Feb 08 '23 10:02

Okay. Thanks.

I had a look at decodeArgumentStrings(); it seems not to decode strings correctly.

Look, for instance, at line 813 in ReaderUtil.cpp: one byte is always lost here, because the four hex digits form a 16-bit value that is stored in a single char:

char c = Hex4Char(*(stream_pos), *(stream_pos+1), *(stream_pos+2), *(stream_pos+3));
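The truncation follows from the types: each group of four hex digits in a \X2\ ... \X0\ run is one 16-bit UTF-16 code unit, which cannot fit in a single char. Below is a minimal sketch of a decoder that keeps all 16 bits and produces UTF-8 in a std::string, including surrogate pairs. It is not IFC++'s code; the names decodeX2Block, hexDigit and appendUtf8 are hypothetical, and it assumes the caller has already extracted the hex payload between \X2\ and \X0\.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: numeric value of one hex digit.
static int hexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Append one Unicode code point to out, encoded as UTF-8.
static void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical: decode the hex payload of a \X2\ ... \X0\ run into UTF-8.
// Each 4-digit group is kept as a full 16-bit code unit, not a char.
std::string decodeX2Block(const std::string& hex) {
    if (hex.size() % 4 != 0)
        throw std::invalid_argument("payload length must be a multiple of 4");
    std::string out;
    char32_t high = 0;  // pending high surrogate, 0 if none
    for (std::size_t i = 0; i < hex.size(); i += 4) {
        char32_t unit = static_cast<char32_t>(
            (hexDigit(hex[i]) << 12) | (hexDigit(hex[i + 1]) << 8) |
            (hexDigit(hex[i + 2]) << 4) | hexDigit(hex[i + 3]));
        bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        bool isLow = unit >= 0xDC00 && unit <= 0xDFFF;
        if (high != 0) {
            if (!isLow) throw std::invalid_argument("unpaired high surrogate");
            // Combine the surrogate pair into one supplementary code point.
            appendUtf8(out, 0x10000 + ((high - 0xD800) << 10) + (unit - 0xDC00));
            high = 0;
        } else if (isHigh) {
            high = unit;  // wait for the low half
        } else if (isLow) {
            throw std::invalid_argument("unpaired low surrogate");
        } else {
            appendUtf8(out, unit);
        }
    }
    if (high != 0) throw std::invalid_argument("payload ends inside a surrogate pair");
    return out;
}
```

For the payload 9244 from the reinforcing bar name above, this yields the three UTF-8 bytes E9 89 84 instead of a single truncated char.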

alas2, Feb 09 '23 01:02

When I open the saved IFC file in BIMvision, the Chinese text is garbled. How can I fix this?

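#236= IFCPROPERTYSET('1EvIHrDHjEGRq_m5cA1C2N',$,'abc\X2\FFBFFFCDFFBBFFA7FFD0FFD5FFC3FFFB\X0\1111','\X2\FFBBFFF9FFB1FFBE\X0\',(#237,#238,#239,#240));

I'm not sure whether the generated output is wrong. IFC++ reads it back correctly, but every other program that can read IFC files displays it incorrectly... I guess I'll have to modify the ISO 10646 encoding myself.

ifc.zip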
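For the writing direction, here is a minimal sketch of encoding a UTF-8 std::string as the contents of an ISO 10303-21 string literal. It reads bytes as unsigned char. The \X2\ groups in the property set above (FFBF, FFCD, ...) look like bytes of a locale-encoded (e.g. GBK) string that were sign-extended through a signed char; that is an inference from the sample, not something confirmed in the thread. The function names are hypothetical. The sketch assumes the input has already been converted to UTF-8 (it only does basic validity checks), writes BMP characters as \X2\ and others as \X4\, and escapes ' and \ by doubling them.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical: decode one UTF-8 sequence at s[i] and advance i.
// Bytes are read as unsigned char, so nothing is sign-extended.
static char32_t nextCodePoint(const std::string& s, std::size_t& i) {
    unsigned char b0 = static_cast<unsigned char>(s[i]);
    int len = b0 < 0x80 ? 1 : (b0 >> 5) == 0x6 ? 2 : (b0 >> 4) == 0xE ? 3
            : (b0 >> 3) == 0x1E ? 4 : 0;
    if (len == 0 || i + len > s.size()) throw std::invalid_argument("invalid UTF-8");
    char32_t cp = len == 1 ? b0 : (b0 & (0x7F >> len));  // payload bits of lead byte
    for (int k = 1; k < len; ++k) {
        unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) throw std::invalid_argument("invalid UTF-8");
        cp = (cp << 6) | (b & 0x3F);
    }
    i += len;
    return cp;
}

// Hypothetical: encode UTF-8 text as the body of a STEP string literal.
std::string encodeStepString(const std::string& utf8) {
    std::string out;
    int mode = 0;  // 0 = plain ASCII, 2 = inside \X2\, 4 = inside \X4\.
    auto closeRun = [&] {
        if (mode != 0) { out += "\\X0\\"; mode = 0; }
    };
    char buf[9];  // up to 8 hex digits plus terminator
    for (std::size_t i = 0; i < utf8.size();) {
        char32_t cp = nextCodePoint(utf8, i);
        if (cp >= 0x20 && cp < 0x7F) {  // printable ASCII goes out as-is
            closeRun();
            if (cp == '\'') out += "''";
            else if (cp == '\\') out += "\\\\";
            else out += static_cast<char>(cp);
        } else {
            int want = cp <= 0xFFFF ? 2 : 4;
            if (mode != want) {
                closeRun();
                out += want == 2 ? "\\X2\\" : "\\X4\\";
                mode = want;
            }
            if (want == 2) std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(cp));
            else std::snprintf(buf, sizeof buf, "%08X", static_cast<unsigned>(cp));
            out += buf;
        }
    }
    closeRun();
    return out;
}
```

Consecutive non-ASCII characters share one \X2\ ... \X0\ run, so the bytes E9 89 84 (U+9244) encode as \X2\9244\X0\.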

wangafei, Feb 15 '23 09:02

The thing is that std::wstring does not work on Linux

This is incorrect. It has different encoding (UTF-32 instead of UTF-16), but it absolutely does work.
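The width difference can be checked directly. In this small sketch (function names are hypothetical), a character outside the Basic Multilingual Plane such as U+1F600 occupies two wchar_t units (a UTF-16 surrogate pair) when wchar_t is 16 bits, as on Windows, and one unit when wchar_t is 32 bits, as on common Linux toolchains. char16_t and char32_t strings have the same sizes on every platform.

```cpp
#include <cstddef>
#include <string>

// Number of wchar_t units the compiler uses for U+1F600 (outside the BMP).
std::size_t wideUnitsForEmoji() {
    return std::wstring(L"\U0001F600").size();
}

// Expected count: 2 units (surrogate pair) for a 16-bit wchar_t (UTF-16),
// 1 unit for a 32-bit wchar_t (UTF-32).
std::size_t expectedWideUnits() {
    return sizeof(wchar_t) == 2 ? 2 : 1;
}

// Fixed-width types are portable: always 2 UTF-16 units and 1 UTF-32 unit.
std::size_t utf16Units() { return std::u16string(u"\U0001F600").size(); }
std::size_t utf32Units() { return std::u32string(U"\U0001F600").size(); }
```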

Osyotr, May 18 '23 11:05

I solved it by modifying the code as below. I think this is a bug; please fix it.

Branden-Shin, Jun 09 '23 04:06

Thanks for the suggestion! It is Windows only, I guess?

ifcquery, Jun 11 '23 16:06

std::wstring sometimes works on Linux, but only under certain circumstances that I never reliably figured out. And it is not necessary, since std::string with Unicode works fine on Linux; at least I haven't had problems so far.

ifcquery, Jun 11 '23 16:06

The problem with std::string is that on Windows it's hard to tell what encoding the data is in. It may be UTF-8, but it may just as well be CP_ACP, which depends on the locale. If you're going to stick with std::string, make sure that the data it holds is UTF-8 and that this is applied consistently, i.e. all instances of std::string hold UTF-8 data. When you read data from a file that is not in UTF-8, read it into some buffer (e.g. std::unique_ptr<char[]>) and then convert it to UTF-8. There's really no difference between std::string and std::wstring when used correctly. What you should never do, though:

  1. Construct std::wstring objects by copying data from std::string. The conversion is required in this case (either UTF-8 to UTF-16/UTF-32 or ).
  2. When using a conversion function such as mbstowcs, don't assume the wide string is twice as large as the multibyte string. Always call the function twice: first to obtain the destination size, then to perform the conversion.
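The two-call pattern can be sketched with std::mbstowcs from <cstdlib>. Called with a null destination, it returns the number of wide characters needed (excluding the terminator), or (size_t)-1 on an invalid sequence; only then is the buffer allocated and the conversion performed. toWide is a hypothetical name. The result depends on the current C locale (set with std::setlocale); with the system default locale on Windows, that encoding is usually the ANSI code page rather than UTF-8, which is the CP_ACP problem described above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical: convert a NUL-terminated multibyte string (encoding of the
// current C locale) to std::wstring using two calls to std::mbstowcs.
std::wstring toWide(const char* mb) {
    // First call: null destination, returns the required number of wchar_t
    // (terminator not included) without writing anything.
    std::size_t needed = std::mbstowcs(nullptr, mb, 0);
    if (needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for current locale");
    if (needed == 0) return std::wstring();

    // Second call: convert into a buffer of exactly that size. With the limit
    // set to `needed`, no terminator is written past the string's characters.
    std::wstring out(needed, L'\0');
    std::mbstowcs(&out[0], mb, needed);
    return out;
}
```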

Osyotr, Jun 11 '23 18:06

This seems to be fixed in version 2.3 (IFC4X3).

alas2, Apr 30 '24 03:04