UTF8 encoding for IFC serialisations

Open berlotti opened this issue 5 years ago • 5 comments

Currently, STEP-serialized IFC requires strings to be encoded according to ISO 8859-1 (more info at https://technical.buildingsmart.org/resources/ifcimplementationguidance/string-encoding/). The latest STEP ISO standard allows UTF8 encoding, which is widely adopted and the de facto standard.

I suggest using UTF8 encoding for all serializations of IFC.

berlotti avatar Feb 25 '20 10:02 berlotti

yes please!

janbrouwer avatar Mar 04 '20 19:03 janbrouwer

That... seems like common sense? What are the effects of the change? Are there any?

pipauwel avatar Mar 04 '20 22:03 pipauwel

Effects that I can think of:

  • Files will be larger
  • Files will be more human-readable
  • Parsers don't have to do all the strange character replacements on all text-based attributes (so they are easier to implement and process files faster)
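To illustrate the character replacements mentioned in the last bullet, here is a minimal Python sketch of what a parser has to do today for every string attribute. It assumes the escape directives described in the buildingSMART string-encoding guidance (`\X\`, `\X2\`, `\X4\`, `\S\`) and the default ISO 8859-1 code page; the function name `decode_step_string` is hypothetical, and code-page switching via `\P` is not handled.

```python
import re

# Escape directives in STEP string literals (simplified; \P code-page switching is ignored).
_ESCAPE = re.compile(
    r"\\X2\\((?:[0-9A-F]{4})+)\\X0\\"   # \X2\hhhh...\X0\ : 4 hex digits per code unit
    r"|\\X4\\((?:[0-9A-F]{8})+)\\X0\\"  # \X4\hhhhhhhh...\X0\ : 8 hex digits per character
    r"|\\X\\([0-9A-F]{2})"              # \X\hh : a single ISO 8859-1 byte
    r"|\\S\\(.)"                        # \S\c : character c shifted by 128 (assumes ISO 8859-1)
    r"|(\\\\)"                          # doubled backslash : literal backslash
    r"|('')"                            # doubled apostrophe : literal apostrophe
)


def decode_step_string(raw: str) -> str:
    """Turn the content of a STEP string literal into plain Unicode text."""

    def replace(match: re.Match) -> str:
        x2, x4, x, shifted, backslash, _apostrophe = match.groups()
        if x2 is not None:
            return bytes.fromhex(x2).decode("utf-16-be")
        if x4 is not None:
            return bytes.fromhex(x4).decode("utf-32-be")
        if x is not None:
            return bytes.fromhex(x).decode("latin-1")
        if shifted is not None:
            return chr(ord(shifted) + 128)
        if backslash is not None:
            return "\\"
        return "'"

    return _ESCAPE.sub(replace, raw)


print(decode_step_string("Caf\\X\\E9"))            # single-byte directive
print(decode_step_string("Caf\\X2\\00E9\\X0\\"))   # two-byte directive
print(decode_step_string("Caf\\S\\i"))             # shift directive
```

With UTF8 in the file itself, only the doubled apostrophe and backslash would presumably still need handling; the other branches would go away.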

janbrouwer avatar Mar 05 '20 09:03 janbrouwer

The increase in file size from using UTF-8 instead of ISO 8859-1 needs to be investigated before a decision is made. Usually (in a typical IFC2x3 CV or IFC4 RV file) 98% of the text comes from the ISO 8859-1 code tables (e.g. all geometry).

And file size does matter! Today practitioners are stuck with IFC files >500MB (e.g. for MEP models) and partial/transactional exchange cannot solve all exchange scenarios.

Another observation: I would assume that complete file-based exchange is best served by sticking to the STEP physical file, whereas other transactions are better served by ifcXML, ifcJson, etc. There (in partial transactions) file sizes are not a problem, and XML / JSON already support UTF-8.

TLiebich avatar Mar 10 '20 22:03 TLiebich

When adopting the 2016 version of STEP, this is in accordance with the standard. Additional restriction when using IFC: ONLY use UTF8 (exclude the older encodings).

berlotti avatar Mar 13 '20 07:03 berlotti