NextGen-IFC
UTF-8 encoding for IFC serialisations
Currently, STEP-serialized IFC requires string encoding according to ISO 8859-1 (more info at https://technical.buildingsmart.org/resources/ifcimplementationguidance/string-encoding/). The latest STEP ISO standard allows UTF-8 encoding, which is widely adopted and the de facto standard.
I suggest using UTF-8 encoding for all serializations of IFC.
yes please!
That... seems like common sense? What are the effects of the change? Are there any?
Effects that I can think of:
- Files will be larger
- Files will be more human-readable
- Parsers don't have to do all the strange character replacements on all text-based attributes (so they are easier to implement and faster to run)
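To make the "strange character replacements" concrete, here is a rough sketch of what a reader has to do today for escaped strings. It is not taken from any particular IFC toolkit: the name `decode_step_string` is made up for illustration, it only covers the `\X2\`, `\X4\`, `\X\` and `\S\` directives plus the doubled backslash and apostrophe, and it assumes `\S\` always refers to ISO 8859-1 (no `\P` page switching).

```python
import re

# Escape directives handled by this sketch (assumption: no \P page switches).
_ESCAPE = re.compile(
    r"\\X2\\((?:[0-9A-F]{4})+)\\X0\\"   # \X2\....\X0\  groups of 4 hex digits
    r"|\\X4\\((?:[0-9A-F]{8})+)\\X0\\"  # \X4\........\X0\  groups of 8 hex digits
    r"|\\X\\([0-9A-F]{2})"              # \X\HH  one 8-bit character
    r"|\\S\\(.)"                        # \S\c  character c shifted up by 128
    r"|(\\\\)"                          # doubled backslash
)


def decode_step_string(raw: str) -> str:
    """Hypothetical helper: turn an escaped STEP string body into plain text."""

    def replace(match: re.Match) -> str:
        x2, x4, x, s, _backslash = match.groups()
        if x2 is not None:
            return bytes.fromhex(x2).decode("utf-16-be")
        if x4 is not None:
            return bytes.fromhex(x4).decode("utf-32-be")
        if x is not None:
            return chr(int(x, 16))
        if s is not None:
            return chr(ord(s) + 128)
        return "\\"

    # Doubled apostrophes stand for a single apostrophe inside a string.
    return _ESCAPE.sub(replace, raw).replace("''", "'")


escaped = r"Stra\X2\00DF\X0\e"
plain = decode_step_string(escaped)
print(plain)                                      # Straße
print(len(escaped), len(plain.encode("utf-8")))   # 17 7
```

With UTF-8 in the file itself, a reader could skip this step for such characters and hand the bytes straight to the platform's UTF-8 decoder.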
Before making a decision, there needs to be an investigation into how much file size increases with UTF-8 compared with ISO 8859-1. Usually (in a typical IFC2x3 CV or IFC4 RV file) 98% of the text comes from the ISO 8859-1 code tables (e.g. all geometry).
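As a starting point for such an investigation, a small sketch like the following could compare byte counts per line. It assumes the baseline is raw single-byte ISO 8859-1 text (not escape sequences), and both `compare_sizes` and the sample strings are made up for illustration rather than taken from a real model.

```python
def compare_sizes(text: str) -> tuple[int, int]:
    """Return (ISO 8859-1 byte count, UTF-8 byte count) for the given text.

    Raises UnicodeEncodeError if the text contains characters outside
    ISO 8859-1, since those have no single-byte representation.
    """
    return len(text.encode("iso-8859-1")), len(text.encode("utf-8"))


# Illustrative samples: a pure-ASCII geometry line and a name with one umlaut-like character.
geometry_line = "#12=IFCCARTESIANPOINT((0.,0.,3000.));"
wall_name = "Außenwand"

print(compare_sizes(geometry_line))  # both counts are equal for pure ASCII
print(compare_sizes(wall_name))      # (9, 10)
```

Pure-ASCII lines are byte-for-byte identical in both encodings, so any growth comes only from characters above code point 127, each of which takes two bytes in UTF-8 instead of one.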
And file size does matter! Today practitioners are stuck with IFC files >500MB (e.g. for MEP models) and partial/transactional exchange cannot solve all exchange scenarios.
Another observation: I would assume that complete file-based exchange is best served by sticking to the STEP physical file, whereas other transactions are better served by ifcXML, ifcJson, etc. There (in partial transactions) file sizes are not a problem, and XML / JSON already support UTF-8.
Once the 2016 version of STEP is adopted, this is in line with the standard. An additional restriction when using IFC: ONLY use UTF-8 (exclude the older encodings).