Unicode support

<po>This article provides details on Unicode support in Python.
</po>

Operating mode: Cloud Suite | On-Premises

Modules: Services & CRM, Budget & Phases, Purchases, Resource Planning, Business Intelligence

Created: 08.09.2021
Machine translated
Updated: 19.10.2021 | Article reformulated

Since version 6.5, Vertec stores text as Unicode. Vertec therefore supports all of the roughly 65,000 characters of the BMP (“Basic Multilingual Plane”) and thus the characters of all relevant languages, including all Asian languages. Characters outside the BMP that are entered in Vertec are replaced by a ? (this is likely to affect only a few emojis).
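To make the BMP limit concrete, the following minimal sketch checks whether a string contains characters that would fall outside it. The helper name has_non_bmp is hypothetical and not part of Vertec. It only illustrates the code point boundary (U+FFFF) using plain Python.

```python
def has_non_bmp(text):
    """Return True if text contains a character outside the Basic Multilingual Plane."""
    for ch in text:
        code = ord(ch)
        # Code points above U+FFFF lie outside the BMP. On Python builds that
        # store such characters as surrogate pairs, the surrogate range flags them.
        if code > 0xFFFF or 0xD800 <= code <= 0xDFFF:
            return True
    return False


print(has_non_bmp(u"Z\u00fcrich \u6771\u4eac"))  # Latin and Japanese characters: all inside the BMP
print(has_non_bmp(u"ok \U0001F600"))             # contains an emoji outside the BMP
```

A check like this could be run before handing text to Vertec, to find out in advance which input would end up as ?.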

Python and Unicode

The default encoding in Python is ANSI. Converting a Unicode string to the default encoding, i.e. from Unicode to ANSI, is designed to be fault-tolerant: characters that cannot be converted do not raise an error but are replaced by ?.
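The effect of such a fault-tolerant conversion can be imitated in plain Python with the "replace" error handler. This is only a sketch of the behaviour described above, not Vertec's internal mechanism. It assumes that "ANSI" means the Windows-1252 code page (cp1252), whereas the actual code page depends on the system.

```python
text = u"Z\u00fcrich \u6771\u4eac"  # "Zürich" followed by two Japanese characters

# Assumption: "ANSI" is Windows-1252 here; the real code page depends on the system.
# The "replace" error handler substitutes "?" for each character it cannot encode.
ansi_bytes = text.encode("cp1252", "replace")

print(repr(ansi_bytes))
```

The ü exists in Windows-1252 and survives, while each of the two Japanese characters is turned into a ? and cannot be recovered afterwards.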

In Vertec, str() is also “bent” to Unicode. An analysis of existing Python code at customers showed that str() is often called even on data that is already a string, and these calls would raise errors without this adjustment.

When strings are processed further, for example saved to a file or sent to a web service, a Unicode encoding (such as UTF-8) must be chosen so that the receiving side can read them correctly. Without an explicit encoding such as string.encode("UTF-8"), the string is implicitly converted to ANSI, with the data loss described above: non-ANSI characters are replaced by ?.
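As an illustration of explicit encoding on output, the following sketch writes a Unicode string to a file as UTF-8 bytes. The file name export.txt and the temporary directory are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

text = u"Gr\u00fc\u00dfe \u6771\u4eac"  # contains characters that do not exist in ANSI

# Hypothetical target file; a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "export.txt")

# Encode explicitly and write the resulting bytes in binary mode,
# so that no implicit conversion to the default encoding can take place.
with open(path, "wb") as f:
    f.write(text.encode("UTF-8"))

# Read the raw bytes back to confirm that nothing was replaced by "?".
with open(path, "rb") as f:
    written = f.read()

print(written.decode("UTF-8") == text)
```

Writing in binary mode keeps the choice of encoding in one visible place, namely the encode() call.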

The same care is needed when reading data into Vertec, whether by opening a text file, through the Vertec XML interface, or with vtcapp.requestfilefromclient(). Pay attention to the encoding, because Vertec's Unicode support makes the loss of non-ANSI characters unnecessary. A text file encoded in UTF-8, for example, must be decoded accordingly:

unicodestring = filecontent.decode("UTF-8")
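A slightly fuller sketch of the same step reads the file as raw bytes and then decodes them. The helper read_text and the file name import.txt are hypothetical. The demo file is created first so that the example runs on its own.

```python
import os
import tempfile


def read_text(path, encoding="UTF-8"):
    """Read a file as raw bytes and decode it explicitly to a Unicode string."""
    with open(path, "rb") as f:
        filecontent = f.read()
    return filecontent.decode(encoding)


# Create a small UTF-8 encoded demo file (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "import.txt")
with open(path, "wb") as f:
    f.write(u"Z\u00fcrich \u6771\u4eac".encode("UTF-8"))

unicodestring = read_text(path)
print(unicodestring == u"Z\u00fcrich \u6771\u4eac")
```

If a file starts with a byte order mark, Python's "utf-8-sig" codec can be passed as the encoding instead. It strips the mark while decoding.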

As a rule of thumb, in Python:

  • encode() always converts from a Unicode string to a byte stream (e.g. UTF-8)
  • decode() always converts from a byte stream (e.g. UTF-8) to a Unicode string
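The two directions can be seen in a short round trip. This is a plain Python sketch with an arbitrary sample string.

```python
text = u"Gr\u00fc\u00dfe"        # Unicode string with 5 characters

data = text.encode("UTF-8")      # Unicode string -> byte stream
restored = data.decode("UTF-8")  # byte stream -> Unicode string

print(len(text), len(data))      # character count versus byte count
print(restored == text)
```

The byte stream is longer than the string because ü and ß each take two bytes in UTF-8. Decoding with the same encoding restores the original string unchanged.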
