Version: 3.3.0
wxConvAuto Class Reference

#include <wx/convauto.h>

Detailed Description

This class implements a Unicode to/from multibyte converter capable of automatically recognizing the encoding of the multibyte text on input.

The logic used is very simple: the class uses the BOM (byte order mark) if it's present and tries to interpret the input as UTF-8 otherwise. If this fails, the input is interpreted as being in the default multibyte encoding which can be specified in the constructor of a wxConvAuto instance and, in turn, defaults to the value of GetFallbackEncoding() if not explicitly given.
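The order of checks described above can be sketched in standalone C++. This is a minimal illustration, not the wxWidgets implementation: Guess, startsWithBom(), looksLikeUtf8() and classify() are names made up for this sketch, and the UTF-8 check is deliberately simplified (it verifies byte structure only and does not reject overlong forms or surrogates).

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Illustrative outcome of the three-step check; not a wxWidgets type.
enum class Guess { Bom, Utf8, Fallback };

// True if the buffer starts with a UTF-8, UTF-16 or UTF-32 BOM.
bool startsWithBom(const std::string& s)
{
    struct Sig { const char* bytes; std::size_t len; };
    static const Sig sigs[] = {
        { "\xEF\xBB\xBF",     3 },  // UTF-8
        { "\xFF\xFE",         2 },  // UTF-16LE (also the start of UTF-32LE)
        { "\xFE\xFF",         2 },  // UTF-16BE
        { "\x00\x00\xFE\xFF", 4 },  // UTF-32BE
    };
    for (const Sig& sig : sigs)
        if (s.size() >= sig.len && std::memcmp(s.data(), sig.bytes, sig.len) == 0)
            return true;
    return false;
}

// Structural UTF-8 check: each lead byte must be followed by the right
// number of continuation bytes. Simplified on purpose (assumption of this
// sketch): overlong forms and surrogates are not rejected.
bool looksLikeUtf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;           // stray continuation or invalid lead byte
        if (s.size() - i <= extra)
            return false;            // sequence is cut off at the end
        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return false;
        i += extra + 1;
    }
    return true;
}

// BOM first, then UTF-8, then the fall-back encoding.
Guess classify(const std::string& bytes)
{
    if (startsWithBom(bytes)) return Guess::Bom;
    if (looksLikeUtf8(bytes)) return Guess::Utf8;
    return Guess::Fallback;
}
```

With this sketch, the UTF-8 bytes "caf\xC3\xA9" are classified as Guess::Utf8, while the Latin-1 bytes "caf\xE9" end up as Guess::Fallback.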

For the conversion from Unicode to multibyte, the encoding previously used for the multibyte to Unicode conversion is reused. If there has been no previous multibyte to Unicode conversion, UTF-8 is used by default. Notice that once the multibyte encoding is automatically detected, it doesn't change any more, i.e. it is entirely determined by the first use of the wxConvAuto object in the multibyte-to-Unicode direction. However, creating a copy of a wxConvAuto object, either via the usual copy constructor or assignment operator, or using wxMBConv::Clone(), resets the automatically detected encoding so that the new copy will try to detect the encoding of the input on first use.
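The state rules in this paragraph can be modelled with a small standalone class. This is only a toy model of the documented behaviour, not wxWidgets code: AutoState, SeenEncoding, noteInput() and outputEncoding() are hypothetical names, and the actual detection step is left out.

```cpp
// Illustrative encoding labels for this sketch; not the wxFontEncoding enum.
enum class SeenEncoding { Utf8, Utf16LE, Latin1 };

// Toy model of the "detect once, then reuse" state described above.
class AutoState
{
public:
    AutoState() = default;

    // A copy starts out undetected, mirroring the documented reset.
    AutoState(const AutoState&) {}
    AutoState& operator=(const AutoState&)
    {
        detected_ = false;
        encoding_ = SeenEncoding::Utf8;
        return *this;
    }

    // Record the encoding found on input; only the first call has an effect.
    void noteInput(SeenEncoding found)
    {
        if (!detected_) {
            encoding_ = found;
            detected_ = true;
        }
    }

    // Encoding for Unicode-to-multibyte output: the detected one, or UTF-8
    // if no input has been seen yet.
    SeenEncoding outputEncoding() const { return encoding_; }

private:
    bool detected_ = false;
    SeenEncoding encoding_ = SeenEncoding::Utf8;
};
```

In this model a fresh object reports UTF-8 for output, keeps whatever the first input established, and a copy of it goes back to the undetected state.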

This class is used by default in wxWidgets classes and functions reading text from files, such as wxFile, wxFFile, wxTextFile, wxFileConfig and various stream classes, so the encoding set with its SetFallbackEncoding() method will affect how these classes treat input files. In particular, use this method to change the fall-back multibyte encoding used to interpret files whose contents aren't valid UTF-8, or to disable the fall-back completely.

Library:  wxBase
Category:  Data Structures
See also
wxMBConv Overview

Public Member Functions

 wxConvAuto (wxFontEncoding enc=wxFONTENCODING_DEFAULT)
 Constructs a new wxConvAuto instance.
 
wxBOM GetBOM () const
 Return the detected BOM type.
 
wxFontEncoding GetEncoding () const
 Return the detected encoding.
 
bool IsUsingFallbackEncoding () const
 Check if the fall-back encoding is used.
 
const char * GetBOMChars (wxBOM bom, size_t *count)
 Return a pointer to the characters that make up this BOM.
 
Public Member Functions inherited from wxMBConv
 wxMBConv ()
 Trivial default constructor.
 
virtual wxMBConv * Clone () const =0
 This pure virtual function is overridden in each of the derived classes to return a new copy of the object it is called on.
 
virtual size_t GetMaxCharLen () const
 This function must be overridden in the derived classes to return the maximum length, in bytes, of a single Unicode character representation in this encoding.
 
virtual size_t GetMBNulLen () const
 This function returns 1 for most of the multibyte encodings in which the string is terminated by a single NUL, 2 for UTF-16 and 4 for UTF-32 for which the string is terminated with 2 and 4 NUL characters respectively.
 
virtual bool IsUTF8 () const
 Return true if the converter's charset is UTF-8.
 
virtual size_t ToWChar (wchar_t *dst, size_t dstLen, const char *src, size_t srcLen=wxNO_LEN) const
 Convert multibyte string to a wide character one.
 
virtual size_t FromWChar (char *dst, size_t dstLen, const wchar_t *src, size_t srcLen=wxNO_LEN) const
 Converts wide character string to multibyte.
 
wxWCharBuffer cMB2WC (const char *in, size_t inLen, size_t *outLen) const
 Converts from multibyte encoding to Unicode by calling ToWChar() and allocating a temporary wxWCharBuffer to hold the result.
 
wxWCharBuffer cMB2WC (const wxCharBuffer &buf) const
 Converts a char buffer to wide char one.
 
wxWCharBuffer cMB2WX (const char *psz) const
 Converts from multibyte encoding to wchar_t.
 
wxCharBuffer cWC2MB (const wchar_t *in, size_t inLen, size_t *outLen) const
 Converts from Unicode to multibyte encoding by calling FromWChar() and allocating a temporary wxCharBuffer to hold the result.
 
wxCharBuffer cWC2MB (const wxWCharBuffer &buf) const
 Converts a wide char buffer to char one.
 
virtual size_t MB2WC (wchar_t *out, const char *in, size_t outLen) const
 
virtual size_t WC2MB (char *buf, const wchar_t *psz, size_t n) const
 
const wchar_t * cWC2WX (const wchar_t *psz) const
 Converts from Unicode to the current wxChar type.
 
wxCharBuffer cWC2WX (const wchar_t *psz) const
 Converts from Unicode to the current wxChar type.
 
const char * cWX2MB (const wxChar *psz) const
 Converts from the current wxChar type to multibyte encoding.
 
wxCharBuffer cWX2MB (const wxChar *psz) const
 Converts from the current wxChar type to multibyte encoding.
 
const wchar_t * cWX2WC (const wxChar *psz) const
 Converts from the current wxChar type to Unicode.
 
wxWCharBuffer cWX2WC (const wxChar *psz) const
 Converts from the current wxChar type to Unicode.
 

Static Public Member Functions

static void DisableFallbackEncoding ()
 Disable the use of the fall-back encoding: if the input doesn't have a BOM and is not valid UTF-8, the conversion will fail.
 
static wxFontEncoding GetFallbackEncoding ()
 Returns the encoding used by default by wxConvAuto if no other encoding is explicitly specified in the constructor.
 
static void SetFallbackEncoding (wxFontEncoding enc)
 Changes the encoding used by default by wxConvAuto if no other encoding is explicitly specified in the constructor.
 
static wxBOM DetectBOM (const char *src, size_t srcLen)
 Return the BOM type of this buffer.
 
Static Public Member Functions inherited from wxMBConv
static size_t GetMaxMBNulLen ()
 Returns the maximal value which can be returned by GetMBNulLen() for any conversion object.
 

Constructor & Destructor Documentation

◆ wxConvAuto()

wxConvAuto::wxConvAuto ( wxFontEncoding  enc = wxFONTENCODING_DEFAULT)

Constructs a new wxConvAuto instance.

The object will try to detect the encoding of the multibyte text given to its wxMBConv::ToWChar() method automatically, but if the automatic detection of Unicode encodings fails, the fall-back encoding enc will be used to interpret it as multibyte text.

The default value of enc, wxFONTENCODING_DEFAULT, means that the global default value (which can be set using SetFallbackEncoding()) should be used. As with that method, passing wxFONTENCODING_MAX inhibits using this encoding completely, so the input multibyte text will always be interpreted as UTF-8 in the absence of a BOM and the conversion will fail if the input doesn't form a valid UTF-8 sequence.

Another special value is wxFONTENCODING_SYSTEM, which means to use the encoding currently used on the user's system, i.e. the encoding returned by wxLocale::GetSystemEncoding(). Any other encoding will be used as is, e.g. passing wxFONTENCODING_ISO8859_1 ensures that non-UTF-8 input will be treated as Latin-1.
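The handling of the special values described above can be summarised as a small decision function. The following standalone sketch only restates the rules in the text and is not taken from the wxWidgets sources: EncArg, Fallback and resolveFallback() are hypothetical names, and the enumerators merely stand in for the wxFontEncoding values mentioned above.

```cpp
// Illustrative stand-ins for the wxFontEncoding values named above; the
// names and numeric values belong to this sketch, not to the real enum.
enum class EncArg { Default, System, Max, Latin1, Cp1252 };

// Result of resolving the constructor argument.
struct Fallback
{
    bool   enabled;   // false: input without BOM that is not UTF-8 fails
    EncArg encoding;  // meaningful only if enabled is true
};

// globalDefault models the value of GetFallbackEncoding() (never Default);
// systemEnc models the value of wxLocale::GetSystemEncoding().
Fallback resolveFallback(EncArg requested, EncArg globalDefault, EncArg systemEnc)
{
    // Default defers to the global setting, which may itself be special.
    const EncArg enc = (requested == EncArg::Default) ? globalDefault : requested;

    if (enc == EncArg::Max)
        return { false, EncArg::Max };   // no fall-back at all
    if (enc == EncArg::System)
        return { true, systemEnc };      // use the system encoding
    return { true, enc };                // any other value is used as is
}
```

For example, with a global default of Latin-1, a Default argument resolves to Latin-1, a Max argument disables the fall-back, and an explicit Cp1252 argument is used unchanged.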

Member Function Documentation

◆ DetectBOM()

static wxBOM wxConvAuto::DetectBOM ( const char *  src, size_t  srcLen )
static

Return the BOM type of this buffer.

This is a helper function which is normally only used internally by wxConvAuto, but it is provided for the convenience of code that wants to detect the encoding of a stream by checking it for the presence of a BOM on its own.
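To illustrate what checking a buffer for a BOM involves, here is a standalone sketch that works on the standard BOM byte sequences. It is an assumption-laden illustration, not the library's code: BomKind and sniffBom() are names invented here, and the rule of reporting Unknown whenever the buffer could still be the beginning of a longer BOM is this sketch's own choice.

```cpp
#include <cstddef>
#include <cstring>

// Illustrative result type for this sketch; not the wxBOM enum itself.
enum class BomKind { Unknown, None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

BomKind sniffBom(const char* src, std::size_t len)
{
    struct Sig { const char* bytes; std::size_t len; BomKind kind; };
    // Longest signatures first: the UTF-32LE BOM begins with the UTF-16LE one.
    static const Sig sigs[] = {
        { "\xFF\xFE\x00\x00", 4, BomKind::Utf32LE },
        { "\x00\x00\xFE\xFF", 4, BomKind::Utf32BE },
        { "\xEF\xBB\xBF",     3, BomKind::Utf8    },
        { "\xFF\xFE",         2, BomKind::Utf16LE },
        { "\xFE\xFF",         2, BomKind::Utf16BE },
    };

    if (len == 0)
        return BomKind::Unknown;  // nothing to look at yet

    // If the buffer is a proper prefix of some BOM, more bytes are needed
    // before a definite answer can be given.
    for (const Sig& s : sigs)
        if (len < s.len && std::memcmp(src, s.bytes, len) == 0)
            return BomKind::Unknown;

    // Otherwise the first (longest) complete match wins.
    for (const Sig& s : sigs)
        if (len >= s.len && std::memcmp(src, s.bytes, s.len) == 0)
            return s.kind;

    return BomKind::None;
}
```

Note the ambiguous case: the two bytes FF FE alone give Unknown in this sketch, because they could be either a complete UTF-16LE BOM or the first half of a UTF-32LE one.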

Since
2.9.3

◆ DisableFallbackEncoding()

static void wxConvAuto::DisableFallbackEncoding ( )
static

Disable the use of the fall-back encoding: if the input doesn't have a BOM and is not valid UTF-8, the conversion will fail.

◆ GetBOM()

wxBOM wxConvAuto::GetBOM ( ) const

Return the detected BOM type.

The BOM type is detected only after sufficiently many initial bytes have passed through this conversion object, so this function always returns wxBOM_Unknown immediately after the object is created but may return a different value later.

Since
2.9.3

◆ GetBOMChars()

const char* wxConvAuto::GetBOMChars ( wxBOM  bom, size_t *  count )

Return a pointer to the characters that make up this BOM.

The returned character count is 2, 3 or 4, or undefined if the return value is nullptr.

Parameters
bom  A valid BOM type, i.e. not wxBOM_Unknown or wxBOM_None.
count  A non-null pointer receiving the number of characters in this BOM.
Returns
Pointer to characters composing the BOM or nullptr if BOM is unknown or invalid. Notice that the returned string is not NUL-terminated and may contain embedded NULs so count must be used to handle it correctly.
Since
2.9.3
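The warning about embedded NULs is easy to overlook, so here is a standalone sketch of a lookup with the same shape (a pointer plus an explicit count). It does not call wxWidgets: BomId, bomBytes() and withBom() are hypothetical names used only for illustration, and the byte values are the standard Unicode BOM sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Illustrative BOM identifiers for this sketch; not the wxBOM enum.
enum class BomId { Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Returns the BOM bytes and stores their number in *count.
const char* bomBytes(BomId bom, std::size_t* count)
{
    assert(count != nullptr);
    switch (bom) {
        case BomId::Utf8:    *count = 3; return "\xEF\xBB\xBF";
        case BomId::Utf16LE: *count = 2; return "\xFF\xFE";
        case BomId::Utf16BE: *count = 2; return "\xFE\xFF";
        case BomId::Utf32LE: *count = 4; return "\xFF\xFE\x00\x00";
        case BomId::Utf32BE: *count = 4; return "\x00\x00\xFE\xFF";
    }
    return nullptr;  // not reached for the enumerators above
}

// Prepend a BOM to a payload using the explicit count, never strlen():
// the UTF-32 BOMs contain NUL bytes.
std::string withBom(BomId bom, const std::string& payload)
{
    std::size_t n = 0;
    const char* p = bomBytes(bom, &n);
    return std::string(p, n) + payload;
}
```

Treating the returned pointer as a C string would silently lose data: in this sketch, the UTF-32BE BOM starts with a NUL byte, so constructing a std::string from the pointer alone yields an empty string, while using the count yields all 4 bytes.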

◆ GetEncoding()

wxFontEncoding wxConvAuto::GetEncoding ( ) const

Return the detected encoding.

Returns wxFONTENCODING_MAX if called before the first use.

Since
3.1.5

◆ GetFallbackEncoding()

static wxFontEncoding wxConvAuto::GetFallbackEncoding ( )
static

Returns the encoding used by default by wxConvAuto if no other encoding is explicitly specified in the constructor.

By default, returns wxFONTENCODING_ISO8859_1 but can be changed using SetFallbackEncoding().

◆ IsUsingFallbackEncoding()

bool wxConvAuto::IsUsingFallbackEncoding ( ) const

Check if the fall-back encoding is used.

Since
3.1.5

◆ SetFallbackEncoding()

static void wxConvAuto::SetFallbackEncoding ( wxFontEncoding  enc)
static

Changes the encoding used by default by wxConvAuto if no other encoding is explicitly specified in the constructor.

The default value, which can be retrieved using GetFallbackEncoding(), is wxFONTENCODING_ISO8859_1.

The special values wxFONTENCODING_SYSTEM and wxFONTENCODING_MAX can be used for the enc parameter to use the encoding of the current user locale as the fall-back or to disable the fall-back entirely, respectively (just as with the corresponding constructor parameter). However, wxFONTENCODING_DEFAULT can't be used here.