Unicode Escape (\u %u \x &#x U+ 0x \N) Encoder / Decoder Online

Decoded

Unicode Escape

Encoded

Unicode Escape	Format A-F

Other String converters are here

About Unicode escape sequence

Unicode escape sequence convert a single character to the format of a 4-digit hexadecimal code point, such as \uXXXX. For example, "A" becomes "\u0041".

In DenCode, in addition to the \uXXXX format, the following notation formats are also supported.

Format	Conversion result of "ABC"	Description / Programming language
\uXXXX	\u0041\u0042\u0043	Common Unicode escape sequences
\u{X}	\u{41}\u{42}\u{43}	Lua
\x{X}	\x{41}\x{42}\x{43}	Perl
\X	\41\42\43	CSS
&#xX;	ABC	HTML, XML
%uXXXX	%u0041%u0042%u0043	Percent-encoding (Non-standard)
U+XXXX	U+0041 U+0042 U+0043	Unicode standard notation for code points (Space separated)
0xX	0x41 0x42 0x43	Hexadecimal notation of code points (Space separated)

Some of the above formats are mentioned in RFC 5137 (ASCII Escaping of Unicode Characters) as BEST CURRENT PRACTICE, but there is no international standard.

The %uXXXX format is supported by Microsoft IIS, but is a non-standard format. It can be encoded in %u format with System.Web.HttpUtility.UrlEncodeUnicode in C#, but this method has been obsoleted since .NET Framework 4.5.

Please note that in the \X format, a trailing single space is treated as a delimiter and ignored when decoding, as specified by CSS. In the U+XXXX and 0xX formats, each character is separated by a single space when encoded, and a trailing single space is ignored when decoded, as in the \X format.

Escaping by Unicode Name

As Unicode escape sequences, escaping by Unicode name is also supported.

Format	Conversion result of "A"	Description / Programming language
\N{name}	\N{LATIN CAPITAL LETTER A}	C++23, Python, Perl

Unicode names can be found at Names List Charts - Unicode or NamesList.txt - Unicode.

Unicode non-BMP characters in Unicode escape sequence

Unicode non-BMP characters do not fit in the 4-digit code point, so they are represented in the following notation formats for each programming language.

The result of converting "😀" (U+1F600), which is a Unicode non-BMP character, is as follows.

Format	Conversion result of "😀" (U+1F600)	Programming language
\uXXXX	\uD83D\uDE00	Java, Kotlin, Scala
\u{X}	\u{1F600}	C++23, Rust, Swift, JavaScript, PHP, Ruby, Dart, Lua
\U00XXXXXX	\U0001F600	C, C++, Objective-C, C#, Go, Python, R
\x{X}	\x{1F600}	Perl
\X	\1F600	CSS
&#xX;	😀	HTML, XML
%uXXXX	%uD83D%uDE00	-
U+XXXX	U+1F600	-
0xX	0x1F600	-
\N{name}	\N{GRINNING FACE}	C++23, Python, Perl

In the \uXXXX and %uXXXX formats, non-BMP characters are represented by two code units as UTF-16 surrogate pairs. In other formats, a character is represented by a single code point.