Unicode Lookup

Your data never leaves your browser

Decode any text to Unicode code points, HTML entities, hex values, and UTF-8 bytes. Supports emoji and all scripts.


Unicode, ASCII, and UTF-8 explained

Every character you type has a number assigned to it by the Unicode standard. That number is called a code point, written as U+XXXX in hexadecimal. Encoding schemes like UTF-8 then decide how to represent that number in bytes. UTF-8 encodes the 128 ASCII characters in a single byte and uses 2 to 4 bytes for everything else, making it both backward-compatible with ASCII and compact for English text.

Developers encounter this layer of the stack when parsing strings from external systems, debugging encoding issues, working with international text, or embedding special characters in HTML and CSS.

This tool has two modes. Text Breakdown takes any string you paste and shows every character as a row in a table with its code point, HTML entity, hex value, binary representation, and UTF-8 bytes. Code Point Lookup takes a code point like U+1F600 and shows its full details, including the character it renders to and how many bytes it takes in UTF-8.
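As a rough illustration of the UTF-8 byte counts described above, the sketch below lists the bytes of a string using the standard TextEncoder API. The helper name utf8Bytes is hypothetical and is not taken from this tool's source.

```javascript
// Sketch: list the UTF-8 bytes of a string as uppercase hex pairs.
// utf8Bytes is a hypothetical helper name used only for illustration.
function utf8Bytes(text) {
  // TextEncoder always produces UTF-8.
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0"));
}

console.log(utf8Bytes("A"));         // 1 byte:  41 (same value as ASCII)
console.log(utf8Bytes("\u00E9"));    // 2 bytes: C3 A9 (é)
console.log(utf8Bytes("\u20AC"));    // 3 bytes: E2 82 AC (€)
console.log(utf8Bytes("\u{1F600}")); // 4 bytes: F0 9F 98 80 (😀)
```

The first call shows the backward compatibility in practice: the UTF-8 byte for "A" is the same 0x41 that ASCII uses.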

Frequently Asked Questions

What is the difference between Unicode and ASCII?

ASCII (American Standard Code for Information Interchange) defines 128 characters: the English alphabet, digits, punctuation, and control characters. It was designed in the 1960s for English-only computing. Unicode is a universal standard that covers more than 150,000 characters (as of Unicode 16.0) across most of the world's writing systems, plus symbols and emoji. ASCII is a strict subset of Unicode: the first 128 Unicode code points (U+0000 to U+007F) match ASCII exactly.

What is a Unicode code point?

A code point is a number assigned to a character in the Unicode standard. It is written as U+ followed by a hexadecimal number. For example, the letter A is U+0041 and the grinning face emoji is U+1F600. Code points range from U+0000 to U+10FFFF.
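The U+ notation can be produced and parsed with built-in string methods. The following is a minimal sketch; formatCodePoint and parseCodePoint are hypothetical names chosen for this example, not functions from the tool.

```javascript
// Sketch: convert between a character and its U+XXXX notation.
// Both function names are hypothetical.
function formatCodePoint(char) {
  const cp = char.codePointAt(0);
  if (cp === undefined) throw new RangeError("Expected a non-empty string");
  // By convention, code points are padded to at least four hex digits.
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

function parseCodePoint(notation) {
  const match = /^U\+([0-9A-Fa-f]{1,6})$/.exec(notation.trim());
  if (!match) throw new RangeError("Expected a value like U+1F600");
  const cp = parseInt(match[1], 16);
  if (cp > 0x10ffff) throw new RangeError("Code points end at U+10FFFF");
  return String.fromCodePoint(cp);
}

console.log(formatCodePoint("A"));         // U+0041
console.log(formatCodePoint("\u{1F600}")); // U+1F600
console.log(parseCodePoint("U+1F600"));    // 😀
```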

What is the difference between UTF-8, UTF-16, and UTF-32?

These are encoding schemes that map Unicode code points to bytes. UTF-8 uses 1 to 4 bytes per character and is backward-compatible with ASCII. It is the dominant encoding on the web. UTF-16 uses 2 or 4 bytes per character and is the string representation used internally by JavaScript and Java. UTF-32 uses exactly 4 bytes per character, making it simple but memory-intensive.
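To make the size trade-off concrete, the sketch below computes how many bytes a string would occupy in each encoding. It assumes well-formed input (no unpaired surrogates); encodedSizes is a hypothetical helper name.

```javascript
// Sketch: byte cost of the same text in UTF-8, UTF-16, and UTF-32.
// Assumes the string contains no unpaired surrogates.
function encodedSizes(text) {
  return {
    utf8: new TextEncoder().encode(text).length, // 1 to 4 bytes per code point
    utf16: text.length * 2,                      // .length counts 2-byte code units
    utf32: [...text].length * 4,                 // fixed 4 bytes per code point
  };
}

console.log(encodedSizes("A"));         // { utf8: 1, utf16: 2, utf32: 4 }
console.log(encodedSizes("\u20AC"));    // { utf8: 3, utf16: 2, utf32: 4 }  (€)
console.log(encodedSizes("\u{1F600}")); // { utf8: 4, utf16: 4, utf32: 4 }  (😀)
```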

What are HTML entities and when do I use them?

HTML entities are text representations of characters that might otherwise be interpreted as HTML. For example, < must be written as &lt; in HTML to display as a literal less-than sign rather than the start of a tag. Named entities like &amp; and &copy; are human-readable. Numeric entities like &#169; work for any Unicode character.
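Numeric entities follow directly from the code point, so they can be generated mechanically. The sketch below shows one way to do that, along with a small escaper for the characters that HTML treats as markup. Both function names are hypothetical, and the escaper covers only the five most commonly escaped characters.

```javascript
// Sketch: build decimal and hex numeric entities from a character.
function toNumericEntities(char) {
  const cp = char.codePointAt(0);
  return {
    decimal: `&#${cp};`,
    hex: `&#x${cp.toString(16).toUpperCase()};`,
  };
}

// Sketch: escape characters that would otherwise be read as markup.
function escapeHtml(text) {
  const named = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return text.replace(/[&<>"']/g, (ch) => named[ch]);
}

console.log(toNumericEntities("\u00A9")); // { decimal: '&#169;', hex: '&#xA9;' }
console.log(escapeHtml("a < b && c"));    // a &lt; b &amp;&amp; c
```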

How does this handle emoji and multi-byte characters?

Characters outside the Basic Multilingual Plane (above U+FFFF), which include most emoji, are stored as surrogate pairs in JavaScript strings, so each one occupies two UTF-16 code units. This tool uses the spread operator ([...input]) to split strings into code points rather than code units, so an emoji like 🎉 (U+1F389) is treated as a single character rather than two.
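The difference between code units and code points is easy to see in a console. The sketch below is illustrative only; codeUnits and codePoints are hypothetical helper names. It also shows a caveat of splitting by code point: an emoji composed of several code points, such as a flag, still splits into its parts.

```javascript
// Sketch: compare UTF-16 code units with code points.
function codeUnits(text) {
  return Array.from({ length: text.length }, (_, i) =>
    text.charCodeAt(i).toString(16).toUpperCase()
  );
}

function codePoints(text) {
  // Spreading a string iterates by code point, keeping surrogate pairs together.
  return [...text].map((ch) => ch.codePointAt(0).toString(16).toUpperCase());
}

const party = "\u{1F389}"; // 🎉
console.log(party.length);      // 2 (code units)
console.log(codeUnits(party));  // [ 'D83C', 'DF89' ] (the surrogate pair)
console.log(codePoints(party)); // [ '1F389' ] (one code point)

const flag = "\u{1F1EF}\u{1F1F5}"; // 🇯🇵, two regional indicator symbols
console.log(codePoints(flag));  // [ '1F1EF', '1F1F5' ]
```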
