
How to Convert Text to Binary: ASCII, UTF-8 and Character Encoding Explained

Every piece of text you read on a screen is stored as binary data, sequences of 0s and 1s. Understanding how text is converted to binary is fundamental to computer science, networking, data storage and debugging encoding issues. This guide covers how character encoding works, the difference between ASCII and UTF-8, and how to convert text to binary step by step.

What is Character Encoding?

Character encoding is a system that maps characters (letters, numbers, symbols) to numeric values, which are then stored as binary. When you type the letter "A," your computer does not store the shape of the letter. Instead, it stores a number (65 in ASCII) which is represented in binary as 01000001.
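The mapping described above can be checked directly in Python. This is a minimal sketch using only the built-ins `ord`, `chr` and `format`; the variable names are illustrative.

```python
# Character -> numeric code -> 8-bit binary string, and back again.
char = "A"
code = ord(char)            # numeric value the encoding assigns to the character
bits = format(code, "08b")  # the same value as a zero-padded 8-bit binary string

print(code)               # 65
print(bits)               # 01000001
print(chr(int(bits, 2)))  # A
```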

Different encoding standards define different mappings. The two most important ones are ASCII and UTF-8.

ASCII: The Foundation

ASCII (American Standard Code for Information Interchange) was first published in 1963 and uses 7 bits to represent 128 characters. The set covers uppercase and lowercase English letters (lowercase was added in the 1967 revision), digits 0-9, punctuation marks and control characters such as newline and tab.

Character   ASCII Code   Binary (8-bit)
A           65           01000001
a           97           01100001
0           48           00110000
Space       32           00100000
!           33           00100001

ASCII's limitation is that it only supports 128 characters. It cannot represent accented letters, Chinese characters, Arabic script, emoji or most of the world's writing systems.
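One way to see this limit in practice is to attempt an ASCII encode and catch the failure. The helper name `is_ascii_encodable` below is hypothetical, chosen for illustration; the sketch relies only on Python's built-in `str.encode`.

```python
def is_ascii_encodable(text: str) -> bool:
    """Return True if every character in text fits in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # Raised as soon as a character above code 127 is encountered.
        return False

print(is_ascii_encodable("Hello!"))  # True
print(is_ascii_encodable("café"))    # False
```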

UTF-8: The Universal Standard

UTF-8 (Unicode Transformation Format, 8-bit) is the dominant encoding on the web, used by over 98% of all websites. It is backwards-compatible with ASCII (the first 128 characters are identical) but can represent every character in Unicode, over 149,000 characters across all writing systems.

UTF-8 uses variable-length encoding. ASCII characters use 1 byte, most European accented letters use 2 bytes, most Chinese, Japanese and Korean characters use 3 bytes, and most emoji use 4 bytes:

Bytes     Range                 Examples
1 byte    U+0000 to U+007F      A, z, 5, @
2 bytes   U+0080 to U+07FF      é, ñ, ß
3 bytes   U+0800 to U+FFFF      Chinese, Japanese, Korean characters
4 bytes   U+10000 to U+10FFFF   Emoji, historic scripts, musical symbols
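The byte counts in the table can be confirmed by encoding one character from each range. This is a small sketch; the sample characters and the helper name `utf8_length` are chosen here for illustration.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# One sample character per range in the table.
for ch in ["A", "é", "中", "😀"]:
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
# U+0041 -> 1 byte(s)
# U+00E9 -> 2 byte(s)
# U+4E2D -> 3 byte(s)
# U+1F600 -> 4 byte(s)
```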

Step-by-Step: Converting Text to Binary

Here is how to convert the word "Hi" to binary using ASCII encoding:

  1. Look up each character's ASCII code: H = 72, i = 105
  2. Convert each code to 8-bit binary: 72 = 01001000, 105 = 01101001
  3. Combine: 01001000 01101001

To convert a decimal number to binary, repeatedly divide by 2 and read the remainders in reverse order, from last to first. For example, 72 divided by 2 gives 36 remainder 0, then 18 r0, 9 r0, 4 r1, 2 r0, 1 r0, 0 r1. Reading the remainders from last to first: 1001000. Pad to 8 bits: 01001000.
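The repeated-division method can be written out as a short function. This is a sketch of the manual procedure for illustration (the name `to_binary` and its `width` parameter are assumptions); in everyday code, Python's built-in `format(n, "08b")` gives the same result.

```python
def to_binary(n: int, width: int = 8) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)  # quotient carries on, remainder is the next bit
        remainders.append(str(r))
    # Remainders are produced least-significant bit first, so reverse them.
    digits = "".join(reversed(remainders)) or "0"
    return digits.zfill(width)  # left-pad with zeros to the requested width

print(to_binary(72))   # 01001000
print(to_binary(105))  # 01101001
```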

Code Examples

Most programming languages make text-to-binary conversion straightforward:

JavaScript

// Assumes ASCII input: charCodeAt returns UTF-16 code units, not UTF-8 bytes.
"Hello".split("").map(c =>
  c.charCodeAt(0).toString(2).padStart(8, "0")
).join(" ");
// "01001000 01100101 01101100 01101100 01101111"

Python

# Assumes ASCII input: ord() returns the code point, which needs more than 8 bits above U+00FF.
" ".join(format(ord(c), "08b") for c in "Hello")
# "01001000 01100101 01101100 01101100 01101111"
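The one-liners above treat each character as a single 8-bit value, which holds only for ASCII text. A sketch that encodes to UTF-8 first, so that multi-byte characters come out as whole bytes, might look like this. The function names `text_to_binary` and `binary_to_text` are illustrative, not from any library.

```python
def text_to_binary(text: str) -> str:
    """Encode text as UTF-8 and render each byte as 8 bits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))

def binary_to_text(bits: str) -> str:
    """Inverse of text_to_binary: parse space-separated bytes, decode as UTF-8."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode("utf-8")

print(text_to_binary("Hi"))                 # 01001000 01101001
print(text_to_binary("é"))                  # 11000011 10101001
print(binary_to_text("01001000 01101001"))  # Hi
```

Working on bytes rather than characters keeps every group exactly 8 bits wide, so the output can always be decoded back to the original text.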

Practical Use Cases

  • Debugging encoding issues. When text appears as garbled characters (mojibake), examining the raw binary or hex values helps identify whether the wrong encoding was applied.
  • Network protocols. Many protocols work at the byte level. Understanding binary representation helps when reading packet captures or implementing protocol parsers.
  • Data storage. Knowing how many bytes a string occupies is important for database column sizing, buffer allocation and bandwidth estimation.
  • Education. Text-to-binary conversion is a fundamental concept in computer science courses and coding bootcamps.
  • Steganography and encoding. Binary representation is the foundation for encoding schemes like Base64, URL encoding and various data serialization formats.
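As an illustration of the debugging case, the sketch below reproduces a common form of mojibake (UTF-8 bytes decoded as Latin-1), dumps the raw bytes in hex and then reverses the mistake. The sample string is an assumption chosen for the demo, and `bytes.hex` with a separator needs Python 3.8 or later.

```python
original = "café"

# Simulate the bug: UTF-8 bytes wrongly interpreted as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)  # cafÃ©

# Inspect the raw bytes: c3 a9 is the 2-byte UTF-8 sequence for é.
print(original.encode("utf-8").hex(" "))  # 63 61 66 c3 a9

# Undo the wrong decode, then decode correctly.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # café
```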

Common Encoding Pitfalls

  • Assuming ASCII. If your application handles user input, always use UTF-8. Assuming ASCII will break on any non-ASCII character, including accented letters, curly quotes and emoji.
  • Byte count vs character count. In JavaScript, string.length returns the number of UTF-16 code units, not UTF-8 bytes or visible characters. An emoji such as 😀 (U+1F600) takes 4 bytes in UTF-8 but has length 2 in JavaScript. A flag emoji is a pair of code points, so it takes 8 bytes in UTF-8 and has length 4.
  • BOM (Byte Order Mark). Some editors prepend a BOM (EF BB BF) to UTF-8 files. This invisible prefix can cause parsing errors in shell scripts, JSON files and CSV imports.
  • Double encoding. Encoding a string that is already encoded produces garbage. This is common when passing data through multiple layers that each apply their own encoding.
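Two of these pitfalls are easy to reproduce. The sketch below, using a made-up JSON payload, shows how Python's standard `utf-8-sig` codec strips a leading BOM, and how one emoji yields three different "lengths" depending on what is counted.

```python
import codecs

# A UTF-8 payload with a BOM (EF BB BF) prepended, as some editors write it.
raw = codecs.BOM_UTF8 + '{"ok": true}'.encode("utf-8")

with_bom = raw.decode("utf-8")         # BOM survives as the character U+FEFF
without_bom = raw.decode("utf-8-sig")  # BOM is removed if present

print(repr(with_bom[0]))  # '\ufeff'
print(without_bom)        # {"ok": true}

emoji = "😀"
utf8_bytes = len(emoji.encode("utf-8"))            # bytes on disk or on the wire
utf16_units = len(emoji.encode("utf-16-le")) // 2  # what JavaScript's length counts
code_points = len(emoji)                           # what Python's len() counts

print(utf8_bytes, utf16_units, code_points)  # 4 2 1
```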

Convert text to binary now

Instantly convert any text to binary, decimal and hexadecimal representation with our free online tool.

Open Text to Binary Converter