
HTML Entities: A Complete Guide to Encoding Special Characters

HTML entities are special sequences of characters that represent characters you cannot type directly, or cannot use literally, in HTML markup. Characters like <, >, and & have special meaning in HTML syntax, so you need a way to display them as literal text. HTML entities provide that. They let you safely include reserved characters, invisible characters like non-breaking spaces, and symbols like copyright signs or arrows in your web pages.

Why HTML Entities Exist

HTML uses certain characters as part of its syntax. The less-than sign (<) opens a tag, the ampersand (&) begins an entity reference, and the greater-than sign (>) closes a tag. If you write these characters directly in your HTML content, the browser will interpret them as markup instead of displaying them as text. HTML entities solve this problem by providing escape sequences that browsers render as the intended character.

Beyond reserved characters, entities also provide access to hundreds of special symbols, mathematical operators, currency signs, arrows and typographic characters that may not be available on a standard keyboard.

Named vs Numeric Entities

HTML entities come in two forms: named entities and numeric entities. Both produce the same result, but they use different syntax.

  • Named entities use a human-readable keyword between an ampersand and a semicolon. For example, &amp; renders as &. Named entities are easier to read and remember.
  • Numeric entities use the Unicode code point of the character. They can be written in decimal (&#38;) or hexadecimal (&#x26;) format. Numeric entities work for any Unicode character, even those without a named entity.

Named entities are preferred for common characters because they are more readable. Numeric entities are useful when you need to display a character that has no named entity, or when you want maximum compatibility with older parsers.
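Because a numeric entity is just the character's Unicode code point written in decimal or hexadecimal, it can be computed directly. The sketch below uses a hypothetical helper name, toNumericEntities, which is not a standard API. It relies on String.prototype.codePointAt, which returns the full code point even for characters outside the Basic Multilingual Plane, such as emoji.

```javascript
// Hypothetical helper: build the decimal and hexadecimal numeric entities
// for a single character from its Unicode code point.
function toNumericEntities(char) {
  // codePointAt(0) returns the full code point, including characters such
  // as emoji that occupy two UTF-16 code units.
  const codePoint = char.codePointAt(0);
  return {
    decimal: `&#${codePoint};`,
    // Hex digits are case-insensitive in HTML; uppercase is used for readability.
    hex: `&#x${codePoint.toString(16).toUpperCase()};`,
  };
}

console.log(toNumericEntities("&"));  // { decimal: '&#38;', hex: '&#x26;' }
console.log(toNumericEntities("©"));  // { decimal: '&#169;', hex: '&#xA9;' }
console.log(toNumericEntities("😀")); // { decimal: '&#128512;', hex: '&#x1F600;' }
```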

Most Common HTML Entities

Here is a reference table of the most frequently used HTML entities that every developer should know.

Character | Named entity | Numeric entity | Description
< | &lt; | &#60; | Less than
> | &gt; | &#62; | Greater than
& | &amp; | &#38; | Ampersand
" | &quot; | &#34; | Double quote
' | &apos; | &#39; | Single quote / apostrophe
(invisible) | &nbsp; | &#160; | Non-breaking space
© | &copy; | &#169; | Copyright
® | &reg; | &#174; | Registered trademark
™ | &trade; | &#8482; | Trademark
€ | &euro; | &#8364; | Euro sign
£ | &pound; | &#163; | Pound sign
… | &hellip; | &#8230; | Horizontal ellipsis
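To illustrate the reverse direction, the following sketch decodes entities. The function name decodeEntities is hypothetical. It maps only the named entities from the table above back to characters and converts decimal and hexadecimal numeric entities with String.fromCodePoint. The HTML specification defines more than two thousand named references, so a real decoder should use a complete table or the browser's own parser.

```javascript
// Named entities from the reference table above (entity names are case-sensitive).
const NAMED_ENTITIES = new Map([
  ["lt", "<"], ["gt", ">"], ["amp", "&"], ["quot", '"'], ["apos", "'"],
  ["nbsp", "\u00A0"], ["copy", "©"], ["reg", "®"], ["trade", "™"],
  ["euro", "€"], ["pound", "£"], ["hellip", "…"],
]);

// Hypothetical decoder: handles &name;, &#decimal; and &#xhex;.
// In this sketch the trailing semicolon is required.
function decodeEntities(str) {
  return str.replace(
    /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g,
    (match, body) => {
      if (body[0] === "#") {
        const isHex = body[1] === "x" || body[1] === "X";
        const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave out-of-range code points untouched instead of throwing.
        return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
      }
      // Names not in the table are left as they are.
      return NAMED_ENTITIES.has(body) ? NAMED_ENTITIES.get(body) : match;
    }
  );
}

console.log(decodeEntities("5 &lt; 7 &amp;&amp; &#169; &#x2122;")); // 5 < 7 && © ™
console.log(decodeEntities("&amp;amp;")); // &amp; (only one level is decoded)
```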

When to Encode HTML

Understanding when HTML encoding is necessary is just as important as knowing how to do it. Here are the most common scenarios where encoding is required.

  • Displaying user input. Any content submitted by users (form fields, comments, profile data) must be encoded before rendering in HTML. This prevents malicious scripts from executing in the browser.
  • Showing code examples. When you display HTML, XML, or any markup language as text on a page, the angle brackets and ampersands must be encoded so the browser does not interpret them as actual tags.
  • Attribute values. If you insert dynamic values into HTML attributes, encode quotes, ampersands, and angle brackets, and always wrap the value in quotes. Otherwise an injected quote (or, in an unquoted attribute, a single space) can close the attribute early and add new attributes such as event handlers.
  • Email templates. HTML emails often use entities for non-ASCII characters because some email clients handle character encodings inconsistently.
  • RSS and XML feeds. Content in XML-based formats must escape the characters XML reserves (< and &, plus quotes inside attribute values) to produce well-formed, parseable documents.
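The attribute-value case is the easiest to get wrong. In this sketch, buildLink is a hypothetical helper. It encodes the untrusted value by replacing the five reserved characters and always wraps it in double quotes, so an injected quote cannot close the attribute early.

```javascript
// Encode the five characters that are significant in HTML text and attributes.
// & is replaced first so the entities produced by later steps are not re-encoded.
function encodeHTML(str) {
  return String(str)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: build an <a> element whose title comes from untrusted input.
function buildLink(title, text) {
  // The attribute value is always quoted. An unquoted value could be broken
  // out of with a single space, even after encoding.
  return `<a href="#" title="${encodeHTML(title)}">${encodeHTML(text)}</a>`;
}

const malicious = '" onmouseover="alert(1)';
console.log(buildLink(malicious, "Hover me"));
// <a href="#" title="&quot; onmouseover=&quot;alert(1)">Hover me</a>
```

The injected quotes become &quot;, so the whole payload stays inside the title attribute as harmless text.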

HTML Encoding in JavaScript

In JavaScript, how you insert content into the DOM determines whether encoding happens automatically or whether you need to handle it manually.

Safe: textContent

Using textContent is the safest approach. The string is inserted as a text node and is never parsed as HTML, so characters like < and & are displayed literally.

// Safe: userInput becomes a text node and is never parsed as HTML
element.textContent = userInput;

Dangerous: innerHTML

Using innerHTML parses the string as HTML. Script elements inserted this way do not run, but event-handler attributes such as onerror do, so an attacker does not need a script tag. Never use innerHTML with untrusted input unless you sanitize it first.

// Dangerous: userInput is parsed as HTML
element.innerHTML = userInput;
// If userInput contains <img src=x onerror="alert(1)">, the image fails
// to load and the onerror handler runs alert(1).

Manual Encoding Function

If you need to encode HTML entities manually in JavaScript, you can chain string replacements or let the DOM do the work. The DOM approach only works in a browser.

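// Using string replacement. & must be replaced first; otherwise the
// ampersands in entities added by later steps would be encoded again.
function encodeHTML(str) {
  return str
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Using the DOM (browser only). Serializing a text node escapes &, <, >
// and non-breaking spaces, but not quotes, so this version is not safe
// for attribute values.
function encodeHTMLWithDOM(str) {
  const div = document.createElement("div");
  div.textContent = str;
  return div.innerHTML;
}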
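A variation on the replace chain performs all five substitutions in one pass with a lookup table. This is a sketch, and encodeHTMLSinglePass and HTML_ESCAPES are hypothetical names. The regex visits each character only once, so the result no longer depends on replacing & first. Like the replace chain, it works in Node.js or any other environment without a DOM.

```javascript
// Map each reserved character to its entity.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Single pass: the regex finds each reserved character once and the callback
// looks up its replacement, so the emitted entities are never re-scanned.
function encodeHTMLSinglePass(str) {
  return String(str).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(encodeHTMLSinglePass(`<a href="x">Tom & Jerry's</a>`));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```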

HTML Encoding in Server-Side Languages

Most server-side languages provide a built-in HTML encoding function, and the rest have widely used libraries. Here are examples in popular languages.

PHP → htmlspecialchars($string, ENT_QUOTES, 'UTF-8')

Python → html.escape(string)

Ruby → CGI.escapeHTML(string)

Java → StringEscapeUtils.escapeHtml4(string) (Apache Commons Text, not part of the JDK)

C# / .NET → HttpUtility.HtmlEncode(string)

Go → html.EscapeString(string)

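Many template engines escape output by default. Blade and Handlebars escape {{ variable }} and output raw HTML with {!! variable !!} and {{{ variable }}} respectively. EJS escapes <%= variable %> and outputs raw HTML with <%- variable %>. Jinja2 escapes {{ variable }} only when autoescaping is enabled (Flask enables it for .html templates), and the |safe filter bypasses it. Always use the escaping syntax unless you explicitly need raw output and the content is trusted.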
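To show roughly how auto-escaping works, here is a minimal sketch using a JavaScript tagged template literal. The tag name html and the helper escapeHTML are our own choices, not built-ins. The tag escapes every interpolated value and leaves the literal template text alone, which is approximately what an engine's default escaped output does.

```javascript
function escapeHTML(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Tag function: `strings` holds the literal parts written by the developer,
// `values` holds the interpolated expressions. Only the values are escaped.
function html(strings, ...values) {
  let out = strings[0];
  values.forEach((value, i) => {
    out += escapeHTML(value) + strings[i + 1];
  });
  return out;
}

const userName = "<img src=x onerror=alert(1)>";
console.log(html`<p class="greeting">Hello, ${userName}!</p>`);
// <p class="greeting">Hello, &lt;img src=x onerror=alert(1)&gt;!</p>
```

The developer's own markup passes through untouched, while anything interpolated is treated as text. That split between trusted template text and untrusted data is the core idea behind auto-escaping.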

Security: Preventing XSS with HTML Encoding

Cross-Site Scripting (XSS) is one of the most common web security vulnerabilities. It occurs when an attacker injects malicious scripts into a web page that other users view. HTML encoding is one of the primary defenses against XSS attacks.

  • Always encode untrusted data. Anything that originates outside your code, such as form fields, URL parameters, cookies, or responses from external APIs, should be HTML-encoded before it is rendered in the page.
  • Encode on output, not input. The best practice is to store data in its raw form and encode it at the point of output. This gives you the flexibility to encode differently depending on the context (HTML body, attribute, JavaScript, URL).
  • Context matters. HTML encoding alone is not sufficient for all contexts. Content placed inside JavaScript blocks needs JavaScript escaping, URLs need URL encoding, and CSS values need CSS escaping. Use the right encoding for each context.
  • Use Content Security Policy. HTML encoding is a critical layer of defense, but it works best alongside other security measures like Content Security Policy (CSP) headers, which restrict what scripts can execute on your page.
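The context rule can be sketched in JavaScript with standard functions where they exist. encodeURIComponent and JSON.stringify are built-ins. escapeHTML and toScriptString are hypothetical helpers, and escaping < as \u003c is one common way, not the only one, to stop data from closing a script block. CSS contexts are not shown here.

```javascript
// HTML body and quoted attribute values: encode the five reserved characters.
function escapeHTML(str) {
  return String(str)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// JavaScript string inside a <script> block: JSON.stringify produces a valid
// JS string literal, and escaping "<" as \u003c prevents a "</script>"
// sequence in the data from closing the script element early.
function toScriptString(str) {
  return JSON.stringify(String(str)).replace(/</g, "\\u003c");
}

const query = "a&b <c>";

// 1. HTML body text
const bodyHTML = `<p>You searched for ${escapeHTML(query)}</p>`;

// 2. URL inside an attribute: URL-encode the parameter value first,
//    then HTML-encode the whole URL for the attribute (note &amp;page).
const url = `/search?q=${encodeURIComponent(query)}&page=2`;
const linkHTML = `<a href="${escapeHTML(url)}">Next page</a>`;

// 3. JavaScript string inside a script block
const scriptHTML = `<script>const query = ${toScriptString(query)};</script>`;

console.log(bodyHTML);   // <p>You searched for a&amp;b &lt;c&gt;</p>
console.log(linkHTML);   // <a href="/search?q=a%26b%20%3Cc%3E&amp;page=2">Next page</a>
console.log(scriptHTML); // <script>const query = "a&b \u003cc>";</script>
```

The same input needs three different encodings, and in the link the two encodings are layered: URL encoding for the parameter, then HTML encoding for the attribute.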

Common Mistakes with HTML Entities

Even experienced developers make mistakes when working with HTML entities. Here are the most common pitfalls to watch out for.

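  • Missing the semicolon. Writing &amp instead of &amp;. Browsers still recognize some entities without a semicolon for backward compatibility, but this is a parse error and can give surprising results. For example, in text content &copyright is read as &copy followed by "right" and displays as ©right.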
  • Double encoding. Encoding content that has already been encoded makes entity codes visible on the page. For example, &amp;amp; displays as &amp; instead of &. Keep track of which values are already encoded, and encode each value exactly once, at output.
  • Using &nbsp; for spacing. Non-breaking spaces are meant to prevent line breaks between words, not for visual spacing. Use CSS margins and padding for layout spacing instead.
  • Forgetting to encode ampersands in URLs. Ampersands in URLs inside HTML attributes should be encoded. In an href attribute, ?a=1&b=2 should be written as ?a=1&amp;b=2; the browser decodes &amp; back to & before following the link.
  • Using HTML entities inside script tags. The contents of a <script> element are raw text, so the HTML parser does not decode entities there: &lt; reaches the JavaScript engine as the literal characters &lt;. Use JavaScript string escaping instead of HTML entities in script contexts.
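Double encoding is easy to reproduce. This sketch applies a replace-based encoder, with the hypothetical name encodeHTML, twice to the same string. The second pass turns the & at the start of &amp; into &amp; again, and the browser then displays the entity code instead of the character.

```javascript
function encodeHTML(str) {
  return String(str)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const once = encodeHTML("Fish & Chips");
const twice = encodeHTML(once); // mistake: the value was already encoded

console.log(once);  // Fish &amp; Chips      (browser shows: Fish & Chips)
console.log(twice); // Fish &amp;amp; Chips  (browser shows: Fish &amp; Chips)
```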

UTF-8 vs HTML Entities

With UTF-8 now the standard encoding on the web, you do not need entities for most special characters. Characters like © and accented letters can be typed directly in your HTML files as long as the file is saved as UTF-8 and declares its encoding, for example with <meta charset="utf-8">. However, the five reserved characters (<, >, &, ", and ') still need entities wherever the browser could misinterpret them as markup: < and & in text content, and quotes inside attribute values that use the same quote as a delimiter. Encoding all five everywhere is the simplest safe rule.
