How to Convert HTML to Markdown: A Developer's Guide
Markdown has become the standard format for developer documentation, README files, static site generators and knowledge bases. But a lot of existing content lives in HTML, whether it was created in a CMS, exported from a rich text editor, or scraped from a web page. Converting HTML to Markdown gives you clean, portable, version-control-friendly text that is easy to read and maintain.
Why Convert HTML to Markdown?
- Documentation. Markdown is the standard for docs-as-code workflows. Converting HTML pages into Markdown lets you store documentation alongside your source code in Git.
- CMS migration. When moving from WordPress, Drupal or another CMS to a static site generator or framework such as Hugo, Astro or Next.js, you need to convert existing HTML posts into Markdown files.
- Readability. Markdown is far easier to read in its raw form than HTML. Stripping away tags makes content more accessible for writers and reviewers.
- Version control. Markdown diffs are clean and meaningful. HTML diffs are cluttered with tags, making code reviews harder.
- Portability. Markdown is widely supported. GitHub, GitLab, Notion, Obsidian, VS Code and many other tools can render or import it without any extra processing.
How HTML Maps to Markdown
Most common HTML elements have a direct Markdown equivalent. Here is how the mapping works for the most frequently used elements:
Headings
<h1> to <h6> become # Heading 1 through ###### Heading 6
Inline Formatting
<strong> or <b> becomes **bold**
<em> or <i> becomes *italic*
<code> becomes `inline code`
<del> or <s> becomes ~~strikethrough~~ (a GitHub-Flavored Markdown extension, not part of core Markdown)
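As a rough illustration of the heading and inline mappings above, here is a minimal sketch built on Python's standard-library html.parser. The names InlineConverter and convert_inline are made up for this example, and the sketch ignores every tag it does not recognize (including <p>), so it is not a complete converter.

```python
from html.parser import HTMLParser

# Marker written on both sides of each inline element.
INLINE_MARKERS = {
    "strong": "**", "b": "**",
    "em": "*", "i": "*",
    "code": "`",
    "del": "~~", "s": "~~",
}
# h1..h6 mapped to the number of leading # characters.
HEADING_LEVELS = {f"h{level}": level for level in range(1, 7)}


class InlineConverter(HTMLParser):
    """Hypothetical helper: handles headings and inline formatting only."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:
            self.parts.append("#" * HEADING_LEVELS[tag] + " ")
        elif tag in INLINE_MARKERS:
            self.parts.append(INLINE_MARKERS[tag])

    def handle_endtag(self, tag):
        if tag in HEADING_LEVELS:
            self.parts.append("\n\n")  # blank line after a heading
        elif tag in INLINE_MARKERS:
            self.parts.append(INLINE_MARKERS[tag])

    def handle_data(self, data):
        self.parts.append(data)


def convert_inline(html: str) -> str:
    parser = InlineConverter()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()
```

Calling convert_inline on a fragment that mixes an <h2> with <strong> and <em> yields a ## heading followed by text with ** and * markers.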
Links and Images
<a href="url">text</a> becomes [text](url)
<img src="url" alt="text"> becomes ![text](url)
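The link and image mappings depend on attribute values rather than on tag names alone. The following sketch, again using html.parser with made-up names (LinkImageConverter, convert_links), shows one way to carry the href from the opening <a> tag to the point where the link text ends. It assumes URLs contain no characters that need escaping in Markdown.

```python
from html.parser import HTMLParser


class LinkImageConverter(HTMLParser):
    """Hypothetical helper: converts <a> and <img>, keeps other text as-is."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.href_stack = []  # href of each <a> that is still open

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a":
            self.href_stack.append(attributes.get("href") or "")
            self.parts.append("[")
        elif tag == "img":
            alt = attributes.get("alt") or ""
            src = attributes.get("src") or ""
            self.parts.append(f"![{alt}]({src})")

    def handle_endtag(self, tag):
        if tag == "a" and self.href_stack:
            self.parts.append(f"]({self.href_stack.pop()})")

    def handle_data(self, data):
        self.parts.append(data)


def convert_links(html: str) -> str:
    parser = LinkImageConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```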
Lists
<ul><li> becomes - list item
<ol><li> becomes 1. list item
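Nested lists are indented under their parent item, typically by 2 to 4 spaces per level, so that the nested marker lines up with or sits past the parent item's text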
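Nesting is easiest to handle by keeping a stack of open lists. The sketch below is one possible approach, not a reference implementation: ListConverter and convert_list are illustrative names, the indent parameter is an assumption, and text that follows a nested list inside the same item is not handled.

```python
from html.parser import HTMLParser


class ListConverter(HTMLParser):
    """Hypothetical helper: converts <ul>/<ol> lists, tracking nesting depth."""

    def __init__(self, indent: int = 2):
        super().__init__()
        self.indent = indent  # spaces per nesting level (assumed default)
        self.lines = []
        self.stack = []  # one [list_tag, item_count] entry per open list

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.stack.append([tag, 0])
        elif tag == "li" and self.stack:
            current = self.stack[-1]
            current[1] += 1
            marker = "-" if current[0] == "ul" else f"{current[1]}."
            depth = len(self.stack) - 1
            self.lines.append(" " * (self.indent * depth) + marker + " ")

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.lines and self.stack:
            self.lines[-1] += text  # append text to the most recent item


def convert_list(html: str, indent: int = 2) -> str:
    parser = ListConverter(indent)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

An indent of 2 lines up with the text of a "- " item. Under a "1. " item the nested marker has to start at least 3 spaces in, so indent=4 is the safer choice when ordered and unordered lists are mixed.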
Code Blocks
A <pre><code> block converts to a fenced code block with triple backticks. If the code element has a class like language-javascript, the language identifier is placed after the opening fence:
```javascript
const greeting = "hello";
```
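A sketch of how a converter might pick up the language identifier from the class attribute, assuming the language-xyz naming convention described above. CodeBlockConverter and convert_code_blocks are illustrative names; the fence is built with "`" * 3, so code that itself contains triple backticks would need a longer fence.

```python
from html.parser import HTMLParser

FENCE = "`" * 3  # three backticks


class CodeBlockConverter(HTMLParser):
    """Hypothetical helper: turns <pre><code> blocks into fenced blocks."""

    def __init__(self):
        super().__init__()
        self.in_pre = False
        self.language = ""
        self.code = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.in_pre, self.language, self.code = True, "", []
        elif tag == "code" and self.in_pre:
            # Look for a class such as "language-javascript".
            for name in (dict(attrs).get("class") or "").split():
                if name.startswith("language-"):
                    self.language = name[len("language-"):]

    def handle_endtag(self, tag):
        if tag == "pre" and self.in_pre:
            body = "".join(self.code).strip("\n")
            self.blocks.append(f"{FENCE}{self.language}\n{body}\n{FENCE}")
            self.in_pre = False

    def handle_data(self, data):
        if self.in_pre:
            self.code.append(data)  # entities are already unescaped here


def convert_code_blocks(html: str) -> str:
    parser = CodeBlockConverter()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```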
Tables
HTML tables convert to pipe-delimited Markdown tables, a GitHub-Flavored Markdown extension that core Markdown does not include. The <thead> row becomes the header, followed by a separator row of dashes:
| Name | Type |
| ----- | ------ |
| id | number |
| email | string |
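The table mapping can be sketched in a few lines. This example assumes the first <tr> is the header row (whether or not it sits inside <thead>), escapes pipe characters inside cells, and ignores colspan and rowspan. TableConverter and convert_table are illustrative names.

```python
from html.parser import HTMLParser


class TableConverter(HTMLParser):
    """Hypothetical helper: collects cell text row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self.cell is not None:
            text = " ".join("".join(self.cell).split())  # collapse whitespace
            self.rows[-1].append(text.replace("|", "\\|"))
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def convert_table(html: str) -> str:
    parser = TableConverter()
    parser.feed(html)
    parser.close()
    rows = [row for row in parser.rows if row]
    if not rows:
        return ""
    header, body = rows[0], rows[1:]  # assumption: first row is the header
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```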
Blockquotes
<blockquote> becomes > quoted text
Nested blockquotes use additional > characters
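Once the inner content of a <blockquote> has been converted, quoting it is a matter of prefixing each line. A small sketch, with quote as an illustrative name and depth as an assumed parameter for nesting:

```python
def quote(markdown: str, depth: int = 1) -> str:
    """Prefix every line with one '>' per nesting level."""
    prefix = ">" * depth
    lines = []
    for line in markdown.splitlines():
        # Blank lines keep the bare prefix so the quote is not split in two.
        lines.append(f"{prefix} {line}" if line.strip() else prefix)
    return "\n".join(lines)
```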
Horizontal Rules
<hr> becomes ---
Common Edge Cases
Not all HTML converts cleanly. Here are the most common edge cases you will encounter:
- Nested lists. Deeply nested lists require precise indentation in Markdown. A converter needs to track the nesting depth and indent each level correctly.
- Inline styles. HTML like <span style="color:red"> has no Markdown equivalent. Most converters strip the styles and keep only the text content.
- Custom attributes. Classes, IDs and data attributes are lost during conversion since Markdown does not support them natively.
- Empty or whitespace-only elements. Tags like <p> </p> should be collapsed into blank lines rather than producing visible output.
- Divs and spans. Generic container elements have no Markdown equivalent. Converters typically extract the inner text content and discard the wrapper tags.
- Line breaks. A <br> tag can be converted to either two trailing spaces or a backslash at the end of a line, depending on the Markdown flavor.
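Several of these edge cases can be handled in a single pass. The sketch below drops wrapper tags and their attributes while keeping the text, skips whitespace-only paragraphs, and lets the caller choose the hard line break style. EdgeCaseConverter, convert_paragraphs and the break_style parameter are assumptions made for this example.

```python
from html.parser import HTMLParser

# Two ways to write a hard line break, depending on the Markdown flavor.
HARD_BREAKS = {"spaces": "  \n", "backslash": "\\\n"}


class EdgeCaseConverter(HTMLParser):
    """Hypothetical helper: paragraphs, <br>, and generic wrappers only."""

    def __init__(self, break_style: str = "spaces"):
        super().__init__()
        self.hard_break = HARD_BREAKS[break_style]
        self.paragraphs = []
        self.current = []

    def flush(self):
        text = "".join(self.current).strip()
        if text:  # whitespace-only blocks produce no output
            self.paragraphs.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in ("p", "div"):
            self.flush()
        elif tag == "br":
            self.current.append(self.hard_break)
        # <span> and other tags: attributes are dropped, text is kept

    def handle_endtag(self, tag):
        if tag in ("p", "div"):
            self.flush()

    def handle_data(self, data):
        self.current.append(data)


def convert_paragraphs(html: str, break_style: str = "spaces") -> str:
    parser = EdgeCaseConverter(break_style)
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text outside a block element
    return "\n\n".join(parser.paragraphs)
```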
Elements Without Markdown Equivalents
Some HTML elements simply do not have a Markdown representation. When converting, you need a strategy for handling these:
- Forms. Input fields, select dropdowns and buttons cannot be expressed in Markdown. These are typically removed or replaced with a placeholder note.
- Videos and iframes. Embedded media has no native Markdown syntax. You can keep the raw HTML inline (most Markdown renderers support this) or convert to a link.
- Custom web components. Elements like <my-component> are stripped entirely unless you configure the converter to preserve them as raw HTML.
- Definition lists. While <dl>, <dt>, <dd> exist in HTML, standard Markdown does not support them. Some extended flavors like PHP Markdown Extra do.
- Details/summary. Collapsible sections using <details> can be preserved as raw HTML in GitHub-Flavored Markdown, but are not part of the core Markdown spec.
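One way to express such a strategy in code is a small set of per-tag policies. In this sketch forms are replaced by a placeholder, iframes become links, and <details>/<summary> are passed through as raw HTML. The placeholder text, the link label, and the names FallbackConverter and convert_with_fallbacks are all invented for illustration.

```python
from html.parser import HTMLParser

RAW_TAGS = {"details", "summary"}  # kept verbatim as inline HTML


class FallbackConverter(HTMLParser):
    """Hypothetical helper: applies a fallback policy per unsupported tag."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.form_depth = 0  # greater than 0 while inside a <form>

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            if self.form_depth == 0:
                self.parts.append("[form removed]\n\n")  # placeholder note
            self.form_depth += 1
        elif self.form_depth:
            return  # everything inside a form is dropped
        elif tag == "iframe":
            src = dict(attrs).get("src") or ""
            self.parts.append(f"[Embedded content]({src})\n\n")
        elif tag in RAW_TAGS:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "form" and self.form_depth:
            self.form_depth -= 1
        elif not self.form_depth and tag in RAW_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.form_depth:
            self.parts.append(data)


def convert_with_fallbacks(html: str) -> str:
    parser = FallbackConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```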
Real-World Use Cases
- Blog migration. Moving from WordPress or Ghost to a Markdown-based static site generator. Export your posts as HTML, then batch convert them to .md files with front matter.
- Rich text editor cleanup. WYSIWYG editors like TinyMCE and CKEditor produce verbose HTML. Converting to Markdown strips unnecessary tags and gives you clean, maintainable content.
- Creating docs from web pages. You can scrape API documentation or reference pages and convert the HTML to Markdown for offline access or inclusion in your own docs.
- Email template extraction. Pull the text content from HTML email templates into Markdown for archiving or repurposing as blog posts.
- Knowledge base standardization. When consolidating documentation from multiple sources (Confluence, Notion, Google Docs), converting everything to Markdown gives you a single, consistent format.
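For the blog migration case, each converted post is usually written to its own file with a front matter header. The sketch below assumes YAML front matter with title and date fields. The function name write_post and the field names are illustrative, and your static site generator may expect different keys.

```python
import json
from pathlib import Path


def write_post(directory, slug: str, title: str, date: str, markdown_body: str) -> Path:
    """Write one post as <slug>.md with a YAML front matter header."""
    front_matter = "\n".join([
        "---",
        # json.dumps yields a double-quoted string, which YAML also accepts.
        f"title: {json.dumps(title)}",
        f"date: {date}",
        "---",
    ])
    path = Path(directory) / f"{slug}.md"
    path.write_text(f"{front_matter}\n\n{markdown_body.strip()}\n", encoding="utf-8")
    return path
```

In a batch job, a function like this would be called once per exported post after the HTML body has been converted.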
Best Practices for Clean Conversion
- Clean the HTML first. Remove unnecessary wrappers, empty tags, inline styles and tracking pixels before converting. The cleaner your input, the better your output.
- Choose your Markdown flavor. Decide between standard Markdown, CommonMark or GitHub-Flavored Markdown (GFM). GFM supports tables, task lists and strikethrough, which means fewer elements will be lost.
- Preserve semantic structure. Make sure headings maintain their hierarchy. An <h2> should become ##, not #. Do not flatten the heading hierarchy.
- Handle images carefully. Check that image URLs are absolute (not relative) after conversion, and verify that the alt text is preserved.
- Keep raw HTML for complex elements. If you need to preserve videos, iframes or interactive elements, most Markdown renderers allow inline HTML. Use this as a fallback rather than losing content.
- Review the output. Automated conversion is never perfect. Always review the generated Markdown to fix spacing issues, broken links and formatting inconsistencies.
- Normalize whitespace. Remove excessive blank lines and ensure consistent spacing between sections. Most converters produce extra newlines that need trimming.
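Two of these practices are easy to automate as a post-processing step on the generated Markdown. The sketch below resolves relative image URLs against a base URL and collapses runs of blank lines. The function names are illustrative, and the regular expressions are deliberately simple: they do not skip fenced code blocks, so blank lines or image-like text inside code would also be changed.

```python
import re
from urllib.parse import urljoin

# Matches "![alt](" followed by the URL, stopping at whitespace or ")".
IMAGE_PATTERN = re.compile(r"(!\[[^\]]*\]\()([^)\s]+)")


def absolutize_images(markdown: str, base_url: str) -> str:
    """Resolve relative image URLs against base_url; absolute URLs are kept."""
    return IMAGE_PATTERN.sub(
        lambda match: match.group(1) + urljoin(base_url, match.group(2)),
        markdown,
    )


def normalize_whitespace(markdown: str) -> str:
    """Collapse three or more newlines into one blank line and trim the ends."""
    return re.sub(r"\n{3,}", "\n\n", markdown).strip() + "\n"
```

Trailing spaces are left alone on purpose, since two trailing spaces can be a hard line break.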
Conversion Libraries and Tools
If you are building conversion into your workflow, several popular libraries can help. In JavaScript, turndown is the most widely used HTML-to-Markdown library. In Python, markdownify and html2text are popular choices. For quick, one-off conversions without installing anything, an online converter is the fastest option.
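As a minimal sketch of the library route in Python, the snippet below wraps markdownify, a third-party package that has to be installed separately (for example with pip install markdownify). The wrapper name html_to_markdown is made up for this example, and heading_style="ATX" asks for # style headings instead of underlined ones.

```python
try:
    # Third-party dependency; not part of the standard library.
    from markdownify import markdownify
except ImportError:
    markdownify = None


def html_to_markdown(html: str) -> str:
    """Convert HTML with markdownify, failing clearly if it is missing."""
    if markdownify is None:
        raise RuntimeError("markdownify is not installed")
    return markdownify(html, heading_style="ATX").strip()
```

Exact spacing and escaping in the output depend on the library version, so treat the result as a starting point and review it as described above.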