Convert Word documents to clean HTML

Word2cleanhtml cleans up HTML pasted from Word documents. It applies a series of filters to fix the quirks that Microsoft Office adds to its HTML, and gives you a well-formatted result that you can paste directly into a web page or content management system.

Is it private?

The conversion process is completely automated. I don't get to see your document, and no copy of it is kept.

The only exception is if you file a bug report and choose to include a copy of your document. In that case, your document will be emailed to me along with the bug report.

Help! Where have my fonts/colours/effects gone?

Most formatting information is stripped out, leaving just the content and the structure (headings, paragraphs, lists, etc.). This is intentional. The way Word adds font and colour information is generally not appropriate for web or ebook publishing. If you are going to use the HTML on a website, you should create separate stylesheets that add the styles you want.

Is there a desktop/offline version?

I don't have the resources to produce and support a desktop version at the moment.

How does it work?

It is written in the Python programming language and manipulates the HTML produced by Microsoft Word. The lxml library does most of the work.
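To give a rough idea of what this kind of processing looks like, here is a minimal sketch using lxml's `lxml.html` module. It is not the site's actual code: the `clean_word_html` function and the sets of attributes and wrapper tags it strips are hypothetical choices for illustration. It removes presentational attributes, unwraps `<font>` and `<span>` elements, and drops comments, while keeping the structural tags. It requires the lxml package to be installed.

```python
import lxml.html

# Hypothetical filter settings, chosen for illustration only;
# the real site's rules are not documented here.
FORMATTING_ATTRS = {"style", "class", "lang", "face", "color", "size", "align"}
WRAPPER_TAGS = {"font", "span"}


def clean_word_html(html: str) -> str:
    """Strip presentational markup, keeping headings, paragraphs, lists and text."""
    # Parse as a fragment wrapped in a <div> we create ourselves.
    root = lxml.html.fragment_fromstring(html, create_parent="div")

    # Snapshot the elements first so removing nodes doesn't disturb iteration.
    for el in list(root.iter()):
        if not isinstance(el.tag, str):
            # Comments and processing instructions: remove, keeping tail text.
            el.drop_tree()
            continue
        # Remove formatting attributes such as Word's style="mso-..." rules.
        for name in list(el.attrib):
            if name in FORMATTING_ATTRS:
                del el.attrib[name]
        # Unwrap font/span wrappers: keep their text and children, lose the tag.
        if el is not root and el.tag in WRAPPER_TAGS:
            el.drop_tag()

    # Serialize the contents of our wrapper <div>, not the wrapper itself.
    parts = [root.text or ""]
    parts += [lxml.html.tostring(child, encoding="unicode") for child in root]
    return "".join(parts)


if __name__ == "__main__":
    word_html = (
        '<p class="MsoNormal" style="margin:0">'
        '<span style="font-family:Arial"><b>Hello</b> world</span></p>'
        '<!-- conditional comment --><h1 style="color:red">Title</h1>'
    )
    print(clean_word_html(word_html))
```

Working on the parsed tree rather than on raw strings means tags and their text content stay correctly nested however messy Word's markup is, which is the main reason to use a real HTML parser such as lxml for this kind of job.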

Who can I contact about the site?

My name is Olly Cope. You can email me if you have any questions or comments about this site.

© 2007-2012 Olly Cope
Designer and illustrator: audesign
Made in Liège, Belgium.