Nguyen Kim Son


Convert html to xhtml using python

In python on November 3, 2010 at 10:28 pm

Converting HTML to XHTML is a boring task. Recently I have been working on a small project that requires XHTML input, or more precisely, HTML in which every opening tag has a corresponding closing tag. That led me to write a small program that takes a local HTML file or a web address (e.g. http://google.com) as input and produces the corresponding XHTML file as output. You can download it at:

http://svn.assembla.com/svn/4a_project/trunk/html2xhtml.tar.gz

Extract the downloaded archive. To run the program, type:

./html2xhtml.py url(local file or link) output_file

Note that if url is not a local file, it must include a scheme prefix such as http:// or ftp://.
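The rule above (local file first, otherwise an address with an explicit prefix) can be sketched roughly as follows. This is not the code from html2xhtml.py, only an illustration written for Python 3; the function name read_source and the accepted list of schemes are assumptions made for the example.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical helper, not part of html2xhtml.py: the name and the
# accepted schemes are assumptions made for this illustration.
ACCEPTED_SCHEMES = ("http", "https", "ftp")


def read_source(source):
    """Return the raw bytes of a local file or of a remote document."""
    # A path that exists on disk is treated as a local file.
    if os.path.isfile(source):
        with open(source, "rb") as handle:
            return handle.read()
    # Otherwise the argument must carry an explicit scheme prefix.
    if urlparse(source).scheme in ACCEPTED_SCHEMES:
        with urlopen(source) as response:
            return response.read()
    raise ValueError(
        "%r is neither an existing file nor an address starting with "
        "http://, https:// or ftp://" % source
    )
```

With this sketch, read_source("page.html") reads a file from disk, read_source("http://google.com") downloads the page, and read_source("google.com") raises a ValueError because the prefix is missing.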

The program is written entirely in Python, so it is portable. On Windows, you may need to invoke the script explicitly by typing:

python html2xhtml.py url(local file or link) output_file

You can also use the html2xhtml class as a small library in your own program.

Of course, other programs, such as the one at http://www.it.uc3m.es/jaf/html2xhtml/, are far more complete and offer much more functionality.
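To make the idea of "every opening tag has a corresponding closing tag" concrete, here is a minimal sketch built on the standard library's html.parser module (Python 3). It is not the html2xhtml class from the download, and the names TagBalancer and balance are made up for this example. It only balances tags and self-closes void elements such as br; it does not add a doctype or namespace, and it re-escapes the text inside script and style elements, so a real converter has to do more.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML; written as <br /> here.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagBalancer(HTMLParser):
    """Hypothetical illustration: re-emit HTML with every tag closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the output document
        self.stack = []  # names of the tags that are currently open

    def _attrs(self, attrs):
        # XHTML wants every attribute quoted and given a value.
        parts = []
        for name, value in attrs:
            value = name if value is None else value
            parts.append(' %s="%s"' % (name, escape(value, quote=True)))
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            self.out.append("<%s%s />" % (tag, self._attrs(attrs)))
        else:
            self.out.append("<%s%s>" % (tag, self._attrs(attrs)))
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Already self-closed in the input, e.g. <img ... />.
        self.out.append("<%s%s />" % (tag, self._attrs(attrs)))

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag with no matching open tag: drop it
        # Close everything opened after the matching tag, then the tag itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Character references were decoded by the parser; escape again.
        self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any buffered text
        while self.stack:
            self.out.append("</%s>" % self.stack.pop())
        return "".join(self.out)


def balance(html_text):
    parser = TagBalancer()
    parser.feed(html_text)
    return parser.result()


print(balance("<p>Hello <b>world<br>"))
# prints: <p>Hello <b>world<br /></b></p>
```

The stack is the whole trick: an opening tag is pushed, a closing tag pops everything down to its match, and whatever is still on the stack at the end of the input gets closed in reverse order.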
