LaTeX to XHTML API using tex4ht

ICE, LaTeX

For the past two weeks I have been struggling with LaTeX to HTML conversion. I tried latex2html on Ubuntu (after giving up on installing it on Windows) and plasTeX (a Python package for LaTeX conversion, which runs on Windows). Both packages produce reasonable HTML, but the result is not suitable for our ICE system. The generated HTML cannot be customized with ICE (CSS) templates because the CSS class names in both outputs are badly chosen.

plasTeX produces a single HTML file for a LaTeX document (which is nice), but every formula in the document is simply translated into plain characters, which gives a poor result for complex formulas. Another issue with plasTeX is that cross-referencing is not handled properly. LaTeX documents contain a lot of cross-references, such as footnotes, bibliographies, image captions, and references to chapters or sections, and plasTeX just ignores them in the HTML output.

Unlike plasTeX, latex2html handles all the cross-referencing properly. It creates a link for each piece of text that refers to an anchor. latex2html also converts all the formulas to PNG files (this needs the dvipng package installed), and those formulas are nicely aligned in the resulting HTML, so it gives a nicer result than plasTeX. The drawback is that latex2html is too “smart”: it tears the LaTeX document apart and produces a lot of HTML pages for a single document, with a table of contents and a navigation bar at the top of each page. This is not what ICE wants, as the HTML result should look like the PDF file produced by LaTeX.
To produce a nice HTML file for ICE, I need to merge the bodies of all those HTML pages into one file and remove the unnecessary navigation. Since the cross-reference links created by latex2html point to the separate pages, I also need to fix the links. It’s done, but I still wonder whether this hacking of the output is really necessary, and how long the conversion will take, since it runs on the server.
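The merging step described above could look roughly like the sketch below. It is not the code used in ICE, just an illustration under some assumptions: that the pages are passed in reading order, that the navigation bar sits between "Navigation Panel" and "End of Navigation Panel" comments, and that cross-page links have the form page.html#anchor. The helper names (merge_pages, page_anchor) are made up for this example.

```python
import re

BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
# Assumption: the navigation bar is wrapped in these two HTML comments.
NAV_RE = re.compile(
    r"<!--\s*Navigation Panel\s*-->.*?<!--\s*End of Navigation Panel\s*-->",
    re.IGNORECASE | re.DOTALL,
)
# Matches href="something.html" with an optional #fragment.
HREF_RE = re.compile(r'href="([^"#]+\.html)(#[^"]*)?"', re.IGNORECASE)


def page_anchor(name):
    """Anchor id that marks where a former page starts in the merged file."""
    return "page-" + name.rsplit(".", 1)[0]


def merge_pages(pages):
    """Merge an ordered list of (filename, html_text) pairs into one HTML string."""
    names = {name for name, _ in pages}

    def fix_link(match):
        target, fragment = match.group(1), match.group(2)
        if target not in names:
            return match.group(0)  # external link: leave untouched
        # Cross-page link: keep only the fragment, or jump to the page marker.
        return 'href="%s"' % (fragment or "#" + page_anchor(target))

    parts = []
    for name, html in pages:
        found = BODY_RE.search(html)
        body = found.group(1) if found else html
        body = NAV_RE.sub("", body)          # drop the navigation bar
        body = HREF_RE.sub(fix_link, body)   # make links point inside the merged file
        parts.append('<a id="%s"></a>\n%s' % (page_anchor(name), body.strip()))
    return "<html><body>\n%s\n</body></html>" % "\n".join(parts)
```

Regular expressions are fragile on real HTML, so for anything beyond a quick hack a proper HTML parser would probably be the safer choice.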

Numbering Support

Neither plasTeX nor latex2html preserves the numbering produced by LaTeX. In LaTeX, the numbering of sections, chapters, tables, and image captions is handled automatically when the user compiles the document. In the HTML generated by both plasTeX and latex2html, all of that numbering is gone.

Question answered!

tex4ht
I found a nice Debian package that produces a single HTML file and a nice image for each formula. The package is called tex4ht. The only hacking I need to do is to fix up the footnotes and their links, as tex4ht generates an extra HTML page for each footnote. Nice and simple. Thanks to the Debian package, the conversion API is done!!! It produces a nice HTML file together with a link to the PDF file, so the user is able to download the PDF. The HTML file is simple, XHTML compliant, and comes with a generated CSS file. I can map the ICE template onto this CSS file to produce ICE HTML output. The API can also produce a .zip file that includes all the images tex4ht generated for the formulas.
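As a rough idea of how such a conversion step can be wrapped, here is a minimal sketch. It assumes the htlatex command that ships with tex4ht is on the PATH, and that the files worth packing are the ones ending in .html, .css, .png, or .pdf. The function names and the suffix list are my own choices for illustration, not the actual ICE code.

```python
import subprocess
import zipfile
from pathlib import Path

# Assumption: these are the output files worth shipping to the user.
OUTPUT_SUFFIXES = {".html", ".css", ".png", ".pdf"}


def run_htlatex(tex_path):
    """Run htlatex next to the source file so its outputs land in the same folder."""
    tex_path = Path(tex_path)
    subprocess.run(["htlatex", tex_path.name], cwd=tex_path.parent, check=True)


def pack_outputs(work_dir, zip_path):
    """Zip the generated files in work_dir and return the names that were packed."""
    work_dir = Path(work_dir)
    packed = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(work_dir.iterdir()):
            if path.is_file() and path.suffix.lower() in OUTPUT_SUFFIXES:
                archive.write(path, arcname=path.name)
                packed.append(path.name)
    return packed
```

A call such as run_htlatex("thesis/thesis.tex") followed by pack_outputs("thesis", "thesis.zip") would then give one archive with the HTML, the CSS, and the formula images (the file names here are only examples).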

Interestingly, tex4ht can also convert a LaTeX document to an ODT document. I haven’t tried it yet ;) but it’s another thing that needs to be done.

Issues (small ones, but users may find it tedious to track down the corresponding dependency files):

  • any .sty file used as a template by the LaTeX document needs to be included in the zip file together with the document, so that tex4ht on the server does not complain about a missing template
  • all images and other dependency files included by the LaTeX document need to be included in the zip file as well
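To spare users some of that hunting, the uploaded zip could be checked before conversion. The sketch below is only an idea, not something ICE does: it scans the .tex source for \usepackage, \includegraphics, \input, and \include, and lists whatever is not in the archive. It ignores comments and custom macros, and packages already installed on the server (amsmath, for example) will show up as "missing" too, so the result is a list of candidates to look at rather than a verdict. All function names are hypothetical.

```python
import posixpath
import re
import zipfile

PACKAGE_RE = re.compile(r"\\usepackage(?:\[[^\]]*\])?\{([^}]*)\}")
FILE_RE = re.compile(r"\\(?:includegraphics|input|include)(?:\[[^\]]*\])?\{([^}]*)\}")


def referenced_files(tex_source):
    """Return (set of .sty names, set of file names) mentioned in the source."""
    packages = set()
    for group in PACKAGE_RE.findall(tex_source):
        # \usepackage{a,b} can name several packages at once.
        packages.update(n.strip() + ".sty" for n in group.split(",") if n.strip())
    files = {n.strip() for n in FILE_RE.findall(tex_source) if n.strip()}
    return packages, files


def missing_from_zip(zip_file, tex_name):
    """List referenced packages and files that are not present in the archive."""
    with zipfile.ZipFile(zip_file) as archive:
        members = archive.namelist()
        tex_source = archive.read(tex_name).decode("utf-8", errors="replace")
    full_names = set(members)
    # LaTeX lets you omit the extension, so also compare without it.
    stems = {posixpath.splitext(m)[0] for m in members}
    packages, files = referenced_files(tex_source)
    missing_packages = sorted(p for p in packages if p not in full_names)
    missing_files = sorted(f for f in files if f not in full_names and f not in stems)
    return missing_packages, missing_files
```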

Installing latex2html on Windows

ICE, LaTeX, Window

Getting latex2html to work on Windows is a real pain. I searched a lot of forums for a solution to the installation problem below, but no luck… :(

> Building “latex2html.bat” from “latex2html.pin”
> … building pstoimg
> build.pl (Revision 1.6)
> config\build.pl: Warning: Skipping build of pstoimg because of missing external programs.
> … building texexpand

The software dependencies that latex2html needs on Windows have been installed as follows, based on a good set of instructions for installing latex2html on Windows:

  • Perl for Windows, installed in C:\Perl
  • the Netpbm graphics-conversion utilities, installed in C:\TEXUTILS\netpbm
  • Ghostscript, for viewing PostScript files, installed in C:\Aladdin\gs6.01\
  • MiKTeX, installed in C:\Program Files\MiKTeX 2.7

But I still can’t solve the above problem, which shows up when I run config.bat, even after making sure there are no spaces in my directory paths. The cfgcache.pl file is also not created after I run the configuration script. Installation on Linux is not as complex as this…
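Since the warning talks about "missing external programs", one thing worth checking is whether the tools can actually be found on the PATH. Here is a small sketch for that. The program list is my guess at what pstoimg might look for (Perl, LaTeX, dvips, the Ghostscript console program, and a couple of Netpbm tools); it is not taken from the latex2html configuration, so adjust it as needed.

```python
import shutil

# Assumption: an illustrative list, not the exact set latex2html checks for.
REQUIRED_PROGRAMS = ["perl", "latex", "dvips", "gswin32c", "pnmcrop", "pnmtopng"]


def find_missing(programs, path=None):
    """Return the programs that cannot be found on the search path."""
    return [name for name in programs if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    for name in find_missing(REQUIRED_PROGRAMS):
        print("not found on PATH:", name)
```

If something like pnmcrop is reported here, adding the Netpbm and Ghostscript folders to the PATH before running config.bat might be the first thing to try.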