Working on UNO automation of OpenOffice

I currently work as a casual programmer at a university. One main task that I have been dealing with from the first day of my contract until now is implementing OpenOffice UNO automation in our system, called ICE. ICE is developed under Python 2.4 and uses the pyuno bridge to write UNO components in Python, so that we can automate OpenOffice Writer documents.

ICE is mainly used by the Electronic Printing Department to generate course study books and introductory books. The modules for each book are created separately in different word processors such as NeoOffice, MS Word or OpenOffice Writer. Once the modules are created, the user uses ICE to generate the complete study book or introductory book, and this is where the UNO automation does its work. Its tasks are: building the book (from a template selected by the user, e.g. the study book template), converting all non-OpenOffice Writer documents to OpenOffice documents, inserting the converted documents into the book, updating all bookmarks, generating the table of contents for the book, and rendering the completed book to PDF and HTML.

Issues that I have faced (I will add more as I run into them):

  • ICE is developed under Python 2.4, and so are all the modules that support ICE, but the stable version of pyuno provided by OpenOffice.org still uses Python 2.3. Python 2.3 is installed together with OpenOffice and can be found in the installation directory (on Linux: /opt/openoffice.orgx.x/program/python; on Windows: C:\Program Files\OpenOffice.org x.x\program\python.bat). ICE uses Python Twisted, running locally on the user’s machine (we are now migrating to a server version), so to run pyuno for the automation, ICE executes Python 2.3 through the command line. When pyuno is executed, all the information about the book is passed on the command line as a data stream (not as a file), and a separate Python process handles the automation. The problem appears when the book is very big, such as a study book of more than 600 pages, which produces a large data stream: the command line cannot handle more than 5 KB of data. One solution that I have implemented is to save the data stream to a temporary file and pass the file name to the automation module. Although the automation can now build big books, this is not my preferred solution. I am now working on sending the large data stream over an output stream instead of saving it to a file.
  • MathType object issue. The .uno:UpdateAll command cannot handle MathType objects properly when the documents are inserted into the book automatically and invisibly (with no window shown to the user). .uno:UpdateAll either causes OpenOffice to shut down with the message “too many windows opened” or just sits there without doing anything. So, before a document (in ODT format) was inserted into the book file, I hacked into the document’s content.xml, removed all the MathType objects, and then inserted the document into the book file. After all the documents had been inserted, I hacked the book’s content.xml again and put all the MathType objects back. Surprisingly, after this hack .uno:UpdateAll never complains, even when the insertion is done invisibly. .uno:UpdateAll is what builds the book’s table of contents. Now, instead of throwing an error and crashing OpenOffice, .uno:UpdateAll does not build a correct table of contents unless the indexing is run at least twice. The problem is not consistent: sometimes the table of contents is still indexed wrongly even for a book built without any MathType objects. As a temporary solution (again), the user needs to open the book after it has been built and re-index the table of contents manually; the table of contents is then correct, because the re-indexing is done with the book visibly opened by the user.
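
One way to read "sending the data over a stream" in the first issue above is to pipe it to the automation process's standard input, so it never touches the command line or a temporary file. The following is only a minimal sketch under that assumption, written for a current Python 3 rather than the Python 2.3/2.4 pair used by ICE. The names `run_automation` and `CHILD_CODE` are hypothetical, and the child here only counts the bytes it receives where the real automation module would drive pyuno.

```python
import subprocess
import sys

# Hypothetical stand-in for the automation module: it reads the whole book
# data stream from stdin and reports how many bytes arrived.
CHILD_CODE = (
    "import sys; "
    "data = sys.stdin.buffer.read(); "
    "sys.stdout.write(str(len(data)))"
)


def run_automation(python_exe, payload):
    """Start a separate Python process and feed it `payload` over stdin."""
    proc = subprocess.Popen(
        [python_exe, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    # communicate() writes the payload, closes stdin, and collects stdout,
    # which avoids deadlocks when the payload is larger than the pipe buffer.
    out, _ = proc.communicate(payload)
    if proc.returncode != 0:
        raise RuntimeError("automation process failed")
    return out
```

In ICE's setup, `python_exe` would be the Python 2.3 interpreter shipped in the OpenOffice program directory, and the command line would carry only the script name, so its length no longer depends on the size of the book.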

Exporting to PDF from OpenOffice Writer

One of the main reasons I chose OpenOffice over Microsoft Word is that I can convert my documents to PDF without installing (or buying) other software. At my work at the uni, OpenOffice is the main tool our clients use to create their documents, and I have been hacking around with OpenOffice XML files to meet customer expectations.

When I was playing around with OpenOffice fonts today, I found a bug while exporting one of my client’s documents to PDF. The document exported successfully, but some of the characters were missing from the PDF file. I sat in front of my computer trying to figure out why the characters disappeared, whereas when I created a new document with those characters they did appear in the PDF. After two whole hours comparing the styles.xml (I crashed vim and Crimson Editor at least 20 times, because my box has only 512 MB of RAM and styles.xml has too many lines) and content.xml files, I finally worked out why the characters were not included in the PDF file.

You can replicate the problem with these steps: :D

  1. Create a Writer document.
  2. Put in some text containing an “en dash” (–) character. You can insert this character in OpenOffice with <Alt> + <0150> on your numeric keypad or <Ctrl> + <-> (may not work in some dialog boxes).
  3. Select all the text (including the “en dash” character) and change the font to “Helvetica” (only).
  4. Save the document and export it to PDF.
  5. Open the PDF file: your “en dash” characters are gone.

The error also occurs for documents created in Word and then opened and exported in OpenOffice with the same font. I don’t know whether this is an OpenOffice bug, because “Helvetica” (the standard “Arial” font in NeoOffice) does not belong to the font list (there is only “Helvetica-Narrow”); after I changed the style to Helvetica-Narrow, the “en dash” appeared in the PDF.

The modification that I made in the styles.xml file was to change:

<style:text-properties style:font-name="Helvetica" fo:font-size="8.5pt" fo:font-style="normal"/>

to:

<style:text-properties style:font-name="Helvetica-Narrow" fo:font-size="8.5pt" fo:font-style="normal"/>
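
The same substitution can be scripted instead of edited by hand. This is only a rough sketch, assuming the document is an ordinary ODT package (a zip archive containing styles.xml and content.xml) and that a plain text replacement of the `style:font-name` attribute is enough. `replace_font_name` and `patch_odt_font` are hypothetical helper names, not part of OpenOffice or pyuno.

```python
import io
import zipfile


def replace_font_name(xml_text, old_font, new_font):
    """Swap one style:font-name value for another in raw ODF XML text."""
    return xml_text.replace(
        'style:font-name="%s"' % old_font,
        'style:font-name="%s"' % new_font,
    )


def patch_odt_font(src, dst, old_font, new_font):
    """Copy the ODT package `src` to `dst`, rewriting font names on the way.

    `src` and `dst` may be file paths or file-like objects.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename in ("styles.xml", "content.xml"):
                text = data.decode("utf-8")
                data = replace_font_name(text, old_font, new_font).encode("utf-8")
            # Reusing the original ZipInfo keeps the entry order and the
            # compression method, so "mimetype" stays first and uncompressed.
            zout.writestr(info, data)
```

A call such as `patch_odt_font("book.odt", "book-fixed.odt", "Helvetica", "Helvetica-Narrow")` (file names made up for illustration) would write a patched copy and leave the original untouched. Because the closing quote is part of the search string, existing "Helvetica-Narrow" attributes are not rewritten.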

I searched the OpenOffice issue reports, and it seems a related problem with Helvetica in PDF export was reported previously as Issue 81970. I have reported my issue to OpenOffice (Issue 82540) and am still waiting for a response ;) . My client insists on using the Helvetica font (only), even though Arial looks the same, because changing the font name may break other systems that use the same document.

A few hours later….

As I predicted, OpenOffice closed Issue 82540 and suggested using another font that OpenOffice supports, such as “Helvetica-Narrow”. Well, I think that will be the solution for my case for the time being. :(