Introduction to the ODF Toolkit
27 Feb 2014
Microsoft Office has long been the dominant office suite, and unfortunately it still is. For a long time not only the programs were closed, but the file format was as well.
Nevertheless, there are open alternatives available, most notably LibreOffice and Apache OpenOffice. In 2005 the OASIS consortium standardized Open Document, an open alternative to the proprietary world of Microsoft. Open Document is heavily influenced by the OpenOffice.org file format but is supported by multiple office suites and viewers.
Open Document files are zip files that contain some XML documents. You can go ahead and list the contents of any document you might have with unzip:
unzip -l aufwaende-12.ods
   Length        Date   Time  Name
---------  ---------- -----  ----
       46  2012-12-31 15:16  mimetype
      815  2012-12-31 15:16  meta.xml
     8680  2012-12-31 15:16  settings.xml
   171642  2012-12-31 15:16  content.xml
     3796  2012-12-31 15:16  Thumbnails/thumbnail.png
        0  2012-12-31 15:16  Configurations2/images/Bitmaps/
        0  2012-12-31 15:16  Configurations2/popupmenu/
        0  2012-12-31 15:16  Configurations2/toolpanel/
        0  2012-12-31 15:16  Configurations2/statusbar/
        0  2012-12-31 15:16  Configurations2/progressbar/
        0  2012-12-31 15:16  Configurations2/toolbar/
        0  2012-12-31 15:16  Configurations2/menubar/
        0  2012-12-31 15:16  Configurations2/accelerator/current.xml
        0  2012-12-31 15:16  Configurations2/floater/
    22349  2012-12-31 15:16  styles.xml
      993  2012-12-31 15:16  META-INF/manifest.xml
   208321                    16 files
The mimetype file determines what kind of document it is (in this case a spreadsheet), and META-INF/manifest.xml lists the files in the archive. The most important file is content.xml, which contains the body of the document.
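Because the container is an ordinary zip archive, the same inspection can be done from Java without any ODF library. The following is a minimal sketch using only java.util.zip from the JDK; the class name OdfZipPeek and its two helper methods are made up for this illustration and are not part of any toolkit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class for this post; it only uses the JDK.
class OdfZipPeek {

    /** Returns the names of all entries in the archive. */
    static List<String> listEntries(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    /** Returns the trimmed content of the "mimetype" entry, or null if it is missing. */
    static String readMimetype(String path) throws IOException {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("mimetype");
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.US_ASCII).trim();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Falls back to the file name used in this post if no argument is given.
        String path = args.length > 0 ? args[0] : "aufwaende-12.ods";
        System.out.println("mimetype: " + readMimetype(path));
        for (String name : listEntries(path)) {
            System.out.println(name);
        }
    }
}
```

This only looks at the packaging; it does not validate the document or interpret any of the XML inside.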
Server Side Processing
Though quite a few viewers and editors for Open Document are available, the situation on the server side used to be different. For Microsoft Office files there is the Java library Apache POI, which provides a lot of functionality for reading and manipulating them. But if you wanted to process Open Document files, nearly your only option was to install OpenOffice.org on the server and talk to it by means of its UNO API. Not exactly an easy thing to do.
Fortunately there is light at the end of the tunnel: the ODF Toolkit project, currently incubating at Apache, provides lightweight access to files in the Open Document format from Java. As the name implies it's a toolkit, consisting of multiple projects.
The heart of it is the schema generator, which ingests the Open Document specification, available as a RELAX NG schema. It provides a template-based facility to generate files from the ODF specification. Currently it only generates Java classes, but it can also be used to create other kinds of files (think of documentation or accessors for other programming languages).
The next layer of the toolkit is ODFDOM. It provides templates that generate classes for DOM access of elements and attributes of ODF documents. Additionally it provides facilities like packaging and document encryption.
For example, you can list the file paths of an ODF document using the OdfPackage class:
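import java.util.Set;
import org.odftoolkit.odfdom.pkg.OdfPackage;
OdfPackage pkg = OdfPackage.loadPackage("aufwaende-12.ods"); // opens the zip package
Set<String> filePaths = pkg.getFilePaths(); // paths of all entries in the package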
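For comparison, here is roughly what element access looks like without the generated classes. This sketch deliberately does not use ODFDOM; it relies only on the JDK's zip and DOM parser support to count the table-row elements in content.xml. The class name ContentXmlRows and the method countRowElements are invented for this example, and the namespace URI is the table namespace from the Open Document schema.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class for this post; plain JDK, no ODF Toolkit involved.
class ContentXmlRows {

    // Namespace of the table elements in the Open Document schema.
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    /**
     * Counts the table:table-row elements in content.xml.
     * Note: a single element can stand for many rows via the
     * table:number-rows-repeated attribute, so this is the number of
     * XML elements, not necessarily the number of visible rows.
     */
    static int countRowElements(String path) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the namespace lookup below
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IllegalArgumentException("no content.xml in " + path);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                Document dom = factory.newDocumentBuilder().parse(in);
                return dom.getElementsByTagNameNS(TABLE_NS, "table-row").getLength();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Falls back to the file name used in this post if no argument is given.
        String path = args.length > 0 ? args[0] : "aufwaende-12.ods";
        System.out.println("table-row elements: " + countRowElements(path));
    }
}
```

Having to know namespace URIs and element names like this is exactly the kind of detail the typed classes are meant to hide.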
If you are familiar with the Open Document spec, ODFDOM will be the only library you need. But if you are like most of us and don't know all the elements and attributes by heart, there is another project for you: the Simple API. It provides easy access to a lot of the features you might expect from a library like this. You can deal with higher-level abstractions such as paragraphs for text or rows and cells in the spreadsheet world, and you can search for and replace text.
This code snippet creates a spreadsheet, adds some cells to it and saves it:
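SpreadsheetDocument doc = SpreadsheetDocument.newSpreadsheetDocument();
Table sheet = doc.getSheetByIndex(0);
sheet.getCellByPosition(0, 0).setStringValue("Hello"); // column 0, row 0; placeholder value
doc.save("example.ods"); // placeholder file name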
If you are interested in seeing more code using the ODF Toolkit you can have a look at the cookbook that contains a lot of useful code snippets for the Simple API. Additionally you should keep an eye on this blog for the second part of the series where we will look at an application that extracts data from spreadsheets.