Creating Microsoft Office files on Google App Engine

Posted in Google App Engine, Java, Microsoft Office on January 1, 2010 by stephenhuey

Happy New Year!

One month ago, I was a bit frustrated trying to figure out how to create zip files and Microsoft Office documents with Java on Google App Engine.  As you can see in the Will it Play? list, some of the tried-and-true libraries like Apache POI are not yet supported on there.  Our project has a requirement for our GAE application to generate pretty Microsoft Word and Excel reports that our users can download for further editing, so I briefly considered trying to use the Google Docs API and pass data from GAE to Google Docs in order to construct the reports there, but it seemed like that would be a huge number of API calls outside of GAE, every single one of which would cost money once we passed the free quota!

An alternative would be to try constructing the relatively new Microsoft Office 2007 format in the GAE sandbox because it’s essentially text-based whereas the old format was binary.  You may have started receiving MS Office attachments from people a couple of years ago that you couldn’t open in Word or Excel, and if you looked closely, you’d have noticed the Word extension was .docx instead of .doc and the Excel one was .xlsx (no, I also don’t get warm fuzzies from the fact that they merely added an “x” onto the file extension).  Rename these files so they end in .zip and you’ll find any zip utility will open them because each one is actually a zip file containing text files (XML, to be specific) and images.  Microsoft named the format Office Open XML, and it has been their new standard since MS Office 2007.  You can even open these files with older versions of Microsoft Office if you download and install the free Microsoft Office Compatibility Pack.  Granted, a zip file is binary, but if I could successfully generate one on GAE, then everything else I’d be working with would be supported on GAE (text files and binary image files that I didn’t have to manipulate).
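If you’d rather peek inside one of these files from Java than rename it, here’s a minimal sketch that lists the parts of a .docx or .xlsx with the JDK’s own java.util.zip classes.  The class name OfficeZipLister and its listEntries method are made up for this illustration; they aren’t part of my project or of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper (illustration only) that lists the parts inside a .docx or .xlsx. */
final class OfficeZipLister {

    /** Returns the entry names in the order they are stored in the archive. */
    static List<String> listEntries(byte[] archiveBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archiveBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OfficeZipLister <file.docx>");
            return;
        }
        // No renaming to .zip is needed; the extension does not matter to the zip reader.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        for (String name : listEntries(bytes)) {
            System.out.println(name);
        }
    }
}
```

Run it against a document saved from Word and you should see names such as [Content_Types].xml and word/document.xml in the listing.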

So I was excited to hear about GaeVFS, a virtual file system for Google App Engine (GAE gives you no file system access, so a lot of the usual Java calls related to files are not supported).  After playing with it, I was gung-ho about trying to create the relatively new Microsoft Office format since it would make it easier for me to construct these zip files of XML files and images.  My GAE-generated zip files were recognized on Mac OS X first, and Winzip on Windows wasn’t difficult to please either, but the last holdout was Windows XP Compressed Folders (which needed to recognize it before any MS Office program would recognize it).  Finally, I got the free Microsoft Word Viewer to happily open a GAE-generated .docx file right around mid-December, and I used the same code to generate a valid .xlsx file as well.
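For comparison, here’s a rough sketch of building a zip without touching any file system at all, by writing into a ByteArrayOutputStream.  This is not what GaeVFS does and not what I used for this project; it’s just the plain JDK way to see that a ZipOutputStream doesn’t need a real file underneath it.  The InMemoryZip class and its zip method are hypothetical names for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper (illustration only): zips named byte arrays into one byte array. */
final class InMemoryZip {

    /** Keys are entry paths such as "word/document.xml"; values are the raw bytes of each part. */
    static byte[] zip(Map<String, byte[]> parts) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> part : parts.entrySet()) {
                zout.putNextEntry(new ZipEntry(part.getKey()));
                zout.write(part.getValue());
                zout.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The try-with-resources block has closed the ZipOutputStream by now,
        // so the central directory is already written into the buffer.
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> parts = new LinkedHashMap<>();
        parts.put("hello.txt", "Hello from memory".getBytes(StandardCharsets.UTF_8));
        parts.put("folder/nested.txt", "Nested entry".getBytes(StandardCharsets.UTF_8));
        System.out.println("zip size in bytes: " + zip(parts).length);
    }
}
```

A LinkedHashMap keeps the entries in insertion order.  The trade-off is that the whole archive lives in memory, which is fine for small reports but not for anything large.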

Because I was aiming for a proof-of-concept to verify that I could create these Office documents on GAE, I manually unzipped a .docx file created in Microsoft Word and uploaded its files into GAE so I could write some Java to stick them all into a zip file.  I know, I know, you’re probably already groaning at the thought of having to do all that.  Building these zip and .docx files should really be a simple matter in Java, and in a traditional environment, there’d be no fuss about it at all.  But due to some limitations in GAE, this slightly more painful workaround is necessary, and it’d be a showstopper if we couldn’t do this on GAE, so I had to make sure it would work before doing any more development!

I snagged a photo from this MSDN page so you can see what to expect when you unzip one of these newfangled MS Office files:

[Figure: The contents of a .docx file]

Love it or hate it, that’s what we’re dealing with folks!  Note that the media folder is where the images go, and the top-level _rels folder has a file in it that has no name before the extension, so it’s just called .rels (which means it won’t show up in the Finder in Mac OS X even though it’s really there).  Okay, now for my sample code…

GaeVFS ships with a servlet that handles file uploads and abstracts how it writes them to its virtual file system.  You can use a simple upload page like this to get your files into the GAE datastore:

 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
 <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
 <title>File Upload</title>
 </head>
 <body>
 <form action="/gaevfs/" enctype="multipart/form-data" method="post">
 Path on server:<br>
 <input type="text" name="path" size="30" value="/gaevfs"><br>
 Block size in KB (leave blank for default):<br>
 <input type="text" name="blocksize" size="10"><br>
 File to upload:<br>
 <input type="file" name="filename" size="40"><br>
 <input type="submit" value="Send">
 </form>
 </body>
 </html>

Make sure to use the correct paths when saving these files into the GaeVFS file system:

[Figure: Uploading the .docx file parts]

That should work if your web.xml has mappings like this:

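<?xml version="1.0" encoding="utf-8"?>
<web-app xmlns:xsi=""
xsi:schemaLocation="" version="2.5">
<!-- servlet and servlet-mapping elements for the GaeVFS servlet and DocxWriterServlet go here -->
</web-app>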

GaeVFS is built on top of Apache Commons VFS, and you use their FileObject instead of the standard File class.  There are some examples on the GaeVFS wiki, but I had to play around for a bit before I figured out some of the differences I needed to know.  Here’s a lil’ class I made for my servlet to use:

package com.stephenhuey.docx;

/**
 * @author Stephen Huey
 */

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import org.apache.commons.vfs.FileObject;
import org.apache.commons.vfs.FileSystemException;
import org.apache.commons.vfs.FileSystemManager;
import org.apache.commons.vfs.FileType;

public class FileObjectHelper {

    // resolves the path and creates the folder only if it isn't already there
    public static FileObject createFolder(FileSystemManager fsManager, String absolutePath) throws FileSystemException {
        FileObject theFolder = fsManager.resolveFile( absolutePath );
        if ( theFolder.exists() == false) {
            theFolder.createFolder();
        }
        return theFolder;
    }

    // resolves the path and creates an empty file only if it isn't already there
    public static FileObject createFile(FileSystemManager fsManager, String absolutePath) throws FileSystemException {
        FileObject theFile = fsManager.resolveFile( absolutePath );
        if ( theFile.exists() == false) {
            theFile.createFile();
        }
        return theFile;
    }

    // zips everything under directoryToZip into docxZipFile
    public static void zipDir(FileObject docxZipFile, FileObject directoryToZip) throws IOException {
        OutputStream out = docxZipFile.getContent().getOutputStream();
        ZipOutputStream zout = new ZipOutputStream(out);
        addDir(directoryToZip, zout, "");
        zout.close(); // make sure you close the ZipOutputStream, not the OutputStream!
    }

    // recursively adds the children of dirObj, building each entry's path relative to the zip root
    public static void addDir(FileObject dirObj, ZipOutputStream zout, String basePathSoFar) throws IOException {
        FileObject[] files = dirObj.getChildren();
        byte[] tmpBuf = new byte[1024];

        for (int i = 0; i < files.length; i++) {
            FileObject currentFile = files[i];
            String currentFileBaseName = currentFile.getName().getBaseName();

            if (currentFile.getType().equals(FileType.FOLDER)) {
                addDir(currentFile, zout, basePathSoFar + currentFileBaseName + "/");

            } else { // else it's a file, not a directory
                BufferedInputStream bis = new BufferedInputStream(currentFile.getContent().getInputStream());
                zout.putNextEntry(new ZipEntry(basePathSoFar + currentFileBaseName));
                int len;
                while ((len = bis.read(tmpBuf)) != -1) {
                    zout.write(tmpBuf, 0, len);
                }
                zout.closeEntry();
                bis.close();
            } // end if
        } // end for loop
    } // end addDir
}

And this is my servlet that makes sure the files are there and calls for the zip files to be created and saved with a .docx extension:

package com.stephenhuey.docx;

/**
 * @author Stephen Huey
 */

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.commons.vfs.FileObject;
import org.apache.commons.vfs.FileSystemManager;

import com.newatlanta.commons.vfs.provider.gae.GaeVFS;

public class DocxWriterServlet extends HttpServlet {

    public void doGet( HttpServletRequest req, HttpServletResponse res ) throws IOException {

        GaeVFS.setRootPath( getServletContext().getRealPath( "/" ) );
        FileSystemManager fsManager = GaeVFS.getManager();
        try {

            List<FileObject> files = new ArrayList<FileObject>();

            FileObject relsFolder = FileObjectHelper.createFolder(fsManager, "gae://gaevfs/docxFiles/_rels");
            FileObject docPropsFolder = FileObjectHelper.createFolder(fsManager, "gae://gaevfs/docxFiles/docProps");
            FileObject wordFolder = FileObjectHelper.createFolder(fsManager, "gae://gaevfs/docxFiles/word");

            // subfolders under the word folder
            FileObject relsSubfolder = FileObjectHelper.createFolder(fsManager, "gae://gaevfs/docxFiles/word/_rels");
            FileObject mediaSubfolder = FileObjectHelper.createFolder(fsManager, "gae://gaevfs/docxFiles/word/media");
            FileObject themeSubfolder = FileObjectHelper.createFolder(fsManager, "gae://gaevfs/docxFiles/word/theme");

            // the only file in the top-level directory
            files.add(FileObjectHelper.createFile(fsManager, "gae://gaevfs/docxFiles/[Content_Types].xml"));

            // files in the docProps folder
            files.add(FileObjectHelper.createFile(fsManager, "gae://gaevfs/docxFiles/docProps/app.xml"));
            files.add(FileObjectHelper.createFile(fsManager, "gae://gaevfs/docxFiles/docProps/core.xml"));

            // files in the word folder
            files.add(FileObjectHelper.createFile(fsManager, "gae://gaevfs/docxFiles/word/document.xml"));
            files.add(FileObjectHelper.createFile(fsManager, "gae://gaevfs/docxFiles/word/fontTable.xml"));
            files.add(FileObjectHelper.createFile(fsManager, "gae://gaevfs/docxFiles/word/settings.xml"));
            files.add(FileObjectHelper.createFile(fsManager, "gae://gaevfs/docxFiles/word/styles.xml"));
            files.add(FileObjectHelper.createFile(fsManager, "gae://gaevfs/docxFiles/word/webSettings.xml"));

            // files in the _rels subfolder that's within the word folder
            files.add(FileObjectHelper.createFile(fsManager, "gae://gaevfs/docxFiles/word/_rels/document.xml.rels"));

            // files in the theme subfolder that's within the word folder
            files.add(FileObjectHelper.createFile(fsManager, "gae://gaevfs/docxFiles/word/theme/theme1.xml"));

            // files in the media subfolder that's within the word folder
            files.add(FileObjectHelper.createFile(fsManager, "gae://gaevfs/docxFiles/word/media/image1.jpeg"));
            files.add(FileObjectHelper.createFile(fsManager, "gae://gaevfs/docxFiles/word/media/image2.png"));

            // zip the whole docxFiles folder and save it with a .docx extension
            FileObject docxRootFolder = FileObjectHelper.createFolder(fsManager, "gae://gaevfs/docxFiles");
            FileObject docxZipFile = FileObjectHelper.createFile(fsManager, "gae://gaevfs/generatedZip/wordDocumentGAE3.docx");
            FileObjectHelper.zipDir(docxZipFile, docxRootFolder);
        } catch (Exception e) {
            // don't swallow failures silently: log them and tell the browser
            log("Could not generate the .docx file", e);
            res.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
        } finally {
            GaeVFS.clearFilesCache(); // this is important!
        }
    }

    public void destroy() {
        GaeVFS.close(); // this is not mandatory, but nice to do
    }
}

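Now just go to the URL that your web.xml maps to DocxWriterServlet in your web browser, and once that runs, you can find your .docx file in the generatedZip folder of the GaeVFS file system (gae://gaevfs/generatedZip/wordDocumentGAE3.docx).
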
Here’s what it looks like:

[Figure: Finding the generated .docx in the GaeVFS file system]

If you can open .docx files in your version of Microsoft Office, or if you have the free Microsoft Word Viewer and installed the free Microsoft Office Compatibility Pack, then you can download that file right away and verify that it looks exactly the same as the one you used as a starting point.

Of course, if you really need to create MS Office files on Google App Engine, you’ll most likely be dynamically generating the parts. I already knew I could create XML and handle images in GAE, so the unknown for me was getting to this point. A website Microsoft created for this format is unfortunately not very well organized, and the Java examples aren’t all that helpful, but you may find something there you can use. I’ll probably just write my own helper classes to build exactly the kinds of documents I need.
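To give a flavor of what generating a part dynamically might look like, here’s a minimal sketch that builds the text of word/document.xml from a list of paragraphs.  The DocumentXmlBuilder class and its methods are hypothetical names for this example, not the helper classes I’ll end up writing, and it only covers plain paragraphs.  The other parts ([Content_Types].xml, the .rels files and so on) would still come from the unzipped template described above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical builder (illustration only) for a bare-bones word/document.xml. */
final class DocumentXmlBuilder {

    // WordprocessingML main namespace, the one Word binds to the "w" prefix
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Escapes the characters that are not allowed as-is in XML text content. */
    static String escapeXml(String text) {
        // "&" must be replaced first so the other replacements are not double-escaped
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds one w:p element (paragraph) with a single run of text per input string. */
    static String build(List<String> paragraphs) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>");
        xml.append("<w:document xmlns:w=\"").append(W_NS).append("\"><w:body>");
        for (String paragraph : paragraphs) {
            xml.append("<w:p><w:r><w:t xml:space=\"preserve\">")
               .append(escapeXml(paragraph))
               .append("</w:t></w:r></w:p>");
        }
        xml.append("</w:body></w:document>");
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("Sales report", "Q1 & Q2 totals")));
    }
}
```

The resulting string would be written over word/document.xml in the virtual file system before zipping.  Real reports need styles, tables and images on top of this, and the sketch doesn’t filter out control characters that XML forbids.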

A final word of caution for you:  I set this stuff aside for a few weeks after I got my solution working and started focusing on other parts of our application, and in the meantime I upgraded my Eclipse installation’s App Engine SDK from 1.2.8 to 1.3.0, so when I came back to it to get an example ready for this blog entry, I found that my generated .docx file was no longer valid!  In other words, MS Word would no longer open the file for some reason.  That was pretty scary, but I speculated that perhaps the datastore had become corrupted even though all the individual files in there seemed fine.  I reverted to 1.2.8 and uploaded them again and everything worked, and I’ve found other folks online saying that the local datastore can easily become corrupted.  When I switched my SDK back to 1.3.0, it stopped working again.  I had to upload the files with the GaeVFS servlet running on 1.3.0 to get my code to generate a valid .docx file on 1.3.0, and that makes sense since the underlying datastore implementation could’ve changed enough to cause a problem.

By the way, my generated files are not recognizable by MS Office if I run this code on Mac OS X.  It’s fine if I run the code from my local app on Windows and also from my production app, but I suspect there may be an issue with how the virtual file system abstracts things on OS X or something like that which causes MS Office to reject the .docx and .xlsx files even though they’re recognized as zip files by Windows XP Compressed Folders.  Most important of course is the fact that the production version generates valid files!
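If you run into a generated file that Office rejects, one way to start narrowing it down is to compare it with a known-good copy, entry by entry.  Here’s a rough sketch of that idea; the ZipDiff class and its methods are hypothetical names for this example, and I haven’t confirmed that this would have pinpointed my OS X problem.  It only compares entry names and contents, so it won’t notice differences in entry order or in zip metadata.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical diagnostic (illustration only) that compares two zip archives. */
final class ZipDiff {

    /** Maps each entry name to the CRC-32 of that entry's uncompressed bytes. */
    static Map<String, Long> fingerprint(byte[] archiveBytes) {
        Map<String, Long> result = new TreeMap<>();
        byte[] buf = new byte[4096];
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archiveBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                CRC32 crc = new CRC32();
                int len;
                while ((len = zin.read(buf)) != -1) {
                    crc.update(buf, 0, len);
                }
                result.put(entry.getName(), crc.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Returns the names of entries that are missing from either archive or whose contents differ. */
    static Set<String> differingEntries(byte[] expected, byte[] actual) {
        Map<String, Long> expectedParts = fingerprint(expected);
        Map<String, Long> actualParts = fingerprint(actual);
        Set<String> diff = new TreeSet<>();
        for (String name : expectedParts.keySet()) {
            if (!expectedParts.get(name).equals(actualParts.get(name))) {
                diff.add(name); // missing from actual, or present with different bytes
            }
        }
        for (String name : actualParts.keySet()) {
            if (!expectedParts.containsKey(name)) {
                diff.add(name); // extra entry that the known-good file does not have
            }
        }
        return diff;
    }
}
```

An empty result means both archives hold the same parts with the same bytes, which would point the finger at the zip container itself rather than at the XML inside it.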

Anyway, I’m glad it works and so far I’m really enjoying playing with Google App Engine.  It sounds like there are plenty of improvements planned in the near future for their Java support, so I’m looking forward to more goodies from the GAE team.

Let me know if you have any questions or if I need to fix something in this post.  Here’s to a great start on 2010!