In early June we told you about the public availability of libmspub, a library for reading Microsoft Publisher files and converting them to OpenDocument and SVG. It's time for an update.
Work on this library, including reverse engineering of PUB files, is a Google Summer of Code 2012 project. Brennan Vincent, the GSoC student, is the primary developer, and he is mentored by Fridrich Štrba (LibreOffice) and Valek Filippov (re-lab).
There will be certain issues with the built-in SVG converter that are easy to predict. First of all, SVG doesn't yet have pagination, and according to Tavmjong Bah, Inkscape's representative in the W3C SVG working group, it's a low-priority feature at this point.
SVG also doesn't have a notion of linked text frames, although this could be solved thanks to Adobe's recent work on CSS. And then there is the whole sad story of flowed text in SVG. The example below is a good illustration of that because, by contrast, LibreOffice renders the text in frames just fine.
It is important to note, however, that libmspub itself will just make sure that as many features of Publisher files as possible are understood, so that anyone can later plug in the code for converting those features to SVG. The library will also provide an API for requesting single pages. As for LibreOffice Draw, it simply imports all pages.
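The split described above, where the parser understands the file and a separate converter decides what to do with each feature, can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Page`, `Painter`, `emit`, `SvgPainter` and the `only` parameter are all hypothetical and are not libmspub's actual API (the real library is written in C++). The toy converter writes one SVG document per page, which is one way to live with SVG's lack of pagination.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Protocol


@dataclass
class Page:
    """Toy stand-in for one parsed Publisher page (hypothetical)."""
    shapes: List[str] = field(default_factory=list)


class Painter(Protocol):
    """Hypothetical callback interface that a converter plugs in."""
    def start_page(self, number: int) -> None: ...
    def draw_shape(self, description: str) -> None: ...
    def end_page(self) -> None: ...


def emit(pages: List[Page], painter: Painter, only: Optional[int] = None) -> None:
    """Replay parsed pages into a painter; `only` selects a single 1-based page."""
    for number, page in enumerate(pages, start=1):
        if only is not None and number != only:
            continue
        painter.start_page(number)
        for shape in page.shapes:
            painter.draw_shape(shape)
        painter.end_page()


class SvgPainter:
    """Toy converter: collects one SVG document per page."""
    def __init__(self) -> None:
        self.documents: List[str] = []
        self._parts: List[str] = []

    def start_page(self, number: int) -> None:
        self._parts = ['<svg xmlns="http://www.w3.org/2000/svg">']

    def draw_shape(self, description: str) -> None:
        # A real converter would emit SVG elements; a comment is a placeholder.
        self._parts.append(f"  <!-- {description} -->")

    def end_page(self) -> None:
        self._parts.append("</svg>")
        self.documents.append("\n".join(self._parts))


# Assumed sample data: a two-page document, of which only page 2 is requested.
pages = [Page(["text frame"]), Page(["picture", "text frame"])]
single = SvgPainter()
emit(pages, single, only=2)
```

In this sketch a consumer that wants everything, as LibreOffice Draw does, would simply call `emit` without `only`, while an SVG consumer could request pages one at a time.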
This project is being worked on by Brennan Vincent, a Google Summer of Code student who is co-mentored by Fridrich Štrba of the LibreOffice team and Valek Filippov of yours truly's re-lab team. Fridrich also keeps working on both Corel DRAW and Visio support in LibreOffice.
The libmspub library is the third collaborative project between LibreOffice and re-lab. Architecturally, it's a lot like both of the other libraries and has pretty much the same prerequisites: libwpd, libwpg, and writerperfect. All source code is in a public Git repository.
The story of the libmspub project dates back to late 2010, when the Scribus team expressed an interest in at least a basic reverse-engineered specification of Microsoft Publisher files. The re-lab project delivered that, but the Scribus team turned out to be too short-staffed to have a go at a converter.
Hence the work on reverse-engineering .pub was temporarily put on hold. However, the OLE Toy app, which was specifically created for examining .pub files, eventually started supporting all kinds of proprietary file formats, such as Visio, Corel DRAW, and Macromedia Freehand.
Today OLE Toy is the central part of the reverse-engineering workflow in both teams, and with this GSoC project it's destined to fulfill its original role. Better late than never.