http://www.joelonsoftware.com/items/2008/02/19.html
If you started reading these documents with the hope of spending a weekend writing some spiffy code that imports Word documents into your blog system, or creates Excel-formatted spreadsheets with your personal finance data, the complexity and length of the spec probably cured you of that desire pretty darn quickly. A normal programmer would conclude that Office’s binary file formats:
* are deliberately obfuscated
* are the product of a demented Borg mind
* were created by insanely bad programmers
* and are impossible to read or create correctly.
You’d be wrong on all four counts. With a little bit of digging, I’ll show you how those file formats got so unbelievably complicated, why it doesn’t reflect bad programming on Microsoft’s part, and what you can do to work around it.
Enjoy:
http://www.joelonsoftware.com/items/2008/02/19.html
The Excel file format specification is remarkably obscure about this. It just says that the 1904 record indicates “if the 1904 date system is used.” Ah. A classic piece of useless specification. If you were a developer working with the Excel file format, and you found this in the file format specification, you might be justified in concluding that Microsoft is hiding something. This piece of information does not give you enough information. You also need some outside knowledge, which I’ll fill you in on now. There are two kinds of Excel worksheets: those where the epoch for dates is 1/1/1900 (with a leap-year bug deliberately created for 1-2-3 compatibility that is too boring to describe here), and those where the epoch for dates is 1/1/1904.
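To make the two epochs concrete, here is a minimal Python sketch of how a whole-number date serial could be turned into a calendar date under either system. The function name `serial_to_date` and the `date1904` flag are hypothetical names chosen for this illustration and are not taken from the specification. The sketch assumes that serial 1 is 1/1/1900 in the 1900 system, that serial 0 is 1/1/1904 in the 1904 system, and that the leap-year bug shows up as serial 60 standing for a February 29, 1900 that never existed.

```python
from datetime import date, timedelta


def serial_to_date(serial: int, date1904: bool = False) -> date:
    """Convert a whole-number spreadsheet date serial to a calendar date.

    Illustrative sketch only; names and structure are assumptions.
    """
    if date1904:
        # 1904 system (assumed): serial 0 is 1904-01-01, no phantom leap day.
        return date(1904, 1, 1) + timedelta(days=serial)

    # 1900 system (assumed): serial 1 is 1900-01-01 and serial 60 is the
    # nonexistent 1900-02-29 kept for 1-2-3 compatibility.
    if serial == 60:
        raise ValueError("serial 60 is the nonexistent date 1900-02-29")
    if serial < 60:
        # Before the phantom day, count from 1899-12-31.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After the phantom day, every serial is one too high, so shift the base.
    return date(1899, 12, 30) + timedelta(days=serial)


if __name__ == "__main__":
    # The same calendar day carries different serials in the two systems.
    print(serial_to_date(36526))                 # 1900 system
    print(serial_to_date(35064, date1904=True))  # 1904 system
```

Under these assumptions the two serials for the same day differ by 1462, which is why a reader has to know which system a workbook uses before it can interpret any date cell.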
We will see how the BRM next week will fix the leap-year bug for Open XML. The binary specification of MS Office matters for Open XML because ECMA, the submitter of the standard, justified a second standard by its alleged "high-fidelity backwards compatibility with the binary formats". However, the current specification was made publicly available only a few days ago. The implication would be that even in 200 years someone could still implement the binary format to get access to doc files. So I wonder what we need OOXML for. Isn't the scenario in which all users convert their binary files to OOXML a bit unrealistic? And Microsoft still provides no mapping. Without one, no one can verify whether the ISO standard candidate OOXML is more "backwards compatible" than the existing ISO standard, ISO/IEC 26300:2006. Applications are available to convert the old binary files to both formats.
