IBR Contributor, September 13, 2004
The lawyers of SCO Group must have been embarrassed. When they filed their lawsuit against DaimlerChrysler over its Unix license agreement, they released an electronic copy of the documents, which had been created in Microsoft Word. Unfortunately, they forgot to strip out the metadata.
Metadata is literally “data about data.” Microsoft uses the term for a wide variety of information that is embedded in document files, particularly those created in Word, Excel and PowerPoint. Among other things, metadata can include the name of the organization creating the document, the names of everyone who worked on it, all revisions and corrections, hidden text, and comments.
It may seem that Microsoft is violating our trust by embedding metadata in our documents, but in fact, it is metadata that facilitates important functions like collaboration and version control. Working together in an office would be more difficult without it.
It’s also important to know that the use of metadata is not limited to Microsoft – it’s part of nearly any software that permits multiple authors, enables an escalating approval process, tracks changes in documents, allows you to enter hidden comments, etc. This includes WordPerfect and many other commonly used programs.
You just need to be aware that it’s there. SCO’s lawyers weren’t, or had forgotten. As a result, the metadata in the DaimlerChrysler lawsuit files revealed that SCO also intended to file a similar suit against Bank of America. Thus far, SCO has not carried through with its original intentions, probably because a judge has dismissed the suit against DaimlerChrysler.
In the last couple of years, embedded metadata in electronic documents has caused highly publicized diplomatic snafus and political embarrassments. It has also resulted in plenty of loss and distress that you never hear about. For example, we received a proposal from a potential vendor the other day as an Excel file. It is the same file they send to all of their prospective customers. I’m sure they don’t realize that the file contains embedded metadata that reveals their costs and markups.
Metadata is created in a variety of ways. As a result, there is no single method of seeing all of it. In Word, some of it is revealed simply by clicking View > Markup. Other bits of metadata may be accessed by exploring the options on the File menu. Still other metadata can only be uncovered by sophisticated forensic software.
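As a rough illustration of what sits inside a file, the sketch below lists a few metadata fields from a Word document. It assumes the newer .docx format, which is a zip archive of XML parts, rather than the binary .doc files produced by Office 2003. The function name summarize_docx_metadata is made up for this example, and the sketch covers only author fields, tracked changes and comments, not everything forensic software could recover.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the parts of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
}


def summarize_docx_metadata(path):
    """Hypothetical helper: report author fields and counts of
    tracked changes and comments found in a .docx file."""
    summary = {
        "creator": None,
        "last_modified_by": None,
        "insertions": 0,
        "deletions": 0,
        "comments": 0,
    }
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())

        # File properties: who created and who last saved the document.
        if "docProps/core.xml" in names:
            core = ET.fromstring(package.read("docProps/core.xml"))
            summary["creator"] = core.findtext("dc:creator", namespaces=NS)
            summary["last_modified_by"] = core.findtext(
                "cp:lastModifiedBy", namespaces=NS
            )

        # Tracked changes are stored in the main document part.
        if "word/document.xml" in names:
            body = ET.fromstring(package.read("word/document.xml"))
            summary["insertions"] = len(body.findall(".//w:ins", NS))
            summary["deletions"] = len(body.findall(".//w:del", NS))

        # Reviewer comments are stored in their own part.
        if "word/comments.xml" in names:
            comments = ET.fromstring(package.read("word/comments.xml"))
            summary["comments"] = len(comments.findall("w:comment", NS))
    return summary
```

Calling summarize_docx_metadata("proposal.docx") on a file of your own (the file name here is only a placeholder) returns a small dictionary. A non-empty author field or a non-zero count is a sign the file deserves a closer look before it leaves the office.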
Metadata can also be difficult to remove. Here are some suggestions that will help avoid the embarrassing or harmful results of distributing embedded metadata:
Have a policy. Employees should be made aware that metadata is part of many document files. There should be strict guidelines regarding the external distribution of electronic documents. After all, distributing only printed documents pretty much removes metadata as an issue.
Remove the most obvious metadata. In Office 2003, you can remove much of the metadata by following this procedure: On the Tools menu, click Options. On the Security tab, check the boxes to a) Remove personal information from file properties on save, and b) Warn before … sending a file that contains tracked changes or comments. Then click OK.
Convert to a PDF. Using Adobe Acrobat to convert documents to a PDF (Portable Document Format) file removes most of the metadata. However, it also makes the file difficult to edit. For this reason, a PDF may not work if the recipient will need to make changes to the document or interact with it, as they might with a spreadsheet, for example.
Use a commercial product. There are programs available that claim to scrub all metadata out of electronic documents. Unfortunately, the most effective of these products are costly and employees must be trained to use them.
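For readers curious about what scrubbing involves, here is a minimal sketch of removing the most obvious metadata: the author names in the file properties. Like any example of this kind it rests on assumptions. It handles only the newer .docx format (a zip archive of XML parts), not the binary files of Office 2003, and the function names blank_personal_fields and write_scrubbed_copy are invented for illustration. It does not touch tracked changes, comments or hidden text, so it is no substitute for a commercial scrubbing product.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"

# Namespaces that normally appear in the file-properties part.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}

# Keep the usual prefixes when the XML is written back out.
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)

# Fields that name the people who worked on the document.
PERSONAL_FIELDS = ("dc:creator", "cp:lastModifiedBy")


def blank_personal_fields(core_xml):
    """Return the file-properties XML with the author fields emptied."""
    root = ET.fromstring(core_xml)
    for field in PERSONAL_FIELDS:
        for element in root.findall(field, NAMESPACES):
            element.text = ""
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def write_scrubbed_copy(source_path, target_path):
    """Copy a .docx to target_path, blanking author fields on the way.

    The original file is left untouched; every other part of the
    package is copied as-is.
    """
    with zipfile.ZipFile(source_path) as src, zipfile.ZipFile(
        target_path, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == CORE_PART:
                data = blank_personal_fields(data)
            dst.writestr(item, data)
```

Writing to a separate copy rather than overwriting the original is deliberate: the internal version keeps its history for collaboration, and only the cleaned copy is sent outside the organization.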
For more information and relevant links about metadata, see the online version of this column at www.insllc.net.
Rick Edvalson is an MBA and systems engineer at IntegriNet Solutions Inc., a computer and networking services company serving Southwest Idaho. He may be reached at 376-0500.