Extracting RTF Documents from XMI Files with PowerShell

Enterprise Architect (EA) lets you create custom document templates for use in your reports. Internally, EA stores these templates in the good old RTF document format. When you export them via Export Reference Data, each template ends up in the XMI file as a base64 encoded ZIP archive, and the file inside that archive holds the RTF document base64 encoded once more. To the untrained eye it is therefore hard to spot the actual contents of these XMI documents at a glance. However, with a couple of lines of PowerShell we can easily extract those templates to native RTF format. Here is how:

Export RTF as XMI reference data

Extract the desired document templates from EA as shown in the following screenshot:

Export Reference Data

Contents of the generated XMI file

Each RTF document in the XMI file resides in a DataSet made up of DataRow elements, which in turn hold Column elements with the actual data, as depicted in the following figure:

XMI Base64 Encoded ZIP

To extract the data we only need to load the XMI file, pick the desired DataRow and decode its base64 content:

PARAM
(
  [Parameter(Mandatory = $true, Position = 0)]
  [ValidateNotNullOrEmpty()]
  [System.IO.FileInfo] $InputObject
  ,
  [Parameter(Mandatory = $false)]
  [System.Text.Encoding] $Encoding = [System.Text.Encoding]::Default
  ,
  [Parameter(Mandatory = $true)]
  [ValidateNotNullOrEmpty()]
  [string] $Path
  ,
  [Parameter(Mandatory = $false)]
  [int] $Index = 0
  ,
  [Parameter(Mandatory = $false)]
  [string] $ColumnName = 'BinContent'
)

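# Load the XMI file and pick the base64 payload of the selected DataRow.
[xml] $xml = Get-Content -Raw $InputObject
$base64 = ($xml.RefData.DataSet.DataRow[$Index].Column | Where-Object name -eq $ColumnName).'#text';
$bytes = [System.Convert]::FromBase64String($base64);
# The payload is a binary ZIP archive, so write the raw bytes instead of converting them to text first.
$fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path);
[System.IO.File]::WriteAllBytes($fullPath, $bytes);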
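
If you are not sure which value to pass as Index, it can help to list the rows first. The following is only a sketch, assuming the RefData / DataSet / DataRow / Column layout described above; the function name Get-XmiDataRowSummary and its XmiPath parameter are made up for illustration and are not part of EA:

```powershell
# Hypothetical helper: lists every DataRow with its index and the names of its columns.
function Get-XmiDataRowSummary {
    param (
        [Parameter(Mandatory = $true)]
        [string] $XmiPath
    )

    [xml] $xml = Get-Content -Raw -LiteralPath $XmiPath
    $index = 0
    # @() keeps the loop working when the file contains only a single DataRow.
    foreach ($row in @($xml.RefData.DataSet.DataRow)) {
        # GetAttribute reads the 'name' attribute explicitly instead of relying on property adaptation.
        $names = @($row.Column | ForEach-Object { $_.GetAttribute('name') })
        [pscustomobject] @{
            Index   = $index
            Columns = $names -join ', '
        }
        $index++
    }
}
```

Calling Get-XmiDataRowSummary -XmiPath .\templates.xml (the file name is just an example) prints one object per row, so you can see which rows carry a BinContent column before running the extraction script.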

SSDocument

The output is a ZIP file which holds a single file called str.dat. This file is really an XML fragment in SSDocument format (as shown in the following figure). An SSDocument contains an SSDocument.Document element, which in turn holds the base64 encoded RTF data:

XMI SSDocument Extract
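
Before str.dat can be decoded it has to be pulled out of the ZIP file. Any archive tool will do; if you prefer to stay in PowerShell, here is a minimal sketch using the .NET ZipFile classes. The function name Export-StrDat and its parameters are my own invention for illustration, and the sketch assumes the ZIP file from the previous step has been saved to disk:

```powershell
Add-Type -AssemblyName System.IO.Compression, System.IO.Compression.FileSystem

# Hypothetical helper: copies the 'str.dat' entry out of the decoded ZIP file.
function Export-StrDat {
    param (
        [Parameter(Mandatory = $true)]
        [string] $ZipPath
        ,
        [Parameter(Mandatory = $true)]
        [string] $Destination
    )

    # .NET resolves relative paths against the process directory, so pass full paths.
    $zipFull = (Resolve-Path -LiteralPath $ZipPath).ProviderPath
    $destFull = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Destination)

    $zip = [System.IO.Compression.ZipFile]::OpenRead($zipFull)
    try {
        $entry = $zip.GetEntry('str.dat')
        if ($null -eq $entry) { throw "No 'str.dat' entry found in $zipFull" }
        # The last argument allows overwriting an existing destination file.
        [System.IO.Compression.ZipFileExtensions]::ExtractToFile($entry, $destFull, $true)
    }
    finally {
        $zip.Dispose()
    }
}
```

The static ExtractToFile call is used instead of Expand-Archive because it does not care about the file extension of the archive and extracts exactly one entry.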

Decoding str.dat is done in the same way as before …

PARAM
(
  [Parameter(Mandatory = $true, Position = 0)]
  [ValidateNotNullOrEmpty()]
  [System.IO.FileInfo] $InputObject
  ,
  [Parameter(Mandatory = $false)]
  [System.Text.Encoding] $Encoding = [System.Text.Encoding]::Default
  ,
  [Parameter(Mandatory = $true)]
  [ValidateNotNullOrEmpty()]
  [string] $Path
)

[xml] $xml = Get-Content -Raw $InputObject -Encoding Unicode;
$base64 = $xml.SSDocument.'SSDocument.Document'.'#text';
$bytes = [System.Convert]::FromBase64String($base64);
$data = $Encoding.GetString($bytes);
Set-Content -Path $Path -Value $data;

… and that’s all it takes. The result is a regular RTF document as you can see in the following figure:

Extracted RTF
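
If you want a quick sanity check without opening the file in an editor, you can test for the RTF signature: an RTF document starts with the characters {\rtf. This is only a sketch; the function name Test-RtfSignature is hypothetical, and it assumes the file was written without a byte order mark (a BOM in front of the signature would make the check fail):

```powershell
# Hypothetical helper: returns $true when the file starts with the RTF signature '{\rtf'.
function Test-RtfSignature {
    param (
        [Parameter(Mandatory = $true)]
        [string] $RtfPath
    )

    $full = (Resolve-Path -LiteralPath $RtfPath).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($full)
    if ($bytes.Length -lt 5) { return $false }

    # Compare the first five bytes, read as ASCII, case-sensitively with the signature.
    $head = [System.Text.Encoding]::ASCII.GetString($bytes, 0, 5)
    return ($head -ceq '{\rtf')
}
```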
