How to read file property meta-data
After searching for this and talking to some knowledgeable people, I thought I would post how I managed to read the meta-data that users can assign to a file in Windows. The solution I describe requires the following:
At the end of this post are links to some rough example code in C#. I will try to explain the main points within this post. I am posting this mainly because most examples and references I found for DsoFile.dll talk about reading Microsoft Office documents, but DsoFile.dll also works great for reading the basic vanilla property meta-data for all files on Windows.
Background Information
If you right-click a file in Windows and select Properties, the Summary tab displays the following for all files:
Please note that this is the most basic information that can be applied to all files. If you did this for a Microsoft Office document, you would see a lot more meta-data information you can read and/or set. My intention was to describe a solution that can be applied to all files, thus I chose the lowest common denominator properties.
Here is an example of the properties as I applied them:
In my solution, I wanted to extract the following information for each file:
- Title
- Author Name
- Author Email Address (optional)
- Keywords
- Description/Comments
- Revision Number
- MD5 Hash Digest for the file
- URI to the file (in my case I am storing the files on a web server for public consumption)
- One or more subjects (applied using a delimited list)
- One or more categories (applied using a delimited list)
I then want to store this information in a SQL database for later retrieval and display.
Code Processing Overview
In my example I have three routines to perform the following:
- ScanResources() - This routine starts at a configured physical directory and parses all files/sub-directories [Main start-up routine]
- ParseResources(DirectoryInfo) - This routine processes each file in the supplied directory and then recursively calls itself to process any sub-directories that might exist. This is where I use DsoFile.dll to extract the meta-data information.
- PersistResource(string authorName, string authorEmail, string title, string keyWords, string comments, string revisionNumber, string hashDigest, string url, string[] categories, string[] subjects) - This is just a simple routine that persists the extracted meta-data to a SQL database in a transactional manner.
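The scan/parse split described above can be sketched roughly as follows. This is only a minimal illustration of the recursive traversal, not the code from the download: the class name ResourceScanner and the List<string> that collects file paths are assumptions made for the example, and the point where DsoFile.dll would be called is marked with a comment instead of real COM calls.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ResourceScanner
{
    // Hypothetical stand-in for ScanResources(): starts at a root directory
    // and returns the full path of every file found beneath it.
    public static List<string> ScanResources(string rootPath)
    {
        List<string> found = new List<string>();
        DirectoryInfo root = new DirectoryInfo(rootPath);
        if (root.Exists)
        {
            ParseResources(root, found);
        }
        return found;
    }

    // Hypothetical stand-in for ParseResources(DirectoryInfo): handles the
    // files in one directory, then calls itself for each sub-directory.
    private static void ParseResources(DirectoryInfo directory, List<string> found)
    {
        foreach (FileInfo file in directory.GetFiles())
        {
            // The real routine would open the file with DsoFile.dll here,
            // read its summary properties and hand them to PersistResource().
            found.Add(file.FullName);
        }

        foreach (DirectoryInfo child in directory.GetDirectories())
        {
            ParseResources(child, found);
        }
    }
}
```

Calling ResourceScanner.ScanResources with the configured physical root would then visit every file exactly once, which is the same shape as the three routines listed above.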
The summary meta-data information is found using OleDocumentPropertiesClass.SummaryProperties.
Example code usage/snippet:
//------------------------------------------------------------
// Local members
//------------------------------------------------------------
OleDocumentPropertiesClass oleDocument = new OleDocumentPropertiesClass();
FileInfo fileInfo = new FileInfo(@"C:\Test.txt");
string title = String.Empty;
string author = String.Empty;
string authorName = String.Empty;
string authorEmail = String.Empty;
string keyWords = String.Empty;
string comments = String.Empty;
string revisionNumber = String.Empty;
StringBuilder hashDigest = new StringBuilder();
string url = String.Empty;
string[] categories = new string[0];
string[] subjects = new string[0];
//------------------------------------------------------------
// Open file to parse meta-data for
//------------------------------------------------------------
oleDocument.Open(fileInfo.FullName, true, dsoFileOpenOptions.dsoOptionDefault);
//------------------------------------------------------------
// Extract file meta-data
//------------------------------------------------------------
author = oleDocument.SummaryProperties.Author;
int startPos = author.IndexOf('(');
int endPos = author.IndexOf(')');
if (startPos >= 0 && endPos > startPos)
{
    // Author is stored as "Name (email)"; split it into its two parts
    authorName = author.Substring(0, startPos).Trim();
    authorEmail = author.Substring(startPos + 1, endPos - startPos - 1).Trim();
}
else
{
    authorName = author;
}
comments = oleDocument.SummaryProperties.Comments;
keyWords = oleDocument.SummaryProperties.Keywords;
title = oleDocument.SummaryProperties.Title;
revisionNumber = oleDocument.SummaryProperties.RevisionNumber;
url = fileInfo.FullName.Replace("\\", "/").Replace(WebConfigurationManager.AppSettings["ResourcesPhysicalRoot"].Trim(), WebConfigurationManager.AppSettings["ResourcesWebRoot"].Trim());
//------------------------------------------------------------
// Extract categories
//------------------------------------------------------------
// Split returns a single-element array when no ';' is present
categories = oleDocument.SummaryProperties.Category.Split(';');
//------------------------------------------------------------
// Extract subjects
//------------------------------------------------------------
// Split returns a single-element array when no ';' is present
subjects = oleDocument.SummaryProperties.Subject.Split(';');
//------------------------------------------------------------
// Close OLE document
//------------------------------------------------------------
oleDocument.Close(false);
//------------------------------------------------------------
// Calculate hash digest
//------------------------------------------------------------
using (FileStream stream = new FileStream(fileInfo.FullName, FileMode.Open, FileAccess.Read))
using (MD5 md5 = MD5.Create())
{
    byte[] bytes = md5.ComputeHash(stream);

    // Build the lowercase hexadecimal representation of the digest
    hashDigest.Length = 0;
    for (int i = 0; i < bytes.Length; i++)
    {
        hashDigest.Append(bytes[i].ToString("x2"));
    }
}
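The author and delimited-list handling in the snippet above could also be pulled into small helpers that do not depend on DsoFile.dll, which makes them easy to try out on their own. The following is a rough sketch under a few assumptions: the class name MetadataText and both method names are made up for this example, the author value follows the "Name (email)" convention used in this post, and lists are separated by semicolons.

```csharp
using System;
using System.Collections.Generic;

public static class MetadataText
{
    // Splits an author value of the form "Name (email)" into its parts.
    // When no well-formed parenthesized part exists, the whole value is
    // treated as the name and the email is left empty.
    public static void ParseAuthor(string author, out string name, out string email)
    {
        author = author ?? String.Empty;
        int start = author.IndexOf('(');
        int end = start >= 0 ? author.IndexOf(')', start + 1) : -1;

        if (start >= 0 && end > start)
        {
            name = author.Substring(0, start).Trim();
            email = author.Substring(start + 1, end - start - 1).Trim();
        }
        else
        {
            name = author.Trim();
            email = String.Empty;
        }
    }

    // Splits a semicolon-delimited value, trimming each entry and dropping
    // empty ones (an assumption; the snippet above keeps entries as-is).
    public static string[] SplitList(string value)
    {
        if (String.IsNullOrEmpty(value))
        {
            return new string[0];
        }

        List<string> items = new List<string>();
        foreach (string part in value.Split(';'))
        {
            string trimmed = part.Trim();
            if (trimmed.Length > 0)
            {
                items.Add(trimmed);
            }
        }
        return items.ToArray();
    }
}
```

Unlike the inline version, SplitList returns an empty array for an empty property instead of an array holding one empty string, so the persistence routine would not have to filter out blank categories or subjects. Whether that is the right behavior depends on how the data is stored.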
Conclusion
To conclude, you can leverage the DsoFile.dll ActiveX component provided by Microsoft to read the basic file property summary meta-data, as well as a lot more. I suggest you experiment with it to see how it can fit your needs. The download from Microsoft contains an example application that will show you the types of information you can read/write.
Download the (very rough) code example here: DsoExample.zip (3.25 kb)