Iterating NTFS Streams

Window Form 2006. 1. 17. 18:24
.NET Matters
Iterating NTFS Streams

RSS

Download the code for this article: NETMatters0601.exe (131KB)


Q I read in your December 2005 column how to enumerate files using interop to access the file management functions in Kernel32.dll (see .NET Matters: BigInteger, GetFiles, and More). I don't need to enumerate files, but rather I need to enumerate the NTFS Alternate Data Streams within a particular file. Do you know any way I can accomplish this?


A First, nothing in the Microsoft .NET Framework provides this functionality. If you want it, plain and simple you'll need to do some sort of interop, either directly or using a third-party library.

If you're using Windows Server 2003 or later, Kernel32.dll exposes counterparts to FindFirstFile and FindNextFile that provide the exact functionality you're looking for. FindFirstStreamW and FindNextStreamW allow you to find and enumerate all of the Alternate Data Streams within a particular file, retrieving information about each, including its name and its length. The code for using these functions from managed code is very similar to that which I showed in my December column, and is shown in Figure1.

You simply call FindFirstStreamW, passing to it the full path to the target file. The second parameter to FindFirstStreamW dictates the level of detail you want in the returned data; currently, there is only one level (FindStreamInfoStandard), which has a numerical value of 0. The third parameter to the function is a pointer to a WIN32_FIND_STREAM_DATA structure (technically, what the third parameter points to is dictated by the value of the second parameter detailing the information level, but as there's currently only one level, for all intents and purposes this is a WIN32_FIND_STREAM_DATA). I've declared that structure's managed counterpart as a class, and in the interop signature I've marked it to be marshaled as a pointer to a struct. The last parameter is reserved for future use and should be 0.

If a valid handle is returned from FindFirstStreamW, the WIN32_FIND_STREAM_DATA instance contains information about the stream found, and its cStreamName value can be yielded back to the caller as the first stream name available. FindNextStreamW accepts the handle returned from FindFirstStreamW and fills the supplied WIN32_FIND_STREAM_DATA with information about the next stream available, if it exists. FindNextStreamW returns true if another stream is available, or false if not.

As a result, I continually call FindNextStreamW and yield the resulting stream name until FindNextStreamW returns false. When that happens, I double check the last error value to make sure that the iteration stopped because FindNextStreamW ran out of streams, and not for some unexpected reason.

Unfortunately, if you're using Windows XP or Windows 2000 Server, these functions aren't available to you, but there are a couple of alternatives. The first solution involves an undocumented function currently exported from Kernel32.dll, NTQueryInformationFile. However, undocumented functions are undocumented for a reason, and they can be changed or even removed at any time in the future. It's best not to use them. If you do want to use this function, search the Web and you'll find plenty of references and sample source code. But do so at your own risk.

Another solution, and one which I've demonstrated in Figure2, relies on two functions exported from Kernel32.dll, and these are documented. As their names imply, BackupRead and BackupSeek are part of the Win32 API for backup support:

BOOL BackupRead(HANDLE hFile, LPBYTE lpBuffer,
DWORD nNumberOfBytesToRead,
LPDWORD lpNumberOfBytesRead, BOOL bAbort,
BOOL bProcessSecurity, LPVOID* lpContext);

BOOL BackupSeek(HANDLE hFile, DWORD dwLowBytesToSeek,
DWORD dwHighBytesToSeek, LPDWORD lpdwLowByteSeeked,
LPDWORD lpdwHighByteSeeked, LPVOID* lpContext);

The idea behind BackupRead is that it can be used to read data from a file into a buffer, which can then be written to the backup storage medium. However, BackupRead is also very handy for finding out information about each of the Alternate Data Streams that make up the target file. It processes all of the data in the file as a series of discrete byte streams (each Alternate Data Stream is one of these byte streams), and each of the streams is preceded by a WIN32_STREAM_ID structure. Thus, in order to enumerate all of the streams, you simply need to read through all of these WIN32_STREAM_ID structures from the beginning of each stream (this is where BackupSeek becomes very handy, as it can be used to jump from stream to stream without having to read through all of the data in the file).

To begin, you first need to create a managed counterpart for the unmanaged WIN32_STREAM_ID structure:

typedef struct _WIN32_STREAM_ID
{
DWORD dwStreamId;
DWORD dwStreamAttributes;
LARGE_INTEGER Size;
DWORD dwStreamNameSize;
WCHAR cStreamName[ANYSIZE_ARRAY];
}
WIN32_STREAM_ID;

For the most part, this is like any other structure you'd marshal through P/Invoke. However, there are a few complications. First and foremost, WIN32_STREAM_ID is a variable-sized structure. Its last member, cStreamName, is an array with length ANYSIZE_ARRAY. While ANYSIZE_ARRAY is defined to be 1, cStreamName is just the address of the rest of the data in the structure after the previous four fields, which means that if the structure is allocated to be larger than sizeof (WIN32_STREAM_ID) bytes, that extra space will in effect be part of the cStreamName array. The previous field, dwStreamNameSize, specifies exactly how long the array is.

While this is great for Win32 development, it wreaks havoc on a marshaler that needs to copy this data from unmanaged memory to managed memory as part of the interop call to BackupRead. How does the marshaler know how big the WIN32_STREAM_ID structure actually is, given that it's variable sized? It doesn't.

The second problem has to do with packing and alignment. Ignoring cStreamName for a moment, consider the following possibility for your managed WIN32_STREAM_ID counterpart:

[StructLayout(LayoutKind.Sequential)]
public struct Win32StreamID
{
public int dwStreamId;
public int dwStreamAttributes;
public long Size;
public int dwStreamNameSize;
}

An Int32 is 4 bytes in size and an Int64 is 8 bytes. Thus, you would expect this struct to be 20 bytes. However, if you run the following code, you'll find that both values are 24, not 20:

int size1 = Marshal.SizeOf(typeof(Win32StreamID));
int size2 = sizeof(Win32StreamID); // in an unsafe context

The issue is that the compiler wants to make sure that the values within these structures are always aligned on the proper boundary. Four-byte values should be at addresses divisible by 4, 8-byte values should be at boundaries divisible by 8, and so on. Now imagine what would happen if you were to create an array of Win32StreamID structures. All of the fields in the first instance of the array would be properly aligned. For example, since the Size field follows two 32-bit integers, it would be 8 bytes from the start of the array, perfect for an 8-byte value. However, if the structure were 20-bytes in size, the second instance in the array would not have all of its members properly aligned. The integer values would all be fine, but the long value would be 28 bytes from the start of the array, a value not evenly divisible by 8. To fix this, the compiler pads the structure to a size of 24, such that all of the fields will always be properly aligned (assuming the array itself is).

If the compiler's doing the right thing, you might be wondering why I'm concerned about this. You'll see why if you look at the code in Figure2. In order to get around the first marshaling issue I described, I do in fact leave the cStreamName out of the Win32StreamID structure. I use BackupRead to read in enough bytes to fill my Win32StreamID structure, and then I examine the structure's dwStreamNameSize field. Now that I know how long the name is, I can use BackupRead again to read in the string's value from the file. That's all well and dandy, but if Marshal.SizeOf returns 24 for my Win32StreamID structure instead of 20, I'll be attempting to read too much data.

To avoid this, I need to make sure that the size of Win32StreamID is in fact 20 and not 24. This can be accomplished in two different ways using fields on the StructLayoutAttribute that adorns the structure. The first is to use the Size field, which dictates to the runtime exactly how big the structure should be:

[StructLayout(LayoutKind.Sequential, Size = 20)]

The second option is to use the Pack field. Pack indicates the packing size that should be used when the LayoutKind.Sequential value is specified and controls the alignment of the fields within the structure. The default packing size for a managed structure is 8. If I change that to 4, I get the 20-byte structure I'm looking for (and as I'm not actually using this in an array, I don't lose efficiency or stability that might result from such a packing change):

[StructLayout(LayoutKind.Sequential, Pack=4)]
public struct Win32StreamID
{
public StreamType dwStreamId;
public int dwStreamAttributes;
public long Size;
public int dwStreamNameSize;
// WCHAR cStreamName[1];
}

With this code in place, I can now enumerate all of the streams in a file, as shown here:

static void Main(string[] args) {
foreach (string path in args) {
Console.WriteLine(path + ":");
foreach (StreamInfo stream in
FileStreamSearcher.GetStreams(new FileInfo(path)))
{
Console.WriteLine("\t{0}\t{1}\t{2}",
stream.Name != null ? stream.Name : "(unnamed)",
stream.Type, stream.Size);
}
}
}

You'll notice that this version of FileStreamSearcher returns more information than the version that uses FindFirstStreamW and FindNextStreamW. BackupRead can provide data on more than just the primary stream and Alternate Data Streams, also operating on streams containing security information, reparse data, and more. If you only want to see the Alternate Data Streams, you can filter based on the StreamInfo's Type property, which will be StreamType.AlternateData for Alternate Data Streams.

To test this code, you can create a file that has Alternate Data Streams using the echo command at the command prompt:

> echo ".NET Matters" > C:\test.txt
> echo "MSDN Magazine" > C:\test.txt:magStream
> StreamEnumerator.exe C:\test.txt
test.txt:
(unnamed) SecurityData 164
(unnamed) Data 17
:magStream:$DATA AlternateData 18
> type C:\test.txt
".NET Matters"
> more < C:\test.txt:magStream
"MSDN Magazine"

So, now you're able to retrieve the names of all Alternate Data Streams stored in a file. Great. But what if you want to actually manipulate the data in one of those streams? Unfortunately, if you attempt to pass a path for an Alternate Data Stream to one of the FileStream constructors, a NotSupportedException will be thrown: "The given path's format is not supported."

To get around this, you can bypass FileStream's path canonicalization checks by directly accessing the CreateFile function exposed from kernel32.dll (see Figure3). I've used a P/Invoke for the CreateFile function to open and retrieve a SafeFileHandle for the specified path, without performing any of the managed permission checks on the path, so it can include Alternate Data Stream identifiers. This SafeFileHandle is then used to create a new managed FileStream, providing the required access. With that in place, it's easy to manipulate the contents of an Alternate Data Stream using the System.IO namespace's functionality. The following example reads and prints out the contents of the C:\test.txt:magStream created in the previous example:

string path = @"C:\test.txt:magStream";
using (StreamReader reader = new StreamReader(CreateFileStream(
path, FileAccess.Read, FileMode.Open, FileShare.Read)))
{
Console.WriteLine(reader.ReadToEnd());
}


Send your questions and comments to netqa@microsoft.com.
Posted by 퓨전마법사
,