MSDOS and Win31 versions of MSBackUp

Introduction

This page is part of a larger, more complete work on Microsoft's Win32 version of MSBackUp. I was introduced to the MSDOS version in October 2007 by an email from Ray Beck. It turns out Microsoft introduced versions of MSBackUp with their MSDOS 6.x series along with a Win31 version, MWBackUp. I believe they both supported the same file formats, but in true Microsoft tradition they were not only incompatible with earlier versions of MSDOS BackUp but are also incompatible with the later Win32 versions. In fact it seems that if you turned on data compression when you made a backup in MSDOS 6.0, you may have to jump through some hoops to restore it in MSDOS 6.22 because of the switch from 'double space' to 'drive space' compression!

I've only spent a couple of man weeks on this and doubt I will take this project any further. Hopefully anyone who had problems with data recovery from these files has either found a solution or given up! I started with a preliminary version of a program that seems to extract uncompressed files, including the embedded catalog file, from a floppy disk set. I took a quick look at dealing with the data decompression and thought it looked like a major task. I've since found GPL Linux source code for the DMSDOS driver which seems to do the job. I ported this to my Beta application code and it now seems to handle data decompression for both backups made to the hard disk and floppy based backup disk sets.

Contact me if you find a bug, or if what you learn here isn't enough. Maybe I can help, although I've put almost everything I know (or believe) on this page.

MSBackUp for MSDOS

I found a forum thread discussing the problems people have had restoring files of this type on later operating systems. The last message in this thread, dated Nov 25, 2001, suggests you can download stepup.exe from Microsoft, install it, and get either an MSDOS or Win31 version of MSBackup to restore files in this format. My understanding is that this is only true if you already have a DOS 6.x version installed which stepup.exe can upgrade. I do not believe you can install this from scratch, nor did I have any success expanding the individual MSBackup files from the archives using the expand program from DOS 5.0 (I was able to get *.exe and *.ovl files but they crashed with an exception error when run). I was fortunate enough to have a legal DOS 6.22 installed on one of the old machines in my basement so I have been able to do some test work.

For more information on the MSBackup application look at the DOS 6.22 Help files which someone was kind enough to make available on-line. In particular I recommend the notes file which discusses both the file naming conventions and the potential incompatibility in the compression algorithms between MSDOS 6.x versions.

My MSDOS 6.22 version of MSBackUp must be installed on a hard drive and by default wants to back up to a floppy. After some experimenting I found there is an option to back up to any disk drive. It appears to create a slightly different file, Ver 2A, when backing up to a hard drive. When writing to a floppy the program creates one or more Ver 1E diskettes containing the data. Each disk contains one file with a base name reflecting the date and file extensions numbered consecutively starting with *.001 to indicate their position in the series. Each of these backup files fills the diskette (i.e. it is the only file listed in the directory). When initially installed on the hard drive the install directory contains at least the following:

MSBACKUP EXE         5,506  05-31-94  6:22a MSBACKUP.EXE
MSBACKUP HLP       314,236  05-31-94  6:22a MSBACKUP.HLP
MSBACKUP OVL       133,936  05-31-94  6:22a MSBACKUP.OVL
MSBACKDB OVL        63,994  05-31-94  6:22a MSBACKDB.OVL
MSBACKDR OVL        68,074  05-31-94  6:22a MSBACKDR.OVL
MSBACKFB OVL        69,530  05-31-94  6:22a MSBACKFB.OVL
MSBACKFR OVL        73,706  05-31-94  6:22a MSBACKFR.OVL
MSBCONFG HLP        45,780  05-31-94  6:22a MSBCONFG.HLP
MSBCONFG OVL        47,210  05-31-94  6:22a MSBCONFG.OVL

The following were created on my machine after running MSBackup the 1st time and completing the configuration routine:

MSBACKUP INI            43  11-11-06  8:21p MSBACKUP.INI
DEFAULT  SET         4,194  11-11-06  8:21p DEFAULT.SET
DEFAULT  SLT            48  11-11-06  7:30p DEFAULT.SLT
MSBACKUP TMP         5,021  11-11-06  8:19p MSBACKUP.TMP
DEFAULT  CAT            66  11-11-06 10:09p DEFAULT.CAT
DEFAULT  SAV            64  11-11-06  4:47p DEFAULT.SAV
MSBACKUP RST           608  11-11-06  8:19p MSBACKUP.RST

In the process of configuration you define the available disk drives on the system and it tests them, saving the results in DEFAULT.SET. This file also appears to contain time stamps for the last backups. The configuration suggests a test backup which I performed, creating a two diskette backup of some of the install files. If one aborts the configuration/test backup when it pauses before the verification phase you can view its catalog (*.FUL) file, but if you complete the verification this catalog is deleted. After this, when a new backup is done a new catalog file will be left in the install directory and can be used for data recovery. Note the implication is that the system was designed to restore backups on the original machine, not somewhere else. The help files indicate MSBackUp is capable of recovering the catalog from the end of a backup file (in the case of a multiple floppy backup this is the last file in the backup set) and this seems to work with most backups, but oddly not the configuration backup set. Thus if you have a complete backup file (or set of files) you can always extract the original catalog, place it in an install directory on a new hard drive and restore files.

If you need to restore files from one of these archives I strongly recommend you get a version of MSDOS 6.22 and run MSBACKUP.exe. As of October 2007 you can download an MSDOS 6.x boot disk and the MSDOS MSBACKUP programs. Install the programs on your hard disk, then boot MSDOS 6.22 from the floppy. To deal with data compression I believe you will need to have the correct compression driver on the boot disk. I have yet to check this out, but it may be included on the boot disk! The test data files I created were done with the msbackup.exe files that came with my DOS 6.22 distribution, so I have not actually attempted the downloads and installation I suggest above. Good luck.

As proof of concept I've written a BETA version of a console program which can parse backups created with the MSBackup distributed with Microsoft's MSDOS 6.x. I doubt I will take this any further; it seems to list the contents of backup archive and catalog files. It has not been extensively tested, and has a fairly primitive command line interface.

It works on a file by file basis. MSBackUp allowed one to write the backup to floppies, and often more than one diskette was required for the backup. In these cases, an individual file's data might be split between two (or more) diskettes. For a floppy based backup you can tell whether the last extractable file on one disk is continued on the next by listing the next disk (or backup file). In the case where there is a continuation you must have both backup files available to extract your target file. Target files are extracted to the current directory; no attempt is made to recreate directories or to set the extracted file's time stamp or attributes.

Initially I wrote it to extract one file at a time with the -x command. I later added a -a option to extract all files in a single backup data file, or optionally all files from one directory in the data file. There is potential for overwriting things here if you have a backup with duplicate names in different directories. You can use the -o# option with -x to control which occurrence of a file name is extracted if you have duplicate names (directory paths are ignored with -x). There is also a -c option which allows this program to list the contents of a backup's catalog file, but you can't extract directly from a catalog file.

An MSDOS executable and the source code are available in a self expanding LHA archive, nortbk4.exe. The archive also contains this file, dosbkup.htm, which is the only documentation you get. Use it at your own risk! LHA is a freely available archive tool for MSDOS.

If you are really interested in the internals I've discovered for this version of MSBackUp, looking at the structures I define in the source code is useful. This is a stand alone program that compiles with Microsoft's C compilers QC2.5 and CL ver 5.0. I'm sure it would port to other operating systems with minimal work, but I'm not sure how useful it really is except as proof of concept.

Norton Ver 1E Backup Format Details

I'm only talking about backup files with the string "NORTON ver 1E" starting at the 2nd byte at the beginning of the file. I believe this is the format created when running an MSDOS 6.x version of MSBackUp and writing to the floppy disk drive. See also the associated Ver 2A discussion about files created when writing to a hard disk. The internal structures and data compression used in Ver 2A seem to be the same, but the data is organized differently and only a single data file is generated. The catalog files seem to be identical. I have also looked briefly at MSDOS 5.0 backup which creates files with yet another format.

In most of my tests I was using a 1.44Mb disk drive, but I did do some work with 720Kb and 360Kb drives. The blocking factors and offset to the start of the data region vary with disk size, but the general file layout was the same for all my tests. Doing these tests allowed me to identify some of the header fields associated with drive capacity. MSBackUp appears to customize the diskette boot sector slightly such that if one tries to boot the disk you get the following message:

   Non System Diskette
   Microsoft Backup Diskette
   Replace and press any key when ready

Following the boot sector there are two standard FAT tables and then a standard directory sector which contains two entries, the volume label and the backup data file, which occupies the majority of the disk and is saved in logical sector order. On my 1.44Mb test disks the disk directory starts at logical sector 19 (logical byte offset 0x2600) and the data file starts at logical sector 32 (logical byte offset 0x4000). I am a little surprised at the starting position of the data file as the boot sector implies the directory section is 224 sectors long, but both DOS and Win9x are perfectly happy copying the data file off the diskettes so I guess it's fine.

All the information of interest is in the data file so for the remainder of this discussion the only offsets mentioned are with respect to the start of the data file itself.

In overview I have identified four different regions in these data files. The backup files start with at least 3 identical 0x200 byte headers. On a 1.44Mb floppy the catalog (file directory) section starts at 0x600. This varies some with disk capacity, but this catalog region always begins immediately after the last valid file header. The catalog region is followed by the data region. On a floppy based backup this data region is broken up into fixed length blocks. Each block contains some backup data followed by some binary data that frankly I don't understand. This mystery data looks like it may map disk usage in a manner similar to a FAT table, but it is not yet clear. I am able to restore my test data without any reference to this mystery region so I've ignored it.

On a 1.44Mb diskette the main data area starts at offset 0x4E00, and each block is a total of 0x4800 bytes of which 0x800 are mystery data (i.e. skipped over during my restore). Smaller disks, 720kb and 360kb, seem to use blocks which are half this size. I've identified a flag in the header region which appears to reflect this difference. Curiously, when data is written to the disk MSBackUp starts with the last block (highest logical sector on the disk), and steps backwards through the available blocks as data is written. It makes no attempt to clear pre-existing data on a diskette, so the early blocks on a diskette that is not full (typically the last in a disk set) will contain whatever was there from prior operations on the disk.

I outline the structures and fields I've identified so far below. It's enough to extract data, but clearly not a full specification. I may have guessed wrong in some cases, particularly the exact starting offset of some of the string data. I never did learn how to control the description fields, so mine are always '.DEFAULT.'. For a little more detail than is presented below see the structures and comments in the source code.
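
The block arithmetic described above can be sketched in a few lines of Python. This is only an illustration using the 1.44Mb numbers quoted here, not code from nortbk. The function name and the total_blocks parameter are hypothetical: the text does not say how the number of blocks in a data file is determined, so the caller has to supply it. The sketch also assumes the backup data sits at the start of each block with the skipped bytes at the end.

```python
# Layout constants quoted above for a 1.44Mb Ver 1E diskette.
DATA_START = 0x4E00       # offset of the first block in the data file
BLOCK_SIZE = 0x4800       # total size of one block
MYSTERY_SIZE = 0x800      # trailing bytes of each block that are skipped
PAYLOAD_SIZE = BLOCK_SIZE - MYSTERY_SIZE   # 0x4000 bytes of backup data


def block_file_offset(block_no, offset_in_block, total_blocks):
    """Map a (block #, offset) pair to a byte offset in the data file.

    Block 0 is the LAST block in the file; higher block numbers lie
    closer to the start. total_blocks is an assumed input.
    """
    if not 0 <= block_no < total_blocks:
        raise ValueError("block number out of range")
    if not 0 <= offset_in_block < PAYLOAD_SIZE:
        raise ValueError("offset falls inside the skipped region")
    physical_index = total_blocks - 1 - block_no
    return DATA_START + physical_index * BLOCK_SIZE + offset_in_block
```

For example, with three blocks in the file, block 2 is the one that starts at 0x4E00 and block 0 is the one nearest the end.
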
The file header of length 0x200 bytes contains the following:

offset   use
   1     version string: "NORTON Ver 1E"
0x10     binary data, possibly target disk: heads, tracks, sectors
0x1C     pretty clearly a timestamp, but the format is unclear
0x22     two key words: # files in backup, # directories on source disk
0x26     dword: appears to be length of the data region
0x31     byte: blocking factor flag, compression flag
0x34     dword: file length
0x41     string: I've seen "DEFAULT" and "CONFID$$"
0x50     string: name of catalog file, 1st 8 bytes normally match files
0x60     encrypted password string (only if password protected)
0x70     binary data, sparsely populated region, mostly zeros
0xC2     string: "Version 1.0 for Microsoft"
0xE1     string: Description, I've seen two
            "(No Description)"
            "Compatibility Setup File"
0x100    binary data, almost all zeros
0x1fe    clearly a check sum value, but I don't know how it's created

So far there have always been at least 3 of these headers starting at or near the beginning of the file as outlined below. With more experimentation one could probably identify more of these fields. When the target media is a floppy disk the data from 0x12 to 0x1c appears to describe this media; my tracks field always matched the number of tracks on the floppy with Ver 1E backups. This is not true for Ver 2A backups, where tracks appears to be proportional to the size of the backup. Although not critical, as one doesn't need these fields to extract data, it's an example of an early assumption I made based on the floppy backups that doesn't extend to hard disk backups. If you play with this and learn more please let me know.
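
As a rough illustration of the header table, here is a minimal Python sketch that pulls out the better understood fields. It is not the nortbk source. The function name and dictionary keys are made up for the example, the multi-byte fields are assumed to be little-endian (as is usual on MSDOS), and the catalog name is assumed to be a NUL or space padded string in the 16 bytes starting at 0x50.

```python
import struct

HEADER_SIZE = 0x200


def parse_header(buf):
    """Return a dict of the better-understood fields of one header."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("a header is 0x200 bytes long")
    # The 13 character version string starts at the 2nd byte (offset 1).
    version = buf[1:14].decode("ascii", "replace")
    # 0x22: two WORDs, # files in backup and # directories on source disk.
    n_files, n_dirs = struct.unpack_from("<HH", buf, 0x22)
    # 0x26: DWORD that appears to be the length of the data region.
    (data_len,) = struct.unpack_from("<L", buf, 0x26)
    # 0x34: DWORD file length.
    (file_len,) = struct.unpack_from("<L", buf, 0x34)
    # 0x50: catalog file name (padding convention is an assumption).
    raw_name = buf[0x50:0x60].split(b"\0", 1)[0]
    catalog = raw_name.decode("ascii", "replace").strip()
    return {
        "version": version,
        "files": n_files,
        "directories": n_dirs,
        "data_length": data_len,
        "file_length": file_len,
        "catalog_name": catalog,
    }
```

Checking that "version" starts with "NORTON" is a cheap way to decide whether a candidate 0x200 byte chunk is really a header.
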

As mentioned above two files are created when a backup is done, the backup data file(s) and the backup catalog file which summarizes what was backed up. The catalog file is appended to the end of the data file and can be recovered if it is lost. I use the terms 'data file' and 'catalog file' to differentiate between these. Each contains a mapping of the disk structure, but in slightly different order. There are three key structures which describe the directories and files on the hard disk being backed up. Both the catalog file and data file use the same structure to describe the hard disk directories, but the structures used to describe an individual file have minor differences. All three of these structures are 0x20 bytes long. As you will see if you look at the source code I understand about 75% of the fields in these structures. Enough to do a listing or file extraction, but not the full story.

In the discussion below a WORD is two bytes, a DWORD is four bytes.

Data File structure describing an individual file

BYTE - always zero (this distinguishes it from a directory entry)
BYTE name[11] - file name (padded with spaces) & extension, no '.'
BYTE attribute byte (probably!) seems to map to MSDOS attribute
BYTE continuation flag: 0 => first occurrence, > 0 implies continuation
BYTE data file #, maps to data files extension 
BYTE compression flag, 0 => not compressed, 0xA may be (see below)
WORD start offset into data block in data file
WORD start block # in data file (0 is last block, > 0 closer to beginning)
WORD unknown  (often maybe always 0)
WORD time file created in MSDOS format
WORD date file created in MSDOS format
WORD unknown
DWORD length of file on disk and in backup file if uncompressed

Catalog File structure describing an individual file

Same as above for first 0x10 and last 0xA bytes
6 bytes starting at offset 0x10 are different as indicated below:
DWORD unknown  (could be any combination of 4 BYTES)
WORD  unknown
... remaining bytes same as above.

Per above the WORDs representing the file's date and time stamp are in exactly the same format as occurs in an MSDOS directory. I would have expected this for the directory time stamps also, but it doesn't seem to be that way!
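
To make the data file entry layout concrete, here is a small Python sketch (an illustration, not the nortbk code) that unpacks one 0x20 byte file entry and decodes the date and time WORDs using the standard MSDOS directory bit layout. The function names and dictionary keys are invented for the example, and little-endian byte order is assumed.

```python
import struct

# <x11sBBBB HHHHHH L : zero byte, name, 4 flag bytes, 6 WORDs, 1 DWORD
ENTRY_FORMAT = "<x11sBBBBHHHHHHL"   # struct.calcsize() gives 0x20


def decode_dos_datetime(date_word, time_word):
    """Decode MSDOS directory date/time WORDs to a plain tuple."""
    year = 1980 + (date_word >> 9)
    month = (date_word >> 5) & 0x0F
    day = date_word & 0x1F
    hour = time_word >> 11
    minute = (time_word >> 5) & 0x3F
    second = (time_word & 0x1F) * 2      # stored in 2 second units
    return (year, month, day, hour, minute, second)


def parse_data_file_entry(entry):
    """Unpack one file entry as laid out in the data file directory."""
    if len(entry) != 0x20 or entry[0] != 0:
        raise ValueError("not a 0x20 byte file entry")
    (name, attr, cont, disk_no, comp, offset, block,
     _unk1, time_w, date_w, _unk2, length) = struct.unpack(ENTRY_FORMAT, entry)
    base = name[:8].decode("ascii", "replace").rstrip()
    ext = name[8:].decode("ascii", "replace").rstrip()
    return {
        "name": base + ("." + ext if ext else ""),
        "attribute": attr,
        "continuation": cont,
        "data_file_no": disk_no,
        "compression": comp,            # 0 => raw, 0xA => see text
        "offset_in_block": offset,
        "block": block,
        "stamp": decode_dos_datetime(date_w, time_w),
        "length": length,
    }
```

A catalog file entry could reuse the same function for everything except the six bytes at offset 0x10, which have a different and unknown meaning there.
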

Structure describing a directory
This is used in both the Data and Catalog files

BYTE name[11] - directory name (padded with spaces) & extension, no '.'
BYTE level - directory nesting depth, 0 => root, 1 => sub dir in root etc
BYTE unknown[8]  no idea!
WORD  # of files backed up from this directory
WORD  # of files backed up (same as word above???)
BYTE unknown[4]   no idea!
BYTE time[4]   appears to be a time stamp, but if so in weird format

I use byte offset 0x16 into the structure that describes a directory entry as the WORD = number of files backed up from this directory. It's zero unless one or more files from this directory are included in the backup. It appears to have the same value as the preceding WORD, which seems odd! I'm not sure about anything except this WORD and the first 0xC bytes which are the directory name and its nesting level. However as indicated above the last four bytes starting at offset 0x1C into the structure appear to be a time stamp. It's NOT in the same format as the file date and time fields. It looks a bit like the DWORD number of seconds since June 1968, but not only would that be a pretty weird standard, it doesn't exactly match the time stamp of the directories displayed by MSDOS for the hard disk, although it's close.
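
A minimal Python sketch of reading a directory entry, using only the offsets the text is sure about (name at 0, level at 0x0B, the two counts at 0x14 and 0x16, and the raw four bytes at 0x1C). The function name and keys are hypothetical, byte order is assumed little-endian, and the time stamp is returned undecoded because its format is not known.

```python
import struct


def parse_dir_entry(entry):
    """Unpack the understood fields of a 0x20 byte directory entry."""
    if len(entry) != 0x20:
        raise ValueError("directory entries are 0x20 bytes long")
    if entry[0] in (0x00, 0xFF):
        raise ValueError("first byte marks a file entry or terminator")
    name = entry[0:11].decode("ascii", "replace").rstrip()
    level = entry[0x0B]
    # Two WORDs that so far always hold the same value.
    count_a, count_b = struct.unpack_from("<HH", entry, 0x14)
    # Left as a raw DWORD: the time stamp format is not understood.
    (raw_time,) = struct.unpack_from("<L", entry, 0x1C)
    return {
        "name": name,
        "level": level,
        "files_backed_up": count_b,     # the WORD at offset 0x16
        "count_at_0x14": count_a,
        "raw_time": raw_time,
    }
```
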

Ver 1E Data File Layout

In my 1.44Mb diskette examples the data file directory begins at offset 0x600 immediately after the file headers. As mentioned above this varies a little with different sized disks. On a 360kb diskette there were 4 headers starting at 0x400 with the data file directory starting at offset 0xc00. In all cases the file's directory region started immediately after the last file header.

The data directory begins with one or more directory entries, i.e. 0x20 byte structures with a non-zero value as the first byte, which is the beginning of the directory name. The first is always the root directory of the hard drive being backed up, e.g. "C:\". As one steps through this region one finds either directory or file entries in the order they would occur if one did a depth first search of the disk. All directories on the disk are included, but only the files which have been backed up are shown. The data file directory is terminated by an empty structure whose first byte is 0xff (an invalid file name character). The nesting depth field is useful for displaying the tree structure; my program indents the listing based on the directory nesting level. If a directory entry has a non-zero entry for the number of files backed up, these file entries will immediately follow the directory entry and in my listing they are preceded by the 'file:' string. Note that on a multi disk backup set each successive disk file just picks up where the prior one ended. It still starts with enough directory structures to determine the path to the files on the disk, but then continues with file structures. The data for the last file on one disk often spans over to the next disk, which is indicated by the continuation number in the file's catalog entry.
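
The walk through the data file directory can be sketched in Python as follows. This is an illustration rather than the nortbk logic: the function names are made up, and it assumes the root entry's name field literally holds something like "C:\" padded with spaces, and that names use the usual 8+3 split.

```python
def _name83(raw):
    """Turn an 11 byte space padded name field into NAME.EXT form."""
    base = raw[:8].decode("ascii", "replace").rstrip()
    ext = raw[8:11].decode("ascii", "replace").rstrip()
    return base + ("." + ext if ext else "")


def _join(parts):
    """Join path components, keeping a lone root such as 'C:\\' intact."""
    if len(parts) == 1:
        return parts[0]
    return "\\".join([parts[0].rstrip("\\")] + parts[1:])


def walk_directory(region):
    """Yield ('dir', path) and ('file', path) until the 0xff terminator."""
    stack = []                      # directory names from root downward
    for pos in range(0, len(region) - 0x1F, 0x20):
        entry = region[pos:pos + 0x20]
        if entry[0] == 0xFF:        # terminator entry
            return
        if entry[0] == 0x00:        # file entry, name starts at offset 1
            yield "file", _join(stack + [_name83(entry[1:12])])
        else:                       # directory entry, name at offset 0
            level = entry[0x0B]
            del stack[level:]       # drop siblings and deeper levels
            stack.append(_name83(entry[0:11]))
            yield "dir", _join(stack)
```

Because the entries arrive in depth first order, truncating the stack to the entry's nesting level before pushing the new name is enough to rebuild each full path.
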

During extraction one can generate the paths to the files from this information, and I think I have this implemented now. If a file is included in this directory list it is extractable. One uses the block number and offset into the block to get the starting location for extraction. If the compression flag is 0, it's just raw data and may be copied directly from the archive to the destination file. The only other flag I've seen is 0xA. When the compression flag is 0xA, the file data is preceded by 3 byte headers. These headers contain a BYTE flag followed by a WORD length, see struct nxt_loc. The length is the offset from the start of this structure to the end of the data in this segment (and often to the next 3 byte header). I've seen flag values of 0 and 0xff in what I've looked at so far. It seems a flag of 0 is compressed data and a flag of 0xff is raw data. I don't know why one would encapsulate raw data between these headers, but that is what it looks like. All the files archived by the configuration validation test are done this way, but it may be an exception. Any occurrence of nxt_loc.flag = 0 means the data is compressed as discussed below. The algorithm I use to extract this data is a little messy, as I find that one needs to have read in all the nxt_loc.len bytes before they can be decompressed. This means one may have to switch to another data block in the middle of the read operation.
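
The 3 byte segment headers can be walked with something like the Python sketch below. It is only an illustration of the layout described for struct nxt_loc, not the author's extraction code. It assumes the file's bytes have already been gathered into one contiguous buffer (block switching and the skipped regions handled elsewhere), that the WORD is little-endian, and that the buffer ends where the last segment ends. The function name is hypothetical.

```python
import struct

FLAG_COMPRESSED = 0x00
FLAG_RAW = 0xFF


def split_segments(buf):
    """Split a buffer into (flag, payload) pairs using 3 byte headers.

    Each header is a BYTE flag plus a WORD length, where the length is
    measured from the start of the header to the end of the segment.
    """
    segments = []
    pos = 0
    while pos + 3 <= len(buf):
        flag, seg_len = struct.unpack_from("<BH", buf, pos)
        if seg_len < 3 or pos + seg_len > len(buf):
            raise ValueError("segment length runs outside the buffer")
        segments.append((flag, bytes(buf[pos + 3:pos + seg_len])))
        pos += seg_len              # the next header starts here
    return segments
```

A caller would copy FLAG_RAW payloads straight to the output and hand FLAG_COMPRESSED payloads to the decompression routine.
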

Norton Ver 2A Backup Format Details

I'm talking about backup files with the string "NORTON ver 2A" starting at the 2nd byte at the beginning of the file. These seem to be generated by MSDOS 6.22 MSBackUp when the backup is written to a hard disk. The data structures such as the header and 0x20 byte directory structures are identical to those listed above for ver 1E. The file header itself is the same except for the difference in version number. The backup files start with 3 headers which are immediately followed at offset 0x600 by the data region. It appears that all ver 2A files have 0x4800 byte data blocks, but they are all data without any mystery binary data which would have to be skipped. The mystery binary data still seems to be in the file, but it occurs after the data region and is all contiguous. The blocks are also laid down in logical sector order from first to last so it is much easier to read them; one just does repeated read commands without any need to reposition the file pointer to a new block. The data directory region which immediately follows the headers in a ver 1E backup comes at the end of the file in a ver 2A backup. I find the directory region always begins on a data block boundary. I use the file length from the file header to position at the beginning of the last block in the file, then test to see if it contains the directory entry for a hard drive, e.g. "C:\", "D:\", etc. If not I back up a block and check again. This seems to work, and after locating this directory region the backup can be listed and files can be extracted. The organization of this directory region is the same as in the ver 1E backups, it's just located in a different place.
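
The backwards search for the ver 2A directory region might look like this in Python. It is a sketch under assumptions, not the nortbk routine: block boundaries are taken to be 0x600 plus a multiple of 0x4800, and a root entry is recognized simply by an upper case drive letter followed by ":\" at the start of the block. Both function names are made up.

```python
FIRST_BLOCK = 0x600      # data region starts right after the 3 headers
BLOCK_SIZE = 0x4800      # ver 2A block size quoted above


def looks_like_drive_root(entry):
    """True if the bytes start like a root entry such as 'C:\\'."""
    return (len(entry) >= 3
            and ord("A") <= entry[0] <= ord("Z")
            and entry[1:3] == b":\\")


def find_directory_region(data, file_len):
    """Return the offset of the directory region, or None if not found.

    file_len is the file length taken from the file header.
    """
    if file_len <= FIRST_BLOCK:
        return None
    block = (file_len - FIRST_BLOCK - 1) // BLOCK_SIZE   # last block
    while block >= 0:
        pos = FIRST_BLOCK + block * BLOCK_SIZE
        if looks_like_drive_root(data[pos:pos + 0x20]):
            return pos
        block -= 1               # back up one block and check again
    return None
```
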

Catalog File Layout

The catalog file starts with all the entries representing the files that have been backed up, i.e. entries with the first byte = 0. These are followed by all the directory entries in a depth first search order for the hard disk. One can work out which files are associated with which directory using the value of the number of files backed up in each directory. The first directory found with a non-zero file count is associated with the first count files at the start of the catalog file. The next non-zero entry associates the next count files with that directory, etc. There is an entry in the structure for a file that indicates the disk number in a multi disk set where the file begins.
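
The file to directory association rule is easy to show in Python. This sketch works on already parsed values; the function name and the (path, count) tuple shape are invented for the example.

```python
def assign_files(file_names, directories):
    """Group catalog file entries under their directories.

    file_names  : file names in the order they appear in the catalog
    directories : (path, files_backed_up) tuples in catalog order
    """
    grouped = {}
    next_file = 0
    for path, count in directories:
        if count:                    # directories with 0 files are skipped
            grouped[path] = file_names[next_file:next_file + count]
            next_file += count
    if next_file != len(file_names):
        raise ValueError("directory counts do not match the file entries")
    return grouped
```

For instance, with files ["A.TXT", "B.TXT", "C.EXE"] and directories [("C:\\", 0), ("C:\\DOS", 2), ("C:\\TEMP", 1)], the first two files land under C:\DOS and the third under C:\TEMP.
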

The catalog file ends with a fixed length 0x100 byte header. I validate the catalog files by seeking to the end of the file, backing up 0x100 bytes and reading in the header. I save the position of this header to terminate parsing of the catalog file. Unlike the directory region in a data file, there appears to be no termination flag in a catalog file. At offset 0x60 in the header you should see the descriptive string: "Version 6.00.00 02/26/93 06:00 am". This is followed by the name of the catalog file and its timestamp as a string. I have not investigated the other fields as I can parse this file well enough with the information I have.
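
The validation step can be sketched in Python as below. It is an illustration only: the function name is hypothetical, and checking just for the "Version " prefix at offset 0x60 (rather than the full string) is an assumption made so the check would not depend on one particular build date.

```python
TRAILER_SIZE = 0x100     # fixed length header at the end of the catalog
ENTRY_SIZE = 0x20


def find_catalog_trailer(catalog):
    """Return the offset of the trailing header, or None if not valid.

    The returned offset is also where parsing of the 0x20 byte entries
    has to stop, since the catalog has no terminator entry.
    """
    if len(catalog) < TRAILER_SIZE:
        return None
    start = len(catalog) - TRAILER_SIZE
    trailer = catalog[start:]
    if not trailer[0x60:].startswith(b"Version "):
        return None
    return start
```

The number of file and directory entries to parse is then the returned offset divided by ENTRY_SIZE.
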

Both ver 1E and 2A backups include a copy of the catalog file. It's archived under a directory entry at the end of the listing for the directory in which MSBackUp is installed. It can be extracted using my nortbk.exe or MSBackUp. It's not really necessary if you are doing your extraction with nortbk.exe, but of interest.

DMSDOS Decompression: Drive Space & Double Space

I really didn't have the energy or smarts to try to second guess how the data compression was being done. I was able to create a WIN32 decompression routine for MSQIC because the algorithm was well documented. In initial searches I found next to nothing about the two methods used in MSDOS 6.x MSBackup, 'double space' and 'drive space' compression. Apparently the actual data compression for MSBackUp is done by one of these two drivers although I don't know the interface details. If you are running MSDOS 6.22 and want to restore files from MSDOS 6.0 or 6.2 you should read the article Cannot-Restore-Backup.

Ultimately I found that a Linux 2.0 kernel module, dmsdos, was written to access compressed MSDOS drives. The source code is available, and the documentation has a nice summary of the compression algorithms the author has seen. This driver source is GPL code so presumably I can lift some of it without objection. It turned out that it was pretty easy to modify the dblspace_dec.c module so I could call it to decompress Ver 1E and 2A MSBackUp files. My samples all use the 'drive space' algorithm and the compressed segments begin with the "JM" version 0.0 sequence, but I suspect the 'double space' routine (which is also included) will work as well. If a file contains compressed data I extract each complete data segment to a buffer, then call the decompression routine and write out the result. I really only have a general idea of what it's doing, but I've extracted both text files and executables successfully.

My thanks go out to the various authors of this great work.

I only have a few comments of my own about decompression that aren't directly covered by the authors' documentation. The Linux kernel module above was designed to decompress disk sectors, i.e. it expects files to be in 0x200 byte blocks. I added a little code to allow variable file lengths. The source code assumes the caller knows how many bytes are to be expanded from the compressed code. I do not see any fields in the file headers that provide the number of bytes to be decompressed, only the number of compressed bytes in the current block. By trial and error I determined that the program seems to use an 0x5FFD byte buffer size for decompression. This means that one can use the output file length (which is readily available) and this buffer size to pass an output length to the decompression routine. I track the # of bytes still to be decompressed as I do the decompression. If the remaining length is > 0x5FFD one decompresses a full buffer's worth of data, otherwise one decompresses the remaining length bytes. This works well in the tests I have done, but depends on a buffer size I determined by experimentation.
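
The bookkeeping for the output length can be written as a small Python generator. This only illustrates the rule stated above (a full 0x5FFD byte buffer while more than that remains, otherwise the remainder); the decompression call itself is not shown, and the function name is made up.

```python
DECOMP_BUFFER = 0x5FFD   # buffer size found by trial and error


def output_lengths(file_length):
    """Yield the expected output length for each compressed segment."""
    remaining = file_length
    while remaining > 0:
        chunk = min(remaining, DECOMP_BUFFER)
        yield chunk
        remaining -= chunk
```

Each yielded value would be passed as the output length when decompressing the next compressed segment of the file.
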

Another point of interest is that not all files in a backup with data compression enabled are actually compressed. This information is in the data file's directory region and is displayed by nortbk.exe. It appears that short files are not compressed. MSBackUp also seems to do some simple testing, as it doesn't seem to compress files which have already been compressed. My test files contained several *.lzh compressed archives in the data area. MSBackUp does not attempt to re-compress these files. I didn't have any *.zip or *.gz compressed files but expect the same result. Text and executable files are normally compressed unless they are very short.

DOS 5.x Backup Program

I've done very little with this, I just wanted to compare it to the MSDOS 6.x format. I generated a small text backup on an MSDOS 5.0 machine by running BACKUP. This created two files, BACKUP.001 and CONTROL.001. BACKUP.001 appears to be just the concatenation of the files, while CONTROL.001 appears to be a catalog of the files with their length and the starting offset in BACKUP.001. The FreeDOS Project has a nice page that talks about the pre 6.x versions of BACKUP. It claims the MSDOS 6.22 version of RESTORE is compatible with backups from DOS versions 2.x-5.x.

Page History