For more information on the MSBackup application, look at the DOS 6.22 Help files, which someone was kind enough to make available on-line. In particular I recommend the notes file, which discusses both the file naming conventions and the potential incompatibility in the compression algorithms between MSDOS 6.x versions.
MSBACKUP EXE     5,506 05-31-94  6:22a MSBACKUP.EXE
MSBACKUP HLP   314,236 05-31-94  6:22a MSBACKUP.HLP
MSBACKUP OVL   133,936 05-31-94  6:22a MSBACKUP.OVL
MSBACKDB OVL    63,994 05-31-94  6:22a MSBACKDB.OVL
MSBACKDR OVL    68,074 05-31-94  6:22a MSBACKDR.OVL
MSBACKFB OVL    69,530 05-31-94  6:22a MSBACKFB.OVL
MSBACKFR OVL    73,706 05-31-94  6:22a MSBACKFR.OVL
MSBCONFG HLP    45,780 05-31-94  6:22a MSBCONFG.HLP
MSBCONFG OVL    47,210 05-31-94  6:22a MSBCONFG.OVL

The following were created on my machine after running MSBackup the 1st time and completing the configuration routine:

MSBACKUP INI        43 11-11-06  8:21p MSBACKUP.INI
DEFAULT  SET     4,194 11-11-06  8:21p DEFAULT.SET
DEFAULT  SLT        48 11-11-06  7:30p DEFAULT.SLT
MSBACKUP TMP     5,021 11-11-06  8:19p MSBACKUP.TMP
DEFAULT  CAT        66 11-11-06 10:09p DEFAULT.CAT
DEFAULT  SAV        64 11-11-06  4:47p DEFAULT.SAV
MSBACKUP RST       608 11-11-06  8:19p MSBACKUP.RST

In the process of configuration you define the available disk drives on the system; MSBackup tests them, saving the results in DEFAULT.SET. This file also appears to contain time stamps for the last backups. The configuration suggests a test backup, which I performed, creating a two diskette backup of some of the install files. If you abort the configuration/test backup when it pauses before the verification phase you can view its catalog (*.FUL) file, but if you complete the verification this catalog is deleted. After this, when a new backup is done a new catalog file will be left in the install directory and can be used for data recovery. Note the implication: the system was designed to restore backups on the original machine, not somewhere else. The help files indicate MSBackUp is capable of recovering the catalog from the end of a backup file (in the case of a multiple floppy backup this is the last file in the backup set). This seems to work with most backups, but oddly not the configuration backup set.
Thus if you have a complete backup file (or set of files) you can always extract the original catalog, place it in an install directory on a new hard drive and restore files.
If you need to restore files from one of these archives I strongly recommend you get a version of MSDOS 6.22 and run MSBACKUP.EXE. As of October 2007 you can download an MSDOS 6.x boot disk and the MSDOS MSBACKUP programs. Install the programs on your hard disk, then boot MSDOS 6.22 from the floppy. To deal with data compression I believe you will need to have the correct compression driver on the boot disk. I have yet to check this out, but it may be included on the boot disk! The test data files I created were made with the msbackup.exe files that came with my DOS 6.22 distribution, so I have not actually attempted the downloads and installation I suggest above. Good luck.
As proof of concept I've written a BETA version of a console program which can parse backups created with the MSBackup distributed with Microsoft's MSDOS 6.x. I doubt I will take this any further; as it stands it lists the contents of backup archive and catalog files. It has not been extensively tested, and has a fairly primitive command line interface. It works on a file by file basis. MSBackUp allowed one to write the backup to floppies, and often more than one diskette was required for the backup. In these cases an individual file's data might be split between two (or more) diskettes. For a floppy based backup you can tell whether the last extractable file on one disk is continued on the next by listing the individual disks (or backup files). In the case where there is a continuation you must have both backup files available to extract your target file. Target files are extracted to the current directory; no attempt is made to recreate directories or to set the extracted file's time stamp or attributes.

Initially I wrote it to extract one file at a time with the -x command. I later added a -a option to extract all files in a single backup data file, or optionally all files from one directory in the data file. There is potential for overwriting things here if you have a backup with duplicate names in different directories. You can use the -o# option with -x to control which occurrence of a file name is extracted if you have duplicate names (directory paths are ignored with -x). There is also a -c option which allows this program to list the contents of a backup's catalog file, but you can't extract directly from a catalog file. An MSDOS executable and the source code are available in a self expanding LHA archive, nortbk4.exe. The archive also contains this file, dosbkup.htm, which is the only documentation you get. Use it at your own risk! LHA is a freely available archive tool for MSDOS.
If you are really interested in the internals I've discovered for this version of MSBackUp, looking at the structures I define in the source code is useful. This is a stand-alone program that compiles with Microsoft's C compilers QC2.5 and CL ver 5.0. I'm sure it would port to other operating systems with minimal work, but I'm not sure how useful it really is except as proof of concept.
In most of my tests I was using a 1.44Mb disk drive, but I did do some
work with 720Kb and 360Kb drives. The blocking factors and offset
to the start of the data region vary with disk size, but the general
file layout was the same for all my tests. Doing these tests allowed
me to identify some of the header fields associated with drive capacity.
MSBackUp appears to customize the diskette boot sector slightly such
that if one tries to boot the disk you get the following message:
Non System Diskette
Microsoft Backup Diskette
Replace and press any key when ready

Following the boot sector there are two standard FAT tables and then a standard directory sector which contains two entries: the volume label and the backup data file, which occupies the majority of the disk and is saved in logical sector order. On my 1.44Mb test disks the disk directory starts at logical sector 19 (logical byte offset 0x2600) and the data file starts at logical sector 32 (logical byte offset 0x4000). I am a little surprised at the starting position of the data file, as the boot sector implies the directory section holds 224 entries (14 sectors), but both DOS and Win9x are perfectly happy copying the data file off the diskettes so I guess it's fine.
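The logical byte offsets quoted for these sectors are just the sector number times the standard 512-byte sector size. A one-line helper (my own illustration, not from the nortbk source) makes the arithmetic explicit:

```c
#include <assert.h>

/* Convert a logical sector number to a byte offset on a standard
   512-byte/sector floppy: sector 19 -> 0x2600, sector 32 -> 0x4000. */
long sector_offset(int sector)
{
    return (long)sector * 512;
}
```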
All the information of interest is in the data file so for the remainder of this discussion the only offsets mentioned are with respect to the start of the data file itself.
In overview I have identified four different regions in these data files. The backup files start with at least 3 identical 0x200 byte headers. On a 1.44Mb floppy the catalog (file directory) section starts at 0x600. This varies some with disk capacity, but the catalog region always begins immediately after the last valid file header. The catalog region is followed by the data region. On a floppy based backup this data region is broken up into fixed length blocks. Each block contains some backup data followed by some binary data that frankly I don't understand. This mystery data looks like it may map disk usage in a manner similar to a FAT table, but it is not yet clear. I am able to restore my test data without any reference to this mystery region, so I've ignored it. On a 1.44Mb diskette the main data area starts at offset 0x4E00; each block is a total of 0x4800 bytes, of which 0x800 are mystery data (ie skipped over during my restore). Smaller disks, 720Kb and 360Kb, seem to use blocks which are half this size. I've identified a flag in the header region which appears to reflect this difference.

Curiously, when data is written to the disk MSBackUp starts with the last block (highest logical sector on the disk) and steps backwards through the available blocks as data is written. It makes no attempt to clear pre-existing data on a diskette, so the early blocks on a diskette that is not full (typically the last in a disk set) will contain whatever was there from prior operations on the disk.

I outline the structures and fields I've identified so far below. It's enough to extract data, but clearly not a full specification. I may have guessed wrong in some cases, particularly the exact starting offset of some of the string data. I never did learn how to control the description fields, so mine are always '.DEFAULT.'. For a little more detail than is presented below see the structures and comments in the source code.
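The block layout just described can be turned into a file-offset calculation. This is my own sketch, not code from MSBackUp or nortbk; it assumes the 1.44Mb geometry (data area at 0x4E00, 0x4800-byte blocks with the 0x800-byte mystery region at the end of each block) and that block numbers count backwards from the end of the disk as described:

```c
#include <assert.h>

#define DATA_START  0x4E00L  /* start of data area on a 1.44Mb disk  */
#define BLOCK_TOTAL 0x4800L  /* full block: data plus mystery tail   */
#define BLOCK_DATA  0x4000L  /* usable backup data bytes per block   */

/* Map a file entry's (block, offset) pair to an absolute byte offset
   in the data file. Block 0 is the LAST block on the disk, so higher
   block numbers step back toward the start of the data area. */
long block_file_offset(int block, unsigned offset, int total_blocks)
{
    int from_start = total_blocks - 1 - block;
    return DATA_START + (long)from_start * BLOCK_TOTAL + (long)offset;
}
```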
The file header of length 0x200 bytes contains the following:
offset  use
0x01    version string: "NORTON Ver 1E"
0x10    binary data, possibly target disk: heads, tracks, sectors
0x1C    pretty clearly a timestamp, but the format is unclear
0x22    two key words: # files in backup, # directories on source disk
0x26    dword: appears to be length of the data region
0x31    byte: blocking factor flag, compression flag
0x34    dword: file length
0x41    string: I've seen "DEFAULT" and "CONFID$$"
0x50    string: name of catalog file, 1st 8 bytes normally match the data file's
0x60    encrypted password string (only if password protected)
0x70    binary data, sparsely populated region, mostly zeros
0xC2    string: "Version 1.0 for Microsoft"
0xE1    string: Description, I've seen two: "(No Description)" and "Compatibility Setup File"
0x100   binary data, almost all zeros
0x1FE   clearly a check sum value, but I don't know how it's created

So far there have always been at least 3 of these headers starting at or near the beginning of the file, as outlined below. With more experimentation one could probably identify more of these fields. When the target media is a floppy disk the data from 0x12 to 0x1C appears to describe this media; my tracks field always matched the number of tracks on the floppy with Ver 1E backups. This is not true for Ver 2A backups, where tracks appears to be proportional to the size of the backup. Although not critical, as one doesn't need these fields to extract data, it's an example of an early assumption I made based on the floppy backups that doesn't extend to hard disk backups. If you play with this and learn more please let me know.
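The offsets above can be rendered as a packed C struct. The field names are mine, the pad fields just fill the undocumented gaps, and the whole thing is a sketch of what I've observed rather than a verified specification:

```c
#include <stddef.h>
#include <stdint.h>

/* The 0x200-byte backup file header. Field names are made up;
   pad/unknown fields cover bytes whose purpose is not identified. */
#pragma pack(push, 1)
struct bk_header {
    uint8_t  lead;          /* 0x000: unknown byte before the string */
    char     version[15];   /* 0x001: "NORTON Ver 1E"                */
    uint8_t  media[12];     /* 0x010: heads/tracks/sectors?          */
    uint8_t  stamp[6];      /* 0x01C: timestamp, format unclear      */
    uint16_t nfiles;        /* 0x022: # files in backup              */
    uint16_t ndirs;         /* 0x024: # directories on source disk   */
    uint32_t data_len;      /* 0x026: length of the data region      */
    uint8_t  pad1[7];       /* 0x02A                                 */
    uint8_t  flags;         /* 0x031: blocking/compression flags     */
    uint8_t  pad2[2];       /* 0x032                                 */
    uint32_t file_len;      /* 0x034: file length                    */
    uint8_t  pad3[9];       /* 0x038                                 */
    char     set_name[15];  /* 0x041: "DEFAULT" or "CONFID$$"        */
    char     cat_name[16];  /* 0x050: catalog file name              */
    char     password[16];  /* 0x060: encrypted password, if any     */
    uint8_t  sparse[82];    /* 0x070: mostly zeros                   */
    char     ms_ver[31];    /* 0x0C2: "Version 1.0 for Microsoft"    */
    char     descr[31];     /* 0x0E1: description string             */
    uint8_t  zeros[254];    /* 0x100: almost all zeros               */
    uint16_t checksum;      /* 0x1FE: checksum, algorithm unknown    */
};
#pragma pack(pop)
```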
As mentioned above two files are created when a backup is done, the backup data file(s) and the backup catalog file which summarizes what was backed up. The catalog file is appended to the end of the data file and can be recovered if it is lost. I use the terms 'data file' and 'catalog file' to differentiate between these. Each contains a mapping of the disk structure, but in slightly different order. There are three key structures which describe the directories and files on the hard disk being backed up. Both the catalog file and data file use the same structure to describe the hard disk directories, but the structures used to describe an individual file have minor differences. All three of these structures are 0x20 bytes long. As you will see if you look at the source code I understand about 75% of the fields in these structures. Enough to do a listing or file extraction, but not the full story.
In the discussion below a WORD is two bytes, a DWORD is four bytes.
Data File structure describing an individual file
BYTE  - always zero (this distinguishes it from a directory entry)
BYTE  name[11] - file name (padded with spaces) & extension, no '.'
BYTE  attribute byte (probably!) seems to map to MSDOS attribute
BYTE  continuation flag: 0 => first occurrence, > 0 implies continuation
BYTE  data file #, maps to data file's extension
BYTE  compression flag, 0 => not compressed, 0xA may be (see below)
WORD  start offset into data block in data file
WORD  start block # in data file (0 is last block, > 0 closer to beginning)
WORD  unknown (often, maybe always, 0)
WORD  time file created in MSDOS format
WORD  date file created in MSDOS format
WORD  unknown
DWORD length of file on disk and in backup file if uncompressed
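As a packed C struct this looks like the following; the field names are my own, and the layout is only as reliable as the listing above:

```c
#include <stddef.h>
#include <stdint.h>

/* The data file's 0x20-byte per-file record; field names are mine. */
#pragma pack(push, 1)
struct bk_file {
    uint8_t  zero;          /* 0x00: always 0, marks a file entry     */
    uint8_t  name[11];      /* 0x01: name + extension, space padded   */
    uint8_t  attrib;        /* 0x0C: MSDOS attribute byte (probably)  */
    uint8_t  cont;          /* 0x0D: 0 = first occurrence             */
    uint8_t  data_file;     /* 0x0E: maps to the data file extension  */
    uint8_t  compress;      /* 0x0F: 0 = raw, 0xA = segmented/compressed */
    uint16_t start_offset;  /* 0x10: offset into the data block       */
    uint16_t start_block;   /* 0x12: block #, 0 = last block on disk  */
    uint16_t unknown1;      /* 0x14: often (maybe always) 0           */
    uint16_t time;          /* 0x16: MSDOS time                       */
    uint16_t date;          /* 0x18: MSDOS date                       */
    uint16_t unknown2;      /* 0x1A                                   */
    uint32_t length;        /* 0x1C: file length on disk              */
};
#pragma pack(pop)
```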
Catalog File structure describing an individual file
Same as above for the first 0x10 and last 0xA bytes; the 6 bytes starting at offset 0x10 are different, as indicated below:

DWORD unknown (could be any combination of 4 BYTES)
WORD  unknown

... remaining bytes same as above.

Per above, the WORDs representing the file's date and time stamp are in exactly the same format as occurs in an MSDOS directory. I would have expected this for the directory time stamps also, but it doesn't seem to be that way!
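The catalog variant can be sketched the same way; again the names are mine, and the six bytes at 0x10 are simply unidentified:

```c
#include <stddef.h>
#include <stdint.h>

/* Catalog file variant of the 0x20-byte file record: same first 0x10
   and last 0xA bytes as the data file record, but the six bytes at
   0x10 are unidentified. Field names are mine. */
#pragma pack(push, 1)
struct cat_file {
    uint8_t  zero;          /* 0x00: always 0                       */
    uint8_t  name[11];      /* 0x01: name + extension, space padded */
    uint8_t  attrib;        /* 0x0C */
    uint8_t  cont;          /* 0x0D */
    uint8_t  data_file;     /* 0x0E */
    uint8_t  compress;      /* 0x0F */
    uint32_t unknown1;      /* 0x10: could be any mix of 4 bytes    */
    uint16_t unknown2;      /* 0x14 */
    uint16_t time;          /* 0x16: MSDOS time                     */
    uint16_t date;          /* 0x18: MSDOS date                     */
    uint16_t unknown3;      /* 0x1A */
    uint32_t length;        /* 0x1C */
};
#pragma pack(pop)
```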
Structure describing a directory
This is used in both the Data and Catalog files
BYTE name[11] - directory name (padded with spaces) & extension, no '.'
BYTE level - directory nesting depth, 0 => root, 1 => sub dir in root, etc
BYTE unknown[8] - no idea!
WORD # of files backed up from this directory
WORD # of files backed up (same as word above???)
BYTE unknown[4] - no idea!
BYTE time[4] - appears to be a time stamp, but if so in a weird format
I use byte offset 0x16 into the structure that describes a directory entry as the WORD = number of files backed up from this directory. It's zero unless one or more files from this directory are included in the backup. It appears to have the same value as the preceding WORD, which seems odd! I'm not sure about anything except this WORD and the first 0xC bytes, which are the directory name and its nesting level. However, as indicated above, the last four bytes starting at offset 0x1C into the structure appear to be a time stamp. It's NOT in the same format as the file date and time fields. It looks a bit like a DWORD number of seconds since June 1968, but not only would that be a pretty weird standard, it doesn't exactly match the time stamp of the directories displayed by MSDOS for the hard disk, though it's close.
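Putting the directory entry's known offsets (the WORD at 0x16 and the four bytes at 0x1C) into a packed struct gives the sketch below; note the sizes of the unknown regions are forced by the 0x20-byte overall length, and all names are mine:

```c
#include <stddef.h>
#include <stdint.h>

/* The 0x20-byte directory record used in both data and catalog
   files. Field names are mine; the unknown arrays are sized so the
   known offsets (0x16 and 0x1C) and total size (0x20) come out right. */
#pragma pack(push, 1)
struct bk_dir {
    uint8_t  name[11];     /* 0x00: directory name, space padded    */
    uint8_t  level;        /* 0x0B: nesting depth, 0 = root         */
    uint8_t  unknown1[8];  /* 0x0C: no idea                         */
    uint16_t nfiles_dup;   /* 0x14: seems to duplicate next word    */
    uint16_t nfiles;       /* 0x16: # files backed up from here     */
    uint8_t  unknown2[4];  /* 0x18: no idea                         */
    uint8_t  stamp[4];     /* 0x1C: timestamp in an odd format      */
};
#pragma pack(pop)
```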
The data directory begins with one or more directory entries, ie 0x20 byte structures with a non-zero first byte which is the beginning of the directory name. The first is always the root directory of the hard drive being backed up, eg "C:\". As one steps through this region one finds either directory or file entries in the order they would occur if one did a depth first search of the disk. All directories on the disk are included, but only the files which have been backed up are shown. The data file directory is terminated by an empty structure whose first byte is 0xFF (an invalid file name character). The nesting depth field is useful for displaying the tree structure; my program indents the listing based on the directory nesting level. If a directory entry has a non-zero entry for the number of files backed up, these file entries will immediately follow the directory entry, and in my listing they are preceded by the 'file:' string.

Note that on a multi disk backup set each successive disk's file simply picks up where the prior one ended. It still starts with enough directory structures to determine the path to the files on the disk, but then continues with file structures. The data for the last file on one disk often spans over to the next disk, which is indicated by the continuation number in the file's catalog entry.
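The walk over this region boils down to a loop over 0x20-byte records, classified by their first byte. The function below is my own illustration of that rule (not the nortbk source); it just counts entries of each kind:

```c
#include <assert.h>

/* Walk a data file's directory region held in memory: 0x20-byte
   records where first byte 0x00 = file entry, 0xFF = terminator, and
   any other value is the first character of a directory name.
   Returns 0 on a clean terminator, -1 if the region ends without one. */
int walk_directory_region(const unsigned char *p, long len,
                          int *dirs, int *files)
{
    long off;
    *dirs = *files = 0;
    for (off = 0; off + 0x20 <= len; off += 0x20) {
        if (p[off] == 0xFF) return 0;   /* terminator record */
        if (p[off] == 0x00) ++*files;   /* file entry        */
        else                ++*dirs;    /* directory entry   */
    }
    return -1;
}
```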
During extraction one can generate the paths to the files from this information, and I think I have this implemented now. If a file is included in this directory list it is extractable. One uses the block number and offset into the block to get the starting location for extraction. If the compression flag is 0, it's just raw data and may be copied directly from the archive to the destination file. The only other flag I've seen is 0xA. When the compression flag is 0xA, the file data is preceded by 3 byte headers. These headers contain a BYTE flag followed by a WORD length; see struct nxt_loc. The length is the offset from the start of this structure to the end of the data in this segment (and hence often to the next 3 byte header). I've seen flag values of 0 and 0xFF in what I've looked at so far. It seems a flag of 0 is compressed data and a flag of 0xFF is raw data. I don't know why one would encapsulate raw data between these headers, but that is what it looks like. All the files archived by the configuration validation test are done this way, but it may be an exception. Any occurrence of nxt_loc.flag = 0 means the data is compressed as discussed below. The algorithm I use to extract this data is a little messy, as I find that one needs to have read in all the nxt_loc.len bytes before they can be decompressed. This requires that one may have to switch to another data block in the middle of the read operation.
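The segment-stepping rule can be sketched as follows. This is my own illustration under the assumptions above (little-endian WORD length measured from the start of the header, flag 0xFF = raw, 0 = compressed); the function name and bookkeeping are made up:

```c
#include <assert.h>

/* Step through the 3-byte nxt_loc headers preceding file data stored
   with compression flag 0xA. The len field is the distance from the
   start of the header to the end of the segment, so it is also the
   step to the next header. Totals the payload bytes of each kind;
   returns bytes consumed, or -1 on a malformed header. */
long walk_nxt_loc(const unsigned char *p, long budget,
                  long *raw, long *compressed)
{
    long off = 0;
    *raw = *compressed = 0;
    while (off + 3 <= budget) {
        unsigned len = p[off + 1] | (p[off + 2] << 8);  /* little endian */
        long body = (long)len - 3;          /* payload after the header */
        if (body < 0 || off + (long)len > budget) return -1;
        if (p[off] == 0xFF) *raw        += body;  /* raw data segment   */
        else                *compressed += body;  /* compressed segment */
        off += len;
    }
    return off;
}
```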
The catalog file ends with a fixed length 0x100 byte header. I validate the catalog files by seeking to the end of the file, backing up 0x100 bytes and reading in the header. I save the position of this header to terminate parsing of the catalog file. Unlike the directory region in a data file, there appears to be no termination flag in a catalog file. At offset 0x60 in the header you should see the descriptive string: "Version 6.00.00 02/26/93 06:00 am". This is followed by the name of the catalog file and its timestamp as a string. I have not investigated the other fields as I can parse this file well enough with the information I have.
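The validation step can be sketched over an in-memory image of the catalog file. This is my own illustration of the check described above (the function name is made up); it looks for the version string at offset 0x60 into the trailing 0x100-byte header:

```c
#include <assert.h>
#include <string.h>

/* Locate the fixed 0x100-byte trailer of a catalog file image by
   checking for the version string at offset 0x60 into the trailer.
   Returns the trailer's offset (where parsing should stop), or -1 if
   the image is too short or the string is missing. */
long find_catalog_trailer(const unsigned char *p, long len)
{
    long trailer = len - 0x100;
    if (trailer < 0) return -1;
    if (memcmp(p + trailer + 0x60, "Version 6.00.00", 15) != 0)
        return -1;
    return trailer;
}
```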
Both ver 1E and 2A backups include a copy of the catalog file. It's archived under a directory entry at the end of the listing for the directory in which MSBackUp is installed. It can be extracted using my nortbk.exe or MSBackUp. It's not really necessary if you are doing your extraction with nortbk.exe, but of interest.
Ultimately I found a linux 2.0 kernel module, dmsdos, which was written to access compressed MSDOS drives. The source code is available, and the documentation has a nice summary of the compression algorithms the author has seen. This driver source is GPL code so presumably I can lift some of it without objection. It turned out that it was pretty easy to modify the dblspace_dec.c module so I could call it to decompress Ver 1E and 2A MSBackUp files. My samples all use the 'drive space' algorithm and the compressed segments begin with the "JM" version 0.0 sequence, but I suspect the 'double space' routine (which is also included) will work as well. If a file contains compressed data I extract each complete data segment to a buffer, then call the decompression routine and write out the result. I really only have a general idea of what it's doing, but I've extracted both text files and executables successfully.
My thanks go out to the various authors of this great work.
I only have a few comments of my own about decompression that aren't directly covered by the author's documentation. The linux kernel module above was designed to decompress disk sectors, ie it expects files to be in 0x200 byte blocks. I added a little code to allow variable file lengths. The source code assumes the caller knows how many bytes are to be expanded from the compressed code. I do not see any fields in the file headers that provide the number of bytes to be decompressed, only the number of compressed bytes in the current block. By trial and error I determined that the program seems to use a 0x5FFD byte buffer size for decompression. This means that one can use the output file length (which is readily available) and this buffer size to pass an output length to the decompression routine. I track the # of bytes still to be decompressed as I do the decompression. If the remaining length is > 0x5FFD one decompresses a full buffer's worth of data, otherwise one decompresses the remaining length bytes. This works well in the tests I have done, but depends on a buffer size I determined by experimentation.
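The output-length bookkeeping just described can be sketched in a few lines. This is my own illustration (function names made up), using the experimentally determined 0x5FFD buffer size from the text:

```c
#include <assert.h>

#define DECOMP_BUF 0x5FFDL  /* buffer size found by experimentation */

/* Output length to request from the decompressor for this call:
   a full buffer if more than DECOMP_BUF bytes remain, else the rest. */
long next_chunk(long remaining)
{
    return remaining > DECOMP_BUF ? DECOMP_BUF : remaining;
}

/* How many decompression calls a file of the given total
   (uncompressed) length breaks down into. */
int count_chunks(long total)
{
    int n = 0;
    while (total > 0) {
        total -= next_chunk(total);
        n++;
    }
    return n;
}
```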
Another point of interest is that not all files in a backup with data compression enabled are actually compressed. This information is in the file's directory entry and is displayed by nortbk.exe. It appears that short files are not compressed. MSBackUp also seems to do some simple testing, as it doesn't compress files which have already been compressed. My test files contained several *.lzh compressed archives in the data area. MSBackUp does not attempt to re-compress these files. I didn't have any *.zip or *.gz compressed files but expect the same result. Text and executable files are normally compressed unless they are very short.