Documentation of the RAR format

2 General Format of a .RAR File

Overall .RAR file format:

signature               7 bytes    (0x52 0x61 0x72 0x21 0x1A 0x07 0x00)
[1st volume header]
...
[2nd volume header]
...
...
[nth volume header]
...

In general, a modern single-volume RAR file has a MAIN_HEAD structure followed by multiple FILE_HEAD structures.
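
As a small illustration, a reader could verify the signature before attempting to parse any headers. This is only a sketch: the names RAR_SIGNATURE and has_rar_signature are hypothetical and chosen for this example.

```python
# The 7 signature bytes listed in the layout above.
RAR_SIGNATURE = bytes([0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00])


def has_rar_signature(data: bytes) -> bool:
    """Return True if data starts with the 7-byte RAR signature."""
    return data[:len(RAR_SIGNATURE)] == RAR_SIGNATURE
```

The first four bytes of the signature are the ASCII characters "Rar!".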

2.1 Volume Header Format

The Base Header Block is:

header_crc              2 bytes
header_type             1 byte
header_flags            2 bytes
header_size             2 bytes

The header_size indicates how many total bytes the header requires. The header_type field determines how the remaining bytes should be interpreted.
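
The Base Header Block could be decoded roughly as shown below. This sketch assumes that multi-byte fields are stored in little-endian byte order, which is not stated above; BaseHeader and read_base_header are hypothetical names.

```python
import struct
from typing import NamedTuple


class BaseHeader(NamedTuple):
    header_crc: int
    header_type: int
    header_flags: int
    header_size: int


# Assumption: little-endian fields with no padding (2 + 1 + 2 + 2 = 7 bytes).
BASE_HEADER = struct.Struct("<HBHH")


def read_base_header(data: bytes, offset: int = 0) -> BaseHeader:
    """Decode the 7-byte Base Header Block found at the given offset."""
    return BaseHeader(*BASE_HEADER.unpack_from(data, offset))
```

Under that assumption, the 7-byte signature itself decodes as a block with header_type 0x72 (MARK_HEAD) and header_size 7.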

2.1.1 Header Type

The header type is 8 bits (1 byte) and can have the following values:

Value	Type
0x72	MARK_HEAD
0x73	MAIN_HEAD
0x74	FILE_HEAD
0x75	COMM_HEAD
0x76	AV_HEAD
0x77	SUB_HEAD
0x78	PROTECT_HEAD
0x79	SIGN_HEAD
0x7a	NEWSUB_HEAD
0x7b	ENDARC_HEAD
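
For illustration, the table above maps directly onto an enumeration. The HeaderType class and the header_type_name helper are hypothetical names used only in this sketch.

```python
from enum import IntEnum


class HeaderType(IntEnum):
    MARK_HEAD = 0x72
    MAIN_HEAD = 0x73
    FILE_HEAD = 0x74
    COMM_HEAD = 0x75
    AV_HEAD = 0x76
    SUB_HEAD = 0x77
    PROTECT_HEAD = 0x78
    SIGN_HEAD = 0x79
    NEWSUB_HEAD = 0x7A
    ENDARC_HEAD = 0x7B


def header_type_name(value: int) -> str:
    """Return the symbolic name for a header_type byte, or a placeholder."""
    try:
        return HeaderType(value).name
    except ValueError:
        return f"UNKNOWN(0x{value:02x})"
```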

2.1.1.2 MAIN_HEAD

The remaining bytes in the volume header for MAIN_HEAD are:

HighPosAv               2 bytes
PosAV                   4 bytes
EncryptVer              1 byte (only present if MHD_ENCRYPTVER is set)
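
A minimal sketch of reading these fields follows. It assumes little-endian byte order and that the body passed in starts immediately after the 7-byte Base Header Block; parse_main_head_body is a hypothetical helper.

```python
import struct

MHD_ENCRYPTVER = 0x0200


def parse_main_head_body(body: bytes, header_flags: int) -> dict:
    """Decode the MAIN_HEAD fields that follow the Base Header Block."""
    # Assumption: little-endian, HighPosAv (2 bytes) then PosAV (4 bytes).
    high_pos_av, pos_av = struct.unpack_from("<HI", body, 0)
    fields = {"HighPosAv": high_pos_av, "PosAV": pos_av}
    if header_flags & MHD_ENCRYPTVER:
        # EncryptVer is the single byte after the 6 fixed bytes.
        fields["EncryptVer"] = body[6]
    return fields
```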

2.1.1.3 FILE_HEAD

The remaining bytes in the FILE_HEAD structure are:

PackSize                4 bytes
UnpSize                 4 bytes
HostOS                  1 byte
FileCRC                 4 bytes
FileTime (mtime)        4 bytes (MS-DOS date/time format)
UnpVer                  1 byte
Method                  1 byte
NameSize                2 bytes
FileAttr                4 bytes
HighPackSize            4 bytes (only present if LHD_LARGE is set)
HighUnpSize             4 bytes (only present if LHD_LARGE is set)
FileName                (NameSize) bytes
Salt                    8 bytes (only present if LHD_SALT is set)
ExtTime Structure       See Description (only present if LHD_EXTTIME is set)
Packed Data             (Total Packed Size) bytes

If the LHD_LARGE flag is set, then the archive is large and 64 bits are needed to represent the packed and unpacked sizes. HighPackSize supplies the upper 32 bits and PackSize the lower 32 bits of the packed size in bytes. Likewise, HighUnpSize supplies the upper 32 bits and UnpSize the lower 32 bits of the unpacked size in bytes.
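
As a rough illustration, the fixed-size FILE_HEAD fields and the optional 64-bit size extension could be read as follows. Little-endian byte order is an assumption, parse_file_head_body is a hypothetical helper, and the Salt, ExtTime Structure and packed data are deliberately left out.

```python
import struct

LHD_LARGE = 0x0100

# Assumption: little-endian. PackSize through FileAttr occupy 25 bytes.
FILE_HEAD_FIXED = struct.Struct("<IIBIIBBHI")


def parse_file_head_body(body: bytes, header_flags: int) -> dict:
    """Decode FILE_HEAD fields up to and including FileName."""
    (pack_size, unp_size, host_os, file_crc, file_time,
     unp_ver, method, name_size, file_attr) = FILE_HEAD_FIXED.unpack_from(body, 0)
    pos = FILE_HEAD_FIXED.size
    if header_flags & LHD_LARGE:
        # The high halves extend the sizes to 64 bits.
        high_pack, high_unp = struct.unpack_from("<II", body, pos)
        pos += 8
        pack_size |= high_pack << 32
        unp_size |= high_unp << 32
    file_name = body[pos:pos + name_size]
    return {
        "PackSize": pack_size, "UnpSize": unp_size, "HostOS": host_os,
        "FileCRC": file_crc, "FileTime": file_time, "UnpVer": unp_ver,
        "Method": method, "FileAttr": file_attr, "FileName": file_name,
    }
```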

ExtTime Structure

This structure has 4 sections representing additional time data.

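The first 16 bits contain a set of flags, 4 bits for each section. The sections appear in the order mtime, ctime, atime, arctime, with the mtime flags in the most significant 4 bits.

Each 4-bit flag group contains (bits numbered from the most significant):

Bit 1 (mask 0x8): Whether this section is present or not
Bit 2 (mask 0x4): Signals that the MS-DOS date/time should have its seconds increased by 1
Bits 3 and 4 (mask 0x3): The number of bytes to be read as the remainder, between 0 and 3

MS-DOS date/time format can only store even numbers of seconds. Bit 2 compensates for this limitation.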

Each section contains:

4 bytes containing an MS-DOS date/time (except for mtime, which is part of FILE_HEAD)
0 to 3 bytes containing fractions of seconds in 100ns increments

These bytes form a 24-bit integer, stored from least significant to most significant byte.
If fewer than 3 bytes are present, they supply the most significant bytes and the missing least significant bits are padded with zeroes to complete 24 bits.
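
The MS-DOS date/time values mentioned above pack a date and a time into 32 bits, with seconds stored in 2-second units. A minimal decoding sketch follows; dos_datetime_to_datetime is a hypothetical helper, and time zone handling is not addressed.

```python
from datetime import datetime


def dos_datetime_to_datetime(value: int) -> datetime:
    """Convert a 32-bit MS-DOS date/time value to a naive datetime."""
    second = (value & 0x1F) * 2          # bits 0-4: seconds divided by 2
    minute = (value >> 5) & 0x3F         # bits 5-10
    hour = (value >> 11) & 0x1F          # bits 11-15
    day = (value >> 16) & 0x1F           # bits 16-20
    month = (value >> 21) & 0x0F         # bits 21-24
    year = ((value >> 25) & 0x7F) + 1980 # bits 25-31: years since 1980
    # datetime raises ValueError for out-of-range fields (e.g. month 0).
    return datetime(year, month, day, hour, minute, second)
```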

Follow this pseudo-code to read in the ExtTime Structure.

const one_second = 10000000; // 1 second in 100ns intervals => 1E7
var extTimeFlags = readBits(16)

mtime:
mtime_rmode = extTimeFlags >> 12
if ((mtime_rmode & 8)==0) goto ctime
mtime_count = mtime_rmode & 0x3
mtime_remainder = 0
for (var i = 0; i < mtime_count; i++) {
    mtime_remainder = (readBits(8) << 16) | (mtime_remainder >> 8)
}
if ((mtime_rmode & 4)!=0) mtime_remainder += one_second

ctime:
ctime_rmode = extTimeFlags >> 8
if ((ctime_rmode & 8)==0) goto atime
ctime = readBits(32)
ctime_count = ctime_rmode & 0x3
ctime_remainder = 0
for (var i = 0; i < ctime_count; i++) {
    ctime_remainder = (readBits(8) << 16) | (ctime_remainder >> 8)
}
if ((ctime_rmode & 4)!=0) ctime_remainder += one_second

atime:
atime_rmode = extTimeFlags >> 4
if ((atime_rmode & 8)==0) goto arctime
atime = readBits(32)
atime_count = atime_rmode & 0x3
atime_remainder = 0
for (var i = 0; i < atime_count; i++) {
    atime_remainder = (readBits(8) << 16) | (atime_remainder >> 8)
}
if ((atime_rmode & 4)!=0) atime_remainder += one_second

arctime:
arctime_rmode = extTimeFlags
if ((arctime_rmode & 8)==0) goto done_exttime
arctime = readBits(32)
arctime_count = arctime_rmode & 0x3
arctime_remainder = 0
for (var i = 0; i < arctime_count; i++) {
    arctime_remainder = (readBits(8) << 16) | (arctime_remainder >> 8)
}
if ((arctime_rmode & 4)!=0) arctime_remainder += one_second

done_exttime:
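
The pseudo-code above can also be written as a loop over the four sections. The following is a sketch under stated assumptions: the 16-bit flags and the 4-byte MS-DOS date/time values are read as little-endian integers, and parse_ext_time is a hypothetical helper that returns the raw values without combining them into timestamps.

```python
ONE_SECOND = 10_000_000  # 1 second in 100ns intervals


def parse_ext_time(data: bytes, offset: int = 0):
    """Return ({section: (dos_time, remainder)}, next_offset)."""
    # Assumption: the flags are stored as a little-endian 16-bit integer.
    flags = int.from_bytes(data[offset:offset + 2], "little")
    pos = offset + 2
    result = {}
    for index, name in enumerate(("mtime", "ctime", "atime", "arctime")):
        rmode = (flags >> (12 - 4 * index)) & 0xF
        if not rmode & 8:
            continue  # section not present
        dos_time = None  # mtime's date/time lives in FILE_HEAD itself
        if name != "mtime":
            dos_time = int.from_bytes(data[pos:pos + 4], "little")
            pos += 4
        remainder = 0
        for _ in range(rmode & 3):
            # Each new byte becomes the most significant byte so far.
            remainder = (data[pos] << 16) | (remainder >> 8)
            pos += 1
        if rmode & 4:
            remainder += ONE_SECOND
        result[name] = (dos_time, remainder)
    return result, pos
```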

2.1.1.4 COMM_HEAD

TBD

2.1.1.5 AV_HEAD

TBD

2.1.1.6 SUB_HEAD

TBD

2.1.1.7 PROTECT_HEAD

TBD

2.1.1.8 SIGN_HEAD

TBD

2.1.1.9 NEWSUB_HEAD

TBD

2.1.1.10 ENDARC_HEAD

TBD

2.1.2 Header Flags

The header flags are 16 bits (2 bytes). Depending on the type of Volume Header, the flags are interpreted differently.

The Main Header Flags are:

Value	Flag
0x0001	MHD_VOLUME
0x0002	MHD_COMMENT
0x0004	MHD_LOCK
0x0008	MHD_SOLID
0x0010	MHD_PACK_COMMENT or MHD_NEWNUMBERING
0x0020	MHD_AV
0x0040	MHD_PROTECT
0x0080	MHD_PASSWORD
0x0100	MHD_FIRSTVOLUME
0x0200	MHD_ENCRYPTVER

The File Header Flags are:

Value	Flag
0x0001	LHD_SPLIT_BEFORE
0x0002	LHD_SPLIT_AFTER
0x0004	LHD_PASSWORD
0x0008	LHD_COMMENT
0x0010	LHD_SOLID
0x0100	LHD_LARGE
0x0200	LHD_UNICODE
0x0400	LHD_SALT
0x0800	LHD_VERSION
0x1000	LHD_EXTTIME
0x2000	LHD_EXTFLAGS
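
As an illustration, the two tables above can be turned into lookup tables for listing the flags that are set in a header_flags value. The dictionary names and the flag_names helper are hypothetical.

```python
MAIN_HEADER_FLAGS = {
    0x0001: "MHD_VOLUME", 0x0002: "MHD_COMMENT", 0x0004: "MHD_LOCK",
    0x0008: "MHD_SOLID", 0x0010: "MHD_PACK_COMMENT or MHD_NEWNUMBERING",
    0x0020: "MHD_AV", 0x0040: "MHD_PROTECT", 0x0080: "MHD_PASSWORD",
    0x0100: "MHD_FIRSTVOLUME", 0x0200: "MHD_ENCRYPTVER",
}

FILE_HEADER_FLAGS = {
    0x0001: "LHD_SPLIT_BEFORE", 0x0002: "LHD_SPLIT_AFTER",
    0x0004: "LHD_PASSWORD", 0x0008: "LHD_COMMENT", 0x0010: "LHD_SOLID",
    0x0100: "LHD_LARGE", 0x0200: "LHD_UNICODE", 0x0400: "LHD_SALT",
    0x0800: "LHD_VERSION", 0x1000: "LHD_EXTTIME", 0x2000: "LHD_EXTFLAGS",
}


def flag_names(header_flags: int, table: dict) -> list:
    """List the names of all flags from the table set in header_flags."""
    return [name for mask, name in table.items() if header_flags & mask]
```

The table to use depends on header_type: MAIN_HEADER_FLAGS for MAIN_HEAD and FILE_HEADER_FLAGS for FILE_HEAD.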

2.1.2.1 MHD_VOLUME

Value 0x0001. TBD

2.1.2.2 MHD_COMMENT

Value 0x0002. TBD

2.1.2.3 MHD_LOCK

Value 0x0004. TBD

2.1.2.4 MHD_SOLID

Value 0x0008. TBD

2.1.2.5 MHD_PACK_COMMENT

Value 0x0010. TBD

2.1.2.6 MHD_AV

Value 0x0020. TBD

2.1.2.7 MHD_PROTECT

Value 0x0040. TBD

2.1.2.8 MHD_PASSWORD

Value 0x0080. TBD

2.1.2.9 MHD_FIRSTVOLUME

Value 0x0100. TBD

2.1.2.10 MHD_ENCRYPTVER

Value 0x0200. Indicates that the EncryptVer field is present in the MAIN_HEAD header.

2.1.2.11 LHD_SPLIT_BEFORE

Value 0x0001. TBD

2.1.2.12 LHD_SPLIT_AFTER

Value 0x0002. TBD

2.1.2.13 LHD_PASSWORD

Value 0x0004. TBD

2.1.2.14 LHD_COMMENT

Value 0x0008. TBD

2.1.2.15 LHD_SOLID

Value 0x0010. TBD

2.1.2.16 LHD_LARGE

Value 0x0100. Indicates if the archive is large. In this case, HighPackSize and HighUnpSize are present in the FILE_HEAD header and 64 bits are used to describe the packed and unpacked sizes.

2.1.2.17 LHD_UNICODE

Value 0x0200. Indicates if the filename is Unicode.

2.1.2.18 LHD_SALT

Value 0x0400. Indicates if the 64-bit salt value is present.

2.1.2.19 LHD_VERSION

Value 0x0800. TBD

2.1.2.20 LHD_EXTTIME

Value 0x1000. The ExtTime Structure is present in the FILE_HEAD header.

2.1.2.21 LHD_EXTFLAGS

Value 0x2000. TBD

3 Unpacking

Once the header information and packed bytes have been extracted, the packed bytes must then be unpacked. RAR uses a variety of algorithms for this. Chief among these are Lempel-Ziv compression and Prediction by Partial Matching. The details of the unpacking are specified in the following subsections based on the values of UnpVer and Method as decoded in the FILE_HEAD structure.

If Method is 0x30 (decimal 48), then the packed bytes are the unpacked bytes and no decompression/unpacking is necessary (i.e. the file was not compressed).

Otherwise:

UnpVer Value (decimal)	Algorithm To Use
15	Unpack15
20	Unpack20
26	Unpack20
29	Unpack29
36	Unpack29
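
The selection rule above amounts to a check for the stored method followed by a table lookup. In this sketch the "stored" label and the select_algorithm helper are hypothetical; the algorithm names are the ones used in the table.

```python
METHOD_STORED = 0x30  # packed bytes are the unpacked bytes

UNPACK_ALGORITHMS = {
    15: "Unpack15",
    20: "Unpack20",
    26: "Unpack20",
    29: "Unpack29",
    36: "Unpack29",
}


def select_algorithm(unp_ver: int, method: int) -> str:
    """Pick the unpacking algorithm for a FILE_HEAD's UnpVer and Method."""
    if method == METHOD_STORED:
        return "stored"
    try:
        return UNPACK_ALGORITHMS[unp_ver]
    except KeyError:
        raise ValueError(f"unsupported UnpVer {unp_ver}") from None
```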

3.1 Unpack15

TBD

3.2 Unpack20

TBD

3.3 Unpack29

The packed data consists of a sequence of blocks. If the first bit of a block is set, then process the block as a PPM block. Otherwise, this is an LZ block.

3.3.1 LZ Block

The format of an LZ block is:

isPPM                   1 bit
keepOldTable            1 bit
huffmanCodeTable        (variable size)
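
Reading the two leading bits requires a bit-level reader. The sketch below assumes that bits are consumed from the most significant bit of each byte first, which is not stated above; BitReader and read_block_prefix are hypothetical names.

```python
class BitReader:
    """Reads bits from a byte string, most significant bit first (assumed)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits from the start of data

    def read_bits(self, count: int) -> int:
        value = 0
        for _ in range(count):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_block_prefix(reader: BitReader) -> dict:
    """Read isPPM and, for an LZ block, keepOldTable."""
    if reader.read_bits(1):
        return {"isPPM": True}
    return {"isPPM": False, "keepOldTable": bool(reader.read_bits(1))}
```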

3.3.1.1 Huffman Code Tables

The Huffman Encoding tables consist of a series of bit lengths. For a more thorough treatment of the concepts of Huffman Encoding, see the Deflate spec. The RAR format uses a set of twenty bit lengths to construct Huffman Codes. The Huffman Encoding tables in RAR files consist of at most twenty entries of the format:

BitLength               4 bits
ZeroCount               4 bits (only present if BitLength is 15)

If BitLength is 15, then the next 4 bits are read as ZeroCount. If the ZeroCount is 0, then the bit length is 15, otherwise (ZeroCount+2) is the number of consecutive zero bit lengths that are in the table. For instance, if the following 4-bit numbers are present:

0x8	indicates a bit-length of 8
0x4	indicates a bit-length of 4
0x4	indicates a bit-length of 4
0x2	indicates a bit-length of 2
0xF	these two 4-bit numbers specify a bit-length of 15
0x0	these two 4-bit numbers specify a bit-length of 15
0xF	these two 4-bit numbers specify a run of 5 zeros
0x3	these two 4-bit numbers specify a run of 5 zeros
0x9	indicates a bit-length of 9
0x3	indicates a bit-length of 3
0xF	these two 4-bit numbers specify a run of 8 zeros
0x6	these two 4-bit numbers specify a run of 8 zeros

This example describes a Huffman Encoding Bit Length table of:

Code	Bit Length	Code	Bit Length
1	8	11	9
2	4	12	3
3	4	13	0
4	2	14	0
5	15	15	0
6	0	16	0
7	0	17	0
8	0	18	0
9	0	19	0
10	0	20	0
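
The expansion rule can be sketched as follows, taking the 4-bit numbers as an already extracted sequence of integers. decode_bit_lengths is a hypothetical helper, and clamping a zero run that would overflow the table is an assumption of this sketch.

```python
def decode_bit_lengths(nibbles, table_size=20):
    """Expand BitLength/ZeroCount entries into table_size bit lengths."""
    source = iter(nibbles)
    lengths = []
    while len(lengths) < table_size:
        length = next(source)  # raises StopIteration if input runs out
        if length != 15:
            lengths.append(length)
            continue
        zero_count = next(source)
        if zero_count == 0:
            lengths.append(15)  # a literal bit length of 15
        else:
            run = zero_count + 2
            # Assumption: a run never extends past the end of the table.
            lengths.extend([0] * min(run, table_size - len(lengths)))
    return lengths
```

Feeding it the twelve 4-bit numbers from the example (0x8, 0x4, 0x4, 0x2, 0xF, 0x0, 0xF, 0x3, 0x9, 0x3, 0xF, 0x6) yields the twenty bit lengths shown in the table above.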

Once the twenty bit lengths are obtained, the Huffman Encoding table is constructed by using the following algorithm:

1) Count the number of codes for each code length.  Let
   LenCount[N] be the number of codes of length N, where N = {1..15}.
   Codes with a bit length of zero are unused, so LenCount[0] = 0.
             
2) Find the decode length and positions:

        N = 0
        TmpPos[0] = 0
        DecodePos[0] = 0
        DecodeLen[0] = 0
        for (I = 1; I < 16; I++) 
        {
            N = 2 * (N+LenCount[I])
            M = N << (15-I)
            if (M > 0xFFFF) M = 0xFFFF
            
            DecodeLen[I] = (unsigned int)M
            TmpPos[I] = DecodePos[I] = DecodePos[I-1] + LenCount[I-1]
        }

3) Assign numerical values to all codes:

        for (I = 0; I < TableSize; I++)
        {
            if (BitLength[I] != 0)
                DecodeNum[ TmpPos[BitLength[I] & 0xF]++ ] = I
        }