This document, a work in progress, describes the RAR format. It plays a role similar to the one the ZIP App Note plays for the ZIP format.
NOTE 1: This documentation MUST NOT be used to create RAR-compatible archive programs like WinRAR. It is only for the purposes of writing decompression software (i.e. unrar) in various languages. It was reverse-engineered from the UnRAR source located at this page with Eugene Roshal's permission.
NOTE 2: This documentation will initially focus on what I believe is Version 3 of the RAR format.
The author of this document is Jeff Schiller <codedread@gmail.com>. It is licensed under the CC-BY-3.0 License.
Overall .RAR file format:

    signature              7 bytes  (0x52 0x61 0x72 0x21 0x1A 0x07 0x00)
    [1st volume header]
    ...
    [2nd volume header]
    ...
    ...
    [nth volume header]
    ...

In general, a modern single-volume RAR file has a MAIN_HEAD structure followed by multiple FILE_HEAD structures.
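As a minimal sketch, the signature check described above could look like the following. The names `RAR_SIGNATURE` and `has_rar_signature` are made up for this illustration and do not come from the UnRAR source.

```python
# The 7 signature bytes listed above: "Rar!" followed by 0x1A 0x07 0x00.
RAR_SIGNATURE = bytes([0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00])


def has_rar_signature(data: bytes) -> bool:
    """Return True if data begins with the 7-byte RAR signature."""
    return data[:len(RAR_SIGNATURE)] == RAR_SIGNATURE
```

A reader would typically call this on the first 7 bytes of the file before attempting to parse any volume header.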
The Base Header Block is:
Field | Size |
---|---|
header_crc | 2 bytes |
header_type | 1 byte |
header_flags | 2 bytes |
header_size | 2 bytes |
The header_size indicates how many total bytes the header requires. The header_type field determines how the remaining bytes should be interpreted.
The header type is 8 bits (1 byte) and can have the following values:
Value | Type |
---|---|
0x72 | MARK_HEAD |
0x73 | MAIN_HEAD |
0x74 | FILE_HEAD |
0x75 | COMM_HEAD |
0x76 | AV_HEAD |
0x77 | SUB_HEAD |
0x78 | PROTECT_HEAD |
0x79 | SIGN_HEAD |
0x7a | NEWSUB_HEAD |
0x7b | ENDARC_HEAD |
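The Base Header Block and the type table above can be read with a few lines of code. This is only a sketch: it assumes that multi-byte fields are stored little-endian (this document does not state the byte order), and `BaseHeader` and `parse_base_header` are hypothetical names.

```python
import struct
from typing import NamedTuple

# Header type values from the table above.
HEADER_TYPES = {
    0x72: "MARK_HEAD", 0x73: "MAIN_HEAD", 0x74: "FILE_HEAD",
    0x75: "COMM_HEAD", 0x76: "AV_HEAD", 0x77: "SUB_HEAD",
    0x78: "PROTECT_HEAD", 0x79: "SIGN_HEAD", 0x7A: "NEWSUB_HEAD",
    0x7B: "ENDARC_HEAD",
}


class BaseHeader(NamedTuple):
    header_crc: int
    header_type: int
    header_flags: int
    header_size: int


def parse_base_header(data: bytes, offset: int = 0) -> BaseHeader:
    """Read the 7-byte Base Header Block starting at offset."""
    # "<HBHH" = little-endian (assumed): 2 + 1 + 2 + 2 bytes, no padding.
    return BaseHeader(*struct.unpack_from("<HBHH", data, offset))
```

Under these assumptions, feeding the 7 signature bytes to `parse_base_header` gives a header_type of 0x72 (MARK_HEAD) and a header_size of 7.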
TBD
The remaining bytes in the volume header for MAIN_HEAD are:
Field | Size | Notes |
---|---|---|
HighPosAv | 2 bytes | |
PosAV | 4 bytes | |
EncryptVer | 1 byte | only present if MHD_ENCRYPTVER is set |
The remaining bytes in the FILE_HEAD structure are:
Field | Size | Notes |
---|---|---|
PackSize | 4 bytes | |
UnpSize | 4 bytes | |
HostOS | 1 byte | |
FileCRC | 4 bytes | |
FileTime (mtime) | 4 bytes | MS-DOS date/time format |
UnpVer | 1 byte | |
Method | 1 byte | |
NameSize | 2 bytes | |
FileAttr | 4 bytes | |
HighPackSize | 4 bytes | only present if LHD_LARGE is set |
HighUnpSize | 4 bytes | only present if LHD_LARGE is set |
FileName | (NameSize) bytes | |
Salt | 8 bytes | only present if LHD_SALT is set |
ExtTime Structure | See Description | only present if LHD_EXTTIME is set |
Packed Data | (Total Packed Size) bytes | |
If the LHD_LARGE flag is set, then the file is large and 64 bits are needed to represent the packed and unpacked sizes. HighPackSize supplies the upper 32 bits and PackSize the lower 32 bits of the packed size in bytes. Likewise, HighUnpSize supplies the upper 32 bits and UnpSize the lower 32 bits of the unpacked size in bytes.
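The following sketch reads the fixed FILE_HEAD fields up to FileName and combines the 32-bit halves of the sizes as described above. It assumes little-endian fields and that `body` holds the bytes that follow the 7-byte Base Header Block. `parse_file_head` is a hypothetical helper, and it stops before Salt and the ExtTime Structure.

```python
import struct

LHD_LARGE = 0x0100  # from the File Header Flags table


def parse_file_head(body: bytes, header_flags: int) -> dict:
    """Parse FILE_HEAD fields through FileName (little-endian assumed)."""
    # PackSize, UnpSize, HostOS, FileCRC, FileTime, UnpVer, Method,
    # NameSize, FileAttr: 4+4+1+4+4+1+1+2+4 = 25 bytes.
    (pack_size, unp_size, host_os, file_crc, file_time,
     unp_ver, method, name_size, file_attr) = struct.unpack_from(
        "<IIBIIBBHI", body, 0)
    pos = 25
    if header_flags & LHD_LARGE:
        high_pack, high_unp = struct.unpack_from("<II", body, pos)
        pos += 8
        # High words are the upper 32 bits of the 64-bit sizes.
        pack_size |= high_pack << 32
        unp_size |= high_unp << 32
    file_name = body[pos:pos + name_size]
    return {
        "pack_size": pack_size, "unp_size": unp_size, "host_os": host_os,
        "file_crc": file_crc, "file_time": file_time, "unp_ver": unp_ver,
        "method": method, "file_attr": file_attr, "file_name": file_name,
        "end": pos + name_size,  # offset of whatever follows FileName
    }
```

The returned `end` offset marks where Salt (if LHD_SALT is set) or the ExtTime Structure (if LHD_EXTTIME is set) would begin.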
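The ExtTime Structure has 4 sections representing additional time data: mtime, ctime, atime, and arctime, in that order.
The first 16 bits contain a set of flags, 4 bits for each section. The mtime flags occupy the highest 4 bits and the arctime flags the lowest 4 bits.
Each flag contains:
Bits | Meaning |
---|---|
0x8 | the section is present |
0x4 | one second must be added to the time |
0x3 | the number (0 to 3) of remainder bytes stored in the section |
Each section contains:
Field | Size | Notes |
---|---|---|
time | 4 bytes | not stored for the mtime section |
remainder | 0 to 3 bytes | sub-second remainder in 100ns intervals |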
Follow this pseudo-code to read in the ExtTime Structure.

    const one_second = 10000000; // 1 second in 100ns intervals => 1E7
    var extTimeFlags = readBits(16)
    mtime:
      mtime_rmode = extTimeFlags >> 12
      if ((mtime_rmode & 8)==0) goto ctime
      mtime_count = mtime_rmode & 0x3
      mtime_remainder = 0
      for (var i = 0; i < mtime_count; i++) {
        mtime_remainder = (readBits(8) << 16) | (mtime_remainder >> 8)
      }
      if ((mtime_rmode & 4)!=0) mtime_remainder += one_second
    ctime:
      ctime_rmode = extTimeFlags >> 8
      if ((ctime_rmode & 8)==0) goto atime
      ctime = readBits(32)
      ctime_count = ctime_rmode & 0x3
      ctime_remainder = 0
      for (var i = 0; i < ctime_count; i++) {
        ctime_remainder = (readBits(8) << 16) | (ctime_remainder >> 8)
      }
      if ((ctime_rmode & 4)!=0) ctime_remainder += one_second
    atime:
      atime_rmode = extTimeFlags >> 4
      if ((atime_rmode & 8)==0) goto arctime
      atime = readBits(32)
      atime_count = atime_rmode & 0x3
      atime_remainder = 0
      for (var i = 0; i < atime_count; i++) {
        atime_remainder = (readBits(8) << 16) | (atime_remainder >> 8)
      }
      if ((atime_rmode & 4)!=0) atime_remainder += one_second
    arctime:
      arctime_rmode = extTimeFlags
      if ((arctime_rmode & 8)==0) goto done_exttime
      arctime = readBits(32)
      arctime_count = arctime_rmode & 0x3
      arctime_remainder = 0
      for (var i = 0; i < arctime_count; i++) {
        arctime_remainder = (readBits(8) << 16) | (arctime_remainder >> 8)
      }
      if ((arctime_rmode & 4)!=0) arctime_remainder += one_second
    done_exttime

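The four near-identical sections of the pseudo-code can be folded into one loop. The sketch below is one possible Python rendering, assuming the 16-bit flags and 32-bit times are stored little-endian. `read_ext_time` and the dictionary layout are illustrative choices, not part of the format.

```python
import struct

ONE_SECOND = 10_000_000  # 1 second in 100 ns intervals


def read_ext_time(data: bytes, offset: int = 0):
    """Return ({section_name: {...}}, offset just past the structure)."""
    (flags,) = struct.unpack_from("<H", data, offset)
    offset += 2
    sections = {}
    for index, name in enumerate(("mtime", "ctime", "atime", "arctime")):
        rmode = (flags >> (12 - 4 * index)) & 0xF
        if not rmode & 8:          # section not present
            continue
        time_value = None
        if name != "mtime":        # mtime has no 32-bit time of its own
            (time_value,) = struct.unpack_from("<I", data, offset)
            offset += 4
        remainder = 0
        for _ in range(rmode & 3):  # 0 to 3 remainder bytes
            remainder = (data[offset] << 16) | (remainder >> 8)
            offset += 1
        if rmode & 4:
            remainder += ONE_SECOND
        sections[name] = {"time": time_value, "remainder": remainder}
    return sections, offset
```

Returning the new offset lets the caller continue with the data that follows the structure.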
TBD
TBD
TBD
TBD
TBD
TBD
TBD
The header flags are 16 bits (2 bytes). Depending on the type of Volume Header, the flags are interpreted differently.
The Main Header Flags are:
Value | Flag |
---|---|
0x0001 | MHD_VOLUME |
0x0002 | MHD_COMMENT |
0x0004 | MHD_LOCK |
0x0008 | MHD_SOLID |
0x0010 | MHD_PACK_COMMENT or MHD_NEWNUMBERING |
0x0020 | MHD_AV |
0x0040 | MHD_PROTECT |
0x0080 | MHD_PASSWORD |
0x0100 | MHD_FIRSTVOLUME |
0x0200 | MHD_ENCRYPTVER |
The File Header Flags are:
Value | Flag |
---|---|
0x0001 | LHD_SPLIT_BEFORE |
0x0002 | LHD_SPLIT_AFTER |
0x0004 | LHD_PASSWORD |
0x0008 | LHD_COMMENT |
0x0010 | LHD_SOLID |
0x0100 | LHD_LARGE |
0x0200 | LHD_UNICODE |
0x0400 | LHD_SALT |
0x0800 | LHD_VERSION |
0x1000 | LHD_EXTTIME |
0x2000 | LHD_EXTFLAGS |
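Because the same 16-bit field is interpreted per header type, a reader needs one lookup table per type. The sketch below copies the two tables above into dictionaries; `flag_names` is a hypothetical helper name.

```python
MAIN_HEADER_FLAGS = {
    0x0001: "MHD_VOLUME", 0x0002: "MHD_COMMENT", 0x0004: "MHD_LOCK",
    0x0008: "MHD_SOLID", 0x0010: "MHD_PACK_COMMENT/MHD_NEWNUMBERING",
    0x0020: "MHD_AV", 0x0040: "MHD_PROTECT", 0x0080: "MHD_PASSWORD",
    0x0100: "MHD_FIRSTVOLUME", 0x0200: "MHD_ENCRYPTVER",
}

FILE_HEADER_FLAGS = {
    0x0001: "LHD_SPLIT_BEFORE", 0x0002: "LHD_SPLIT_AFTER",
    0x0004: "LHD_PASSWORD", 0x0008: "LHD_COMMENT", 0x0010: "LHD_SOLID",
    0x0100: "LHD_LARGE", 0x0200: "LHD_UNICODE", 0x0400: "LHD_SALT",
    0x0800: "LHD_VERSION", 0x1000: "LHD_EXTTIME", 0x2000: "LHD_EXTFLAGS",
}


def flag_names(header_flags: int, table: dict) -> list:
    """List the names of all flags from table that are set in header_flags."""
    return [name for bit, name in table.items() if header_flags & bit]
```

For example, `flag_names(0x1400, FILE_HEADER_FLAGS)` lists LHD_SALT and LHD_EXTTIME.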
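MHD_VOLUME: Value 0x0001. TBD
MHD_COMMENT: Value 0x0002. TBD
MHD_LOCK: Value 0x0004. TBD
MHD_SOLID: Value 0x0008. TBD
MHD_PACK_COMMENT or MHD_NEWNUMBERING: Value 0x0010. TBD
MHD_AV: Value 0x0020. TBD
MHD_PROTECT: Value 0x0040. TBD
MHD_PASSWORD: Value 0x0080. TBD
MHD_FIRSTVOLUME: Value 0x0100. TBD
MHD_ENCRYPTVER: Value 0x0200. Indicates whether encryption is present in the archive volume. When set, the EncryptVer byte is present in the MAIN_HEAD header.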
LHD_SPLIT_BEFORE: Value 0x0001. TBD
LHD_SPLIT_AFTER: Value 0x0002. TBD
LHD_PASSWORD: Value 0x0004. TBD
LHD_COMMENT: Value 0x0008. TBD
LHD_SOLID: Value 0x0010. TBD
LHD_LARGE: Value 0x0100. Indicates that the file is large. In this case, HighPackSize and HighUnpSize are present and 64 bits are used to describe the packed and unpacked sizes.
LHD_UNICODE: Value 0x0200. Indicates that the filename is Unicode.
LHD_SALT: Value 0x0400. Indicates that the 64-bit salt value is present.
LHD_VERSION: Value 0x0800. TBD
LHD_EXTTIME: Value 0x1000. Indicates that the ExtTime Structure is present in the FILE_HEAD header.
LHD_EXTFLAGS: Value 0x2000. TBD
Once the header information and packed bytes have been extracted, the packed bytes must then be unpacked. RAR uses a variety of algorithms for this. Chief among these are Lempel-Ziv compression and Prediction by Partial Matching. The details of the unpacking are specified in the following subsections based on the values of UnpVer and Method as decoded in the FILE_HEAD structure.
If Method is 0x30 (decimal 48), then the packed bytes are the unpacked bytes and no decompression/unpacking is necessary (i.e. the file was not compressed).
Otherwise:
UnpVer Value (decimal) | Algorithm To Use |
---|---|
15 | Unpack15 |
20 | Unpack20 |
26 | Unpack20 |
29 | Unpack29 |
36 | Unpack29 |
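The selection logic above amounts to a small lookup. The sketch below is illustrative only: `choose_algorithm` is a made-up name, and the entries for 26 and 36 assume those versions share the algorithm of the row above them (Unpack20 and Unpack29 respectively), which is how the UnRAR source groups them.

```python
STORED_METHOD = 0x30  # packed bytes are the unpacked bytes

UNPACK_ALGORITHMS = {
    15: "Unpack15",
    20: "Unpack20",
    26: "Unpack20",  # assumed to share Unpack20
    29: "Unpack29",
    36: "Unpack29",  # assumed to share Unpack29
}


def choose_algorithm(unp_ver: int, method: int) -> str:
    """Name the unpacking routine for a FILE_HEAD's UnpVer and Method."""
    if method == STORED_METHOD:
        return "Stored"
    if unp_ver not in UNPACK_ALGORITHMS:
        raise ValueError(f"unsupported UnpVer {unp_ver}")
    return UNPACK_ALGORITHMS[unp_ver]
```

Raising an error for unknown versions keeps a decompressor from silently misreading data written by a format revision this document does not cover.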
TBD
TBD
The packed data consists of a sequence of blocks. If the first bit of a block is set, the block is processed as a PPM block. Otherwise, it is an LZ block.
The format of an LZ block is:
Field | Size |
---|---|
isPPM | 1 bit |
keepOldTable | 1 bit |
huffmanCodeTable | variable size |
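Reading the two leading bits of a block requires a bit-level reader. The sketch below assumes bits are consumed most-significant-bit first within each byte, which this document does not state. `BitReader` and `read_block_flags` are hypothetical names.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object (bit order is an assumption)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def read_bits(self, count: int) -> int:
        value = 0
        for _ in range(count):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_block_flags(reader: BitReader) -> dict:
    """Read isPPM and, for LZ blocks only, keepOldTable."""
    if reader.read_bits(1):
        return {"is_ppm": True}
    return {"is_ppm": False, "keep_old_table": bool(reader.read_bits(1))}
```

After `read_block_flags` returns for an LZ block, the reader is positioned at the start of huffmanCodeTable.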
The Huffman Encoding tables consist of a series of bit lengths. For a more thorough treatment of the concepts of Huffman Encoding, see the Deflate spec. The RAR format uses a set of twenty bit lengths to construct Huffman Codes. The Huffman Encoding tables in RAR files consist of at most twenty entries of the format:
Field | Size | Notes |
---|---|---|
BitLength | 4 bits | |
ZeroCount | 4 bits | only present if BitLength is 15 |
If BitLength is 15, then the next 4 bits are read as ZeroCount. If ZeroCount is 0, then the bit length is 15; otherwise, (ZeroCount+2) is the number of consecutive zero bit lengths in the table. For instance, if the following 4-bit numbers are present:
4-bit Numbers | Meaning |
---|---|
0x8 | indicates a bit-length of 8 |
0x4 | indicates a bit-length of 4 |
0x4 | indicates a bit-length of 4 |
0x2 | indicates a bit-length of 2 |
0xF 0x0 | these two 4-bit numbers specify a bit-length of 15 |
0xF 0x3 | these two 4-bit numbers specify a run of 5 zeros |
0x9 | indicates a bit-length of 9 |
0x3 | indicates a bit-length of 3 |
0xF 0x6 | these two 4-bit numbers specify a run of 8 zeros |
This example describes a Huffman Encoding Bit Length table of:
Code | Bit Length | Code | Bit Length |
---|---|---|---|
1 | 8 | 11 | 9 |
2 | 4 | 12 | 3 |
3 | 4 | 13 | 0 |
4 | 2 | 14 | 0 |
5 | 15 | 15 | 0 |
6 | 0 | 16 | 0 |
7 | 0 | 17 | 0 |
8 | 0 | 18 | 0 |
9 | 0 | 19 | 0 |
10 | 0 | 20 | 0 |
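The expansion from 4-bit numbers to the twenty bit lengths can be sketched as follows. The function takes the 4-bit numbers as a plain list so that it stays independent of any particular bit reader. `decode_bit_lengths` is an illustrative name, and clamping a zero run at the end of the table is an assumption.

```python
def decode_bit_lengths(nibbles, table_size=20):
    """Expand 4-bit BitLength/ZeroCount values into table_size bit lengths."""
    lengths = []
    values = iter(nibbles)
    while len(lengths) < table_size:
        bit_length = next(values)
        if bit_length != 15:
            lengths.append(bit_length)
            continue
        zero_count = next(values)
        if zero_count == 0:
            lengths.append(15)          # 0xF 0x0 means a literal 15
        else:
            run = zero_count + 2        # run of zero bit lengths
            # Assumption: a run never extends past the end of the table.
            lengths.extend([0] * min(run, table_size - len(lengths)))
    return lengths


example = [0x8, 0x4, 0x4, 0x2, 0xF, 0x0, 0xF, 0x3, 0x9, 0x3, 0xF, 0x6]
print(decode_bit_lengths(example))
```

Run on the 4-bit numbers of the example, this reproduces the table above: 8, 4, 4, 2, 15, five zeros, 9, 3, and eight zeros.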
Once the twenty bit lengths are obtained, the Huffman Encoding table is constructed by using the following algorithm:
1) Count the number of codes for each code length. Let LenCount[N] be the number of codes of length N, where N = {1..15}, and let LenCount[0] = 0.
2) Find the decode length and positions:

    N = 0
    TmpPos[0] = 0
    DecodePos[0] = 0
    DecodeLen[0] = 0
    for (I = 1; I < 16; I++) {
      N = 2 * (N+LenCount[I])
      M = N << (15-I)
      if (M > 0xFFFF) M = 0xFFFF
      DecodeLen[I] = (unsigned int)M
      TmpPos[I] = DecodePos[I] = DecodePos[I-1] + LenCount[I-1]
    }

3) Assign numerical values to all codes:

    for (I = 0; I < TableSize; I++) {
      if (BitLength[I] != 0)
        DecodeNum[ TmpPos[BitLength[I] & 0xF]++ ] = I
    }

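The three steps translate almost line for line into Python. In the sketch below, `make_decode_tables` is a hypothetical name, and the count of zero-length entries is forced to 0 so that unused symbols do not shift the positions.

```python
def make_decode_tables(bit_lengths):
    """Return (decode_len, decode_pos, decode_num) for a bit length table."""
    # Step 1: count the codes of each length (lengths fit in 4 bits).
    len_count = [0] * 16
    for length in bit_lengths:
        len_count[length & 0xF] += 1
    len_count[0] = 0  # zero-length entries are unused symbols, not codes

    # Step 2: find the decode length and positions.
    decode_len = [0] * 16
    decode_pos = [0] * 16
    tmp_pos = [0] * 16
    n = 0
    for i in range(1, 16):
        n = 2 * (n + len_count[i])
        decode_len[i] = min(n << (15 - i), 0xFFFF)
        decode_pos[i] = decode_pos[i - 1] + len_count[i - 1]
        tmp_pos[i] = decode_pos[i]

    # Step 3: assign numerical values (symbol indices) to all codes.
    decode_num = [0] * len(bit_lengths)
    for symbol, length in enumerate(bit_lengths):
        if length != 0:
            decode_num[tmp_pos[length & 0xF]] = symbol
            tmp_pos[length & 0xF] += 1
    return decode_len, decode_pos, decode_num
```

For a three-symbol table with bit lengths 2, 1, 2, the symbol with the shortest code comes first in `decode_num`, giving the order 1, 0, 2.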