-
- All Known Implementing Classes:
DEZDecoder
,DEZEncoder
,DEZPrinter
public interface DEZFormat
Defines constants used in the DEZ1 format.Data Types
The patch format is a byte stream consisting of a header and a sequence of bytes, followed by a crc code.
Ciiiiiiii integer A multi-byte integer. C is a continue bit. The bits are in big-endian order. aaaaaaaaa address An encoded address. See below. ddddddddd data A data byte. XXXX 4-bytes 4 sequential bytes.
A trailing (*) indicates zero or more instances. A trailing (+) indicates a self-describing encoding of one or more bytes. A trailing (?) indicates an implied encoding of one or more bytes.Patch Format
HEADER: XXXX magic 4 bytes ASCII, 'DEZ1'. Ciiiiiii+ smallest Smallest copy offset. Ciiiiiii+ split Split point for single operation opcode. Ciiiiiii+ source Source data size. Ciiiiiii+ target Target data size. DATA: IIIIIIII* commands Zero or more instruction opcodes and operands CRC: XXXX crc 4-byte CRC of target bytes, network order.
Instruction stream
Instructions can perform either a single operation or two operations with limited ranges.Single commands perform one operation. They are a 7-bit value with the 8th bit set. A range test defines the operation.
1nnnnnnn: assuming a default split=100 [000 099] aaaaaaaa+ COPY Copy (n+smallest) from the address. smallest .. smallest+99 [100 123] dddddddd? ADD Append (n-99) from the patch. 1 .. 24 [ 124] Ciiiiiii+ aaaaaaaa+ COPY Copy (i+100+smallest) from the address. [ 125] Ciiiiiii+ dddddddd? ADD Append (i+(124-100)+1) from the patch. [ 126] Ciiiiiii+ dddddddd RUN Append (i+3) copies of the data byte. [ 127] Reserved.
The split point between an immediate-length COPY and ADD is stored in the stream. The defaultOP_SINGLE_SPLIT
is 100.Dual command perform two operations with a single opcode. The lengths are encoded as immediate values and have a limited range of 3 bits each.
00lllccc dddddddd? aaaaaaaa+ ADD+COPY Append the next (lll+1) data bytes (dddddddd?) from the patch. Then copy (ccc+smallest) bytes from the address (aaaaaaaa+). 01cccCCC aaaaaaaa+ AAAAAAAA+ COPY+COPY Copy (ccc+smallest) bytes from the first address (aaaaaaaa+). Then copy (CCC+smallest) bytes from the second address (AAAAAAAA+).
Addresses
Addresses are a logical pointer which refers to a location within a buffer consisting of the concatenated source and target buffers. The encoder only referneces addresses that have already been seen so they will also be visible to the decoder.A small address cache is maintained in both the encoder and decoder to the last N recently accessed addresses. This can be used to encode the address with fewer bytes.
Addresses can be encoded in three ways.
- An exact reference to an address cache entry.
- A reference to an address cache entry with a signed offset.
- An absolute address.
00nnnnnn - use address 'n' exactly. 01Snnnnn Ciiiiiii+ - use address 'n' plus (S=0) or minus (S=1) 'i' 1iiiiiii Ciiiiiii+ - use address 'i' absolute. Always at least 2 bytes.
The address table is 64 elements long but the relative instruction can only reference the most recent 32 elements. It is updated each time an address is encoded or decoded in a simple rolling manner.
Conceptually it can be broken into two separate parts:
matchAddr[(matchNext++) & 63] = addr; recentAddr[(recentNext++) & 31] = addr;
First the encoder searches the match table; if it finds a match the address is encoded implicitly in the opcode via an index.Next the encoder finds the nearest match from the recent table as absolute distance. The encoder will then encode the address as a relative offset from a member of the table if the total encoded length is shorter than encoding the address absolutely.
-
-
Field Summary
Fields Modifier and Type Field Description static int
ADDR_ABSOLUTE
Indicates whether this is an absolute or indirect address.static int
ADDR_RELATIVE
Indicates whether the address is relative or exact indirect.static int
ADDR_SIGN
The sign of the offset for a relative indirect address.static byte[]
MAGIC
File magic 'DEZ1'.static int
OP_ADD
Extended ADD opcode.static int
OP_COPY
Extended COPY opcode.static int
OP_DUAL_COPY
COPY select bit in dual opcode.static int
OP_EXT
Extended command.static int
OP_RUN
RUN opcode.static int
OP_SINGLE
Single-operation opcode selector bit.static int
OP_SINGLE_LIMIT
The limit of immediate length single operations.static int
OP_SINGLE_SPLIT
The default split-point between copy and add for immediate length single operations.
-
Method Summary
Static Methods Modifier and Type Method Description static byte[]
decode(byte[] source, byte[] patch)
Apply a DEZ-1 patch to data.static byte[]
encode(byte[] source, byte[] target)
Creates a DEZ-1 format patch from data.static byte[]
encode(int blockSize, int stepSize, byte[] source, byte[] target)
Creates a DEZ-1 format patch from data.
-
-
-
Field Detail
-
MAGIC
static final byte[] MAGIC
File magic 'DEZ1'.
-
OP_DUAL_COPY
static final int OP_DUAL_COPY
COPY select bit in dual opcode.- See Also:
- Constant Field Values
-
OP_SINGLE
static final int OP_SINGLE
Single-operation opcode selector bit.- See Also:
- Constant Field Values
-
OP_SINGLE_SPLIT
static final int OP_SINGLE_SPLIT
The default split-point between copy and add for immediate length single operations.A code below this value is a COPY, otherwise it is an ADD.
- See Also:
- Constant Field Values
-
OP_SINGLE_LIMIT
static final int OP_SINGLE_LIMIT
The limit of immediate length single operations.This is the first non-immediate opcode (i.e. op_copy)
- See Also:
- Constant Field Values
-
OP_COPY
static final int OP_COPY
Extended COPY opcode.- See Also:
- Constant Field Values
-
OP_ADD
static final int OP_ADD
Extended ADD opcode.- See Also:
- Constant Field Values
-
OP_RUN
static final int OP_RUN
RUN opcode.- See Also:
- Constant Field Values
-
OP_EXT
static final int OP_EXT
Extended command. Reserved.- See Also:
- Constant Field Values
-
ADDR_ABSOLUTE
static final int ADDR_ABSOLUTE
Indicates whether this is an absolute or indirect address.If this bit it set this represents the first byte of a multi-byte encoded integer whose value is the absolute address reference.
If this bit is clear then the address is an indirect address.
- See Also:
- Constant Field Values
-
ADDR_RELATIVE
static final int ADDR_RELATIVE
Indicates whether the address is relative or exact indirect.If set then the lower 5 bits select the source address slot and the ADDR_SIGN together with the following integer indicate the offset.
If not set this is an exact indirect address and the lower 6 bits select the source address slot.
- See Also:
- Constant Field Values
-
ADDR_SIGN
static final int ADDR_SIGN
The sign of the offset for a relative indirect address.If set the offset should be negated.
- See Also:
- Constant Field Values
-
-
Method Detail
-
encode
static byte[] encode(byte[] source, byte[] target)
Creates a DEZ-1 format patch from data.This uses typical encoding parameters suitable for small textual blobs.
- Parameters:
source
- Source bytes.target
- Target bytes.- Returns:
- patch, the binary delta.
-
encode
static byte[] encode(int blockSize, int stepSize, byte[] source, byte[] target)
Creates a DEZ-1 format patch from data.This form allows tuning parameters.
- blockSize
- Recommended 6, typically 4-8. A smaller value will allow matching shorter runs but will increase hash collision rates and increase encoding times.
- stepSize
- Recommended 1 for text, typically 1-8. This has the largest impact on encoding time as it directly relates to the number of run-length tests. Smaller values will always produce more compact patches.
- Parameters:
blockSize
- The size of blocks considered for matching. Will be clamped to be ≥ 3.stepSize
- The sampling step size. Will be clamped to be ≥ 1.source
- Source bytes.target
- Target bytes.- Returns:
- patch the binary delta.
-
decode
static byte[] decode(byte[] source, byte[] patch) throws DEZFormatException
Apply a DEZ-1 patch to data.- Parameters:
source
- Source bytes.patch
- Patch bytes.- Returns:
- target, the reconstructed data.
- Throws:
DEZFormatException
- If the patch is not a compatible format, or the crc check failed.
-
-