- java.lang.Object
-
- au.notzed.dez.ByteMatcherHash
-
- All Implemented Interfaces:
ByteMatcher
public class ByteMatcherHash extends java.lang.Object implements ByteMatcher
Finds common strings of bytes between a source and target buffer.This is basically an implementation of Bentley & McIllroy's paper ``Data Compression Using Long Common Strings'' applied instead to producing deltas and using a cyclic hash as the fingerprint function.
Two other modifications are that back-tracking is not implemented but instead overlapping blocks can be used by setting the step size. And a further refinement is the detection of runs of the same byte which might otherwise pollute the hash tree for certain data.
-
-
Field Summary
-
Fields inherited from interface au.notzed.dez.ByteMatcher
COPY, EOF, RUN
-
-
Constructor Summary
Constructors Constructor Description ByteMatcherHash(int b, int shortest, byte[] source, int sstep, byte[] target)
Creates and initialises a new byte matcher.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description byte[]
getBlockArray(int offset)
Retrieves the array containing the current match.int
getBlockOffset(int offset)
Calculates the offset for the block array.int
getLength()
Retrieves the current length.int
getMatchOffset()
Retrieves the best match location.byte
getRunByte()
Retrieves the byte to be run-length encoded.byte[]
getSource()
Retrieve the source buffer.byte[]
getTarget()
Retrieve the target buffer.int
getTargetOffset()
Retrieves the current target position.int
nextMatch()
Finds the next match or run.void
setChainLimit(int value)
Sets the hash bucket chain limit.
-
-
-
Constructor Detail
-
ByteMatcherHash
public ByteMatcherHash(int b, int shortest, byte[] source, int sstep, byte[] target)
Creates and initialises a new byte matcher.This is a single-use object.
A step size of 1 produces the best output but requires the most memory and run time.
- Parameters:
b
- Sets block size, which is the number of bytes hashed per key (>=3).shortest
- shortest string considered for a copy. Typically 4 bytes but dependent on the encoder used and the value of b.source
- Source array.sstep
- Sets the step size which is the interval of sampling of the source.target
- Target array.
-
-
Method Detail
-
setChainLimit
public void setChainLimit(int value)
Sets the hash bucket chain limit.Limiting this value will reduce the search for common prefixes which will in turn limit the search accuracy. A runtime/quality trade-off.
By default there is no limit.
- Parameters:
value
- A value between 2 and 30 which sets the length limit to (1<<value)-1.
-
getSource
public byte[] getSource()
Description copied from interface:ByteMatcher
Retrieve the source buffer.This must return all of the source data.
- Specified by:
getSource
in interfaceByteMatcher
- Returns:
- The source bytes.
-
getTarget
public byte[] getTarget()
Description copied from interface:ByteMatcher
Retrieve the target buffer.This must return all of the target data.
- Specified by:
getTarget
in interfaceByteMatcher
- Returns:
- The target bytes.
-
getMatchOffset
public int getMatchOffset()
Description copied from interface:ByteMatcher
Retrieves the best match location.If the current state is COPY then this returns a valid location of the best match. This should be interpreted using
ByteMatcher.getBlockArray(int)
andByteMatcher.getBlockOffset(int)
.- Specified by:
getMatchOffset
in interfaceByteMatcher
- Returns:
- The logical position within the combined source:target buffer.
-
getTargetOffset
public int getTargetOffset()
Description copied from interface:ByteMatcher
Retrieves the current target position.The position within the target to which the current match refers.
- Specified by:
getTargetOffset
in interfaceByteMatcher
- Returns:
- The position in target.
-
getLength
public int getLength()
Description copied from interface:ByteMatcher
Retrieves the current length.This is the number of bytes to copy for the COPY state or repeat for the RUN state.
- Specified by:
getLength
in interfaceByteMatcher
- Returns:
- The length of the current state.
-
getRunByte
public byte getRunByte()
Description copied from interface:ByteMatcher
Retrieves the byte to be run-length encoded.If the current state is RUN then this returns the corresponding byte to run.
- Specified by:
getRunByte
in interfaceByteMatcher
- Returns:
- The current run byte.
-
nextMatch
public int nextMatch()
Description copied from interface:ByteMatcher
Finds the next match or run.Note that only matches or byte runs will be indicated. The location of non-matching data (i.e. append sequences) must be determined from the difference between the last targetOffset, the last length, and the current targetOffset.
- Specified by:
nextMatch
in interfaceByteMatcher
- Returns:
- the new state.
-
getBlockArray
public byte[] getBlockArray(int offset)
Description copied from interface:ByteMatcher
Retrieves the array containing the current match.Maps the offset to the correct internal array.
- Specified by:
getBlockArray
in interfaceByteMatcher
- Parameters:
offset
- A logical offset into the combined source:target buffer.- Returns:
- The array containing this position.
- See Also:
ByteMatcher.getBlockOffset(int)
,ByteMatcher.getMatchOffset()
-
getBlockOffset
public int getBlockOffset(int offset)
Description copied from interface:ByteMatcher
Calculates the offset for the block array.Maps the match offset to the array from
getBlockArray
.- Specified by:
getBlockOffset
in interfaceByteMatcher
- Parameters:
offset
- A logical offset into the combined source:target buffer.- Returns:
- The offset into the array containing this position.
- See Also:
ByteMatcher.getBlockArray(int)
,ByteMatcher.getMatchOffset()
-
-