Module notzed.dez
Package au.notzed.dez

Class ByteMatcherHash

  • All Implemented Interfaces:
    ByteMatcher

    public class ByteMatcherHash
    extends java.lang.Object
    implements ByteMatcher
    Finds common strings of bytes between a source and target buffer.

    This is basically an implementation of Bentley & McIllroy's paper ``Data Compression Using Long Common Strings'' applied instead to producing deltas and using a cyclic hash as the fingerprint function.

    Two other modifications are that back-tracking is not implemented but instead overlapping blocks can be used by setting the step size. And a further refinement is the detection of runs of the same byte which might otherwise pollute the hash tree for certain data.

    • Constructor Summary

      Constructors 
      Constructor Description
      ByteMatcherHash​(int b, int shortest, byte[] source, int sstep, byte[] target)
      Creates and initialises a new byte matcher.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      byte[] getBlockArray​(int offset)
      Retrieves the array containing the current match.
      int getBlockOffset​(int offset)
      Calculates the offset for the block array.
      int getLength()
      Retrieves the current length.
      int getMatchOffset()
      Retrieves the best match location.
      byte getRunByte()
      Retrieves the byte to be run-length encoded.
      byte[] getSource()
      Retrieve the source buffer.
      byte[] getTarget()
      Retrieve the target buffer.
      int getTargetOffset()
      Retrieves the current target position.
      int nextMatch()
      Finds the next match or run.
      void setChainLimit​(int value)
      Sets the hash bucket chain limit.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • ByteMatcherHash

        public ByteMatcherHash​(int b,
                               int shortest,
                               byte[] source,
                               int sstep,
                               byte[] target)
        Creates and initialises a new byte matcher.

        This is a single-use object.

        A step size of 1 produces the best output but requires the most memory and run time.

        Parameters:
        b - Sets block size, which is the number of bytes hashed per key (>=3).
        shortest - shortest string considered for a copy. Typically 4 bytes but dependent on the encoder used and the value of b.
        source - Source array.
        sstep - Sets the step size which is the interval of sampling of the source.
        target - Target array.
    • Method Detail

      • setChainLimit

        public void setChainLimit​(int value)
        Sets the hash bucket chain limit.

        Limiting this value will reduce the search for common prefixes which will in turn limit the search accuracy. A runtime/quality trade-off.

        By default there is no limit.

        Parameters:
        value - A value between 2 and 30 which sets the length limit to (1<<value)-1.
      • getSource

        public byte[] getSource()
        Description copied from interface: ByteMatcher
        Retrieve the source buffer.

        This must return all of the source data.

        Specified by:
        getSource in interface ByteMatcher
        Returns:
        The source bytes.
      • getTarget

        public byte[] getTarget()
        Description copied from interface: ByteMatcher
        Retrieve the target buffer.

        This must return all of the target data.

        Specified by:
        getTarget in interface ByteMatcher
        Returns:
        The target bytes.
      • getTargetOffset

        public int getTargetOffset()
        Description copied from interface: ByteMatcher
        Retrieves the current target position.

        The position within the target to which the current match refers.

        Specified by:
        getTargetOffset in interface ByteMatcher
        Returns:
        The position in target.
      • getLength

        public int getLength()
        Description copied from interface: ByteMatcher
        Retrieves the current length.

        This is the number of bytes to copy for the COPY state or repeat for the RUN state.

        Specified by:
        getLength in interface ByteMatcher
        Returns:
        The length of the current state.
      • getRunByte

        public byte getRunByte()
        Description copied from interface: ByteMatcher
        Retrieves the byte to be run-length encoded.

        If the current state is RUN then this returns the corresponding byte to run.

        Specified by:
        getRunByte in interface ByteMatcher
        Returns:
        The current run byte.
      • nextMatch

        public int nextMatch()
        Description copied from interface: ByteMatcher
        Finds the next match or run.

        Note that only matches or byte runs will be indicated. The location of non-matching data (i.e. append sequences) must be determined from the difference between the last targetOffset, the last length, and the current targetOffset.

        Specified by:
        nextMatch in interface ByteMatcher
        Returns:
        the new state.
      • getBlockOffset

        public int getBlockOffset​(int offset)
        Description copied from interface: ByteMatcher
        Calculates the offset for the block array.

        Maps the match offset to the array from getBlockArray.

        Specified by:
        getBlockOffset in interface ByteMatcher
        Parameters:
        offset - A logical offset into the combined source:target buffer.
        Returns:
        The offset into the array containing this position.
        See Also:
        ByteMatcher.getBlockArray(int), ByteMatcher.getMatchOffset()