Class DiffUtil

java.lang.Object
org.eclipse.emf.compare.internal.utils.DiffUtil

public final class DiffUtil extends Object
This utility class will be used to provide similarity implementations.
  • Method Details

    • getAllRefiningDiffs

      public static Set<Diff> getAllRefiningDiffs(Diff diff)
      The set of all the diffs (atomic or not) that refine the given diff or one of its refining diffs, recursively.
      Parameters:
      diff - The diff for which all the refining diffs are seeked
      Returns:
      A set of all the refining diffs (as opposed to getting only the 1st level of refining diffs that would be obtained by calling getRefinedBy() on diff.
    • getRootRefinedDiffs

      public static List<Diff> getRootRefinedDiffs(Diff diff)
      Determines the root refined diff of a refining diff, i.e. a refined diff that is not refining another diff.
      Parameters:
      diff - The diff for which the root refined diffs are to be determined
      Returns:
      The root refined diffs of the provided (refining) diff, as a list. Never null. Empty if the provided diff does not refine any diff.
    • getAllAtomicRefiningDiffs

      public static Set<Diff> getAllAtomicRefiningDiffs(Diff diff)
      The set of all the atomic diffs that refine the given diff or one of its refining diffs, recursively. An atomic diff is a diff that is not refined by any diff.
      Parameters:
      diff - The diff for which all the refining diffs are seeked
      Returns:
      A set of all the atomic refining diffs (as opposed to getting only the 1st level of refining diffs that would be obtained by calling getRefinedBy() on diff.
    • diceCoefficient

      public static double diceCoefficient(String first, String second)
      Computes the dice coefficient between the two given String's bigrams.

      This implementation is case sensitive.

      Note that this implementation handles two- (or less) character-long strings specifically, degrading into a "char-by-char" comparison instead of using the bigram unit of the dice coefficient. We want the similarity between "v1" and "v2" to be 0.5 and not 0. However, we also want "v1" and "v2" to be "more similar" to each other than "v" and "v2" and "v1" and "v11" to be "more similar" than "v" and "v11" while this latter also needs to be "less similar" than "v1" and "v2". This requires a slightly different handling for comparisons with a "single character"-long string than for "two character"-long ones. A set of invariants we wish to meet can be found in the unit tests.

      Parameters:
      first - First of the two Strings to compare.
      second - Second of the two Strings to compare.
      Returns:
      The dice coefficient of the two given String's bigrams, ranging from 0d to 1d.
    • longestCommonSubsequence

      public static <E> List<E> longestCommonSubsequence(Comparison comparison, Iterable<E> ignoredElements, List<E> sequence1, List<E> sequence2)
      This will compute the longest common subsequence between the two given Lists, ignoring any object that is included in ignoredElements. We will use IEqualityHelper.matchingValues(Object, Object) in order to try and match the values from both lists two-by-two. This can thus be used both for reference values or attribute values. If there are two subsequences of the same "longest" length, the first (according to the second argument) will be returned.

      Take note that this might be slower than longestCommonSubsequence(Comparison, List, List) and should only be used when elements should be removed from the potential LCS. This is mainly aimed at merge operations during three-way comparisons as some objects might be in conflict and thus shifting the computed insertion indices.

      Please see longestCommonSubsequence(Comparison, List, List) for a more complete description.

      Type Parameters:
      E - Type of the sequences content.
      Parameters:
      comparison - This will be used in order to retrieve the Match for EObjects when comparing them.
      ignoredElements - Specifies elements that should be excluded from the subsequences.
      sequence1 - First of the two sequences to consider.
      sequence2 - Second of the two sequences to consider.
      Returns:
      The LCS of the two given sequences. Will never be the same instance as one of the input sequences.
    • longestCommonSubsequence

      public static <E> List<E> longestCommonSubsequence(Comparison comparison, List<E> sequence1, List<E> sequence2)
      This will compute the longest common subsequence between the two given Lists. We will use IEqualityHelper.matchingValues(Object, Object) in order to try and match the values from both lists two-by-two. This can thus be used both for reference values or attribute values. If there are two subsequences of the same "longest" length, the first (according to the second argument) will be returned.

      For example, it the two given sequence are, in this order, {"a", "b", "c", "d", "e"} and {"c", "z", "d", "a", "b"}, there are two "longest" subsequences : {"a", "b"} and {"c", "d"}. The first of those two subsequences in the second list is {"c", "d"}. On the other hand, the LCS of {"a", "b", "c", "d", "e"} and {"y", "c", "d", "e", "b"} is {"c", "d", "e"}.

      The following algorithm has been inferred from the wikipedia article on the Longest Common Subsequence, http://en.wikipedia.org/wiki/Longest_common_subsequence_problem at the time of writing. It is decomposed in two : we first compute the LCS matrix, then we backtrack through the input to determine the LCS. Evaluation will be shortcut after the first part if the LCS is one of the two input sequences.

      Note : we are not using Iterables as input in order to make use of the random access cost of ArrayLists. This might also be converted to directly use arrays. This implementation will not play well with LinkedLists or any List which needs to iterate over the values for each call to List.get(int), i.e any list which is not instanceof RandomAccess or does not satisfy its contract.

      Type Parameters:
      E - Type of the sequences content.
      Parameters:
      comparison - This will be used in order to retrieve the Match for EObjects when comparing them.
      sequence1 - First of the two sequences to consider.
      sequence2 - Second of the two sequences to consider.
      Returns:
      The LCS of the two given sequences. Will never be the same instance as one of the input sequences.
    • findInsertionIndex

      public static <E> int findInsertionIndex(Comparison comparison, Iterable<E> ignoredElements, List<E> source, List<E> target, E newElement)
      This will try and determine the index at which a given element from the source list should be inserted in the target list. We expect newElement to be an element from the source or to have a Match that allows us to map it to one of the source list's elements.

      The expected insertion index will always be relative to the Longest Common Subsequence (LCS) between the two given lists, ignoring all elements from that LCS that have changed between the target list and the common origin of the two. If there are more than one "longest" subsequence between the two lists, the insertion index will be relative to the first that comes in the target list.

      Note : we are not using Iterables as input in order to make use of the random access cost of ArrayLists. This might also be converted to directly use arrays. This implementation will not play well with LinkedLists or any List which needs to iterate over the values for each call to List.get(int), i.e any list which is not instanceof RandomAccess or does not satisfy its contract.

      Type Parameters:
      E - Type of the sequences content.
      Parameters:
      comparison - This will be used in order to retrieve the Match for EObjects when comparing them.
      ignoredElements - If there are elements from target that should be ignored when searching for an insertion index, set them here. Can be null or an empty list.
      source - The List from which one element has to be added to the target list.
      target - The List into which one element from source has to be added.
      newElement - The element from source that needs to be added into target.
      Returns:
      The index at which newElement should be inserted in target.
      See Also:
      Restriction:
      This method is not intended to be referenced by clients.
    • findInsertionIndexForElementAt

      public static <E> int findInsertionIndexForElementAt(Comparison comparison, List<E> source, List<E> target, List<E> lcs, int currentIndexInSource)
      This will try and determine the index at which a given element from the source list should be inserted in the target list.

      The expected insertion index will always be relative to the Longest Common Subsequence (LCS) between the two given lists.

      Type Parameters:
      E - Type of the sequences content.
      Parameters:
      comparison - This will be used in order to retrieve the Match for EObjects when comparing them.
      source - The List from which one element has to be added to the target list.
      target - The List into which one element from source has to be added.
      lcs - The precomputed LCS between these two lists.
      currentIndexInSource - The current index (in source) of the element we want to insert in target.
      Returns:
      The index at which newElement should be inserted in target.
      Restriction:
      This method is not intended to be referenced by clients.
    • findInsertionIndex

      public static <E> int findInsertionIndex(Comparison comparison, List<E> source, List<E> target, E newElement)
      This will try and determine the index at which a given element from the source list should be inserted in the target list. We expect newElement to be an element from the source or to have a Match that allows us to map it to one of the source list's elements.

      The expected insertion index will always be relative to the Longest Common Subsequence (LCS) between the two given lists. If there are more than one "longest" subsequence between the two lists, the insertion index will be relative to the first that comes in the target list.

      For example, assume source is {"1", "2", "4", "6", "8", "3", "0", "7", "5"} and target is {"8", "1", "2", "9", "3", "4", "7"}; I try to merge the addition of "0" in the right list. The returned "insertion index" will be 5 : just after "3". There are two subsequence of the same "longest" length 4 : {"1", "2", "3", "7"} and {"1", "2", "4", "7"}. However, the first of those two in target is {"1", "2", "3", "7"}. The closest element before "0" in this LCS in source is "3".

      Note : we are not using Iterables as input in order to make use of the random access cost of ArrayLists. This might also be converted to directly use arrays. This implementation will not play well with LinkedLists or any List which needs to iterate over the values for each call to List.get(int), i.e any list which is not instanceof RandomAccess or does not satisfy its contract.

      Type Parameters:
      E - Type of the sequences content.
      Parameters:
      comparison - This will be used in order to retrieve the Match for EObjects when comparing them.
      source - The List from which one element has to be added to the target list.
      target - The List into which one element from source has to be added.
      newElement - The element from source that needs to be added into target.
      Returns:
      The index at which newElement should be inserted in target.
      See Also:
    • findInsertionIndex

      public static int findInsertionIndex(Comparison comparison, Diff diff, boolean rightToLeft)
      This is the main entry point for findInsertionIndex(Comparison, Iterable, List, List, Object). It will use default algorithms to determine the source and target lists as well as the list of elements that should be ignored when computing the insertion index.
      Parameters:
      comparison - This will be used in order to retrieve the Match for EObjects when comparing them.
      diff - The diff which merging will trigger the need for an insertion index in its target list.
      rightToLeft - true if the merging will be done into the left list, so that we should consider the right model as the source and the left as the target.
      Returns:
      The index at which this diff's value should be inserted into the 'target' list, as inferred from rightToLeft.
      See Also:
    • isContainmentReference

      public static boolean isContainmentReference(EStructuralFeature feature)
      Specifies whether the given feature is a containment reference.
      Parameters:
      feature - The feature to check.
      Returns:
      true if the feature is a containment reference, false otherwise.
    • computeIgnoredElements

      public static <E> Set<E> computeIgnoredElements(Comparison comparison, IEqualityHelper equalityHelper, List<E> candidates, Diff diff, boolean rightToLeft)
      When computing the insertion index of an element in a list, we need to ignore all elements present in that list that feature unresolved Diffs on the same feature. This algorithm is expensive and because this method is generally called O{n^2} times we must take steps to optimize it. The caller assumes that a non-empty returned set is modifiable.
      Type Parameters:
      E - Type of the list's content.
      Parameters:
      comparison - This will be used in order to retrieve the Match for EObjects when comparing them.
      equalityHelper - This will be used to check for equality of values.
      candidates - The list in which we need to compute an insertion index.
      diff - The diff we are computing an insertion index for.
      rightToLeft - Direction of the merging. true if the merge is to be done on the left side, making 'target' the right side, false otherwise.
      Returns:
      The list of elements that should be ignored when computing the insertion index for a new element in candidates.