4clojure – Levenshtein distance problem

Share

Given two sequences x and y, calculate the Levenshtein distance of x and y, i. e. the minimum number of edits needed to transform x into y. The allowed edits are:

  • insert a single item
  • delete a single item
  • replace a single item with another item

I did 3 implementations for this one. They all passed the tests but only the last one did pass without timeout on 4clojure.com.

first implementation

I liked this one. I reversed the input and use the destructuring of clojure with the head and tail of the list.

(defn lv
    [a b]
    (letfn
        [(L [[f & r :as x] [h & t :as y]]
           (let [l (count x)
                 m (count y)]
             (cond (= 0 l) m
                   (= 0 m) l
                   (= f h) (L r t)
                   :else (+ 1 (min (L x t)
                                   (L r y)
                                   (L r t))))))]
      (L (-> a vec rseq) (-> b vec rseq))))

Unfortunately this timeout.

Second Implementation

I initially thought of the timeout of the previous implementation because of the count call. So I tried to act on this but failure, tests ok, timeout on 4clojure.com.

(defn lv
    [a b]
    (letfn
        [(L [[f & r :as x] l [h & t :as y] m]
           (let [l- (- l 1)
                 m- (- m 1)]
             (cond (= 0 l) m
                   (= 0 m) l
                   (= f h) (L r l- t m-)
                   :else (+ 1 (min (L x l t m-)
                                   (L r l- y m)
                                   (L r l- t m-))))))]
      (L (-> a reverse vec) (count a) (-> b reverse vec) (count b))))

Last

Finally, I worked directly with the string as vector and worked with indexes and length.

(defn lv
  [a b]
  (let [x (vec a)
        y (vec b)]
    (letfn
        [(L [l m]
           (let [l- (- l 1)
                 m- (- m 1)]
             (cond (= 0 l) m
                   (= 0 m) l
                   (= (x l-) (y m-)) (L l- m-)
                   :else (+ 1 (min
                                   (L l m-)
                                   (L l- m)
                                   (L l- m-))))))]
      (L (count x) (count y)))))

(fact
  (lv "kitten" "sitting")               => 3
  (lv "closure" "clojure")              => 1
  (lv "clojure" "closure")              => 1
  (lv "xyx" "xyyyx")                    => 2
  (lv "" "123456")                      => 6
  (lv "Clojure" "Clojure")              => 0
  (lv "" "")                            => 0
  (lv  [] [] )                          => 0
  (lv [1 2 3 4] [0 2 3 4 5])            => 2
  (lv '(:a :b :c :d) '(:a :d))          => 2
  (lv "ttttattttctg" "tcaaccctaccat")   => 10
  (lv "gaattctaatctc" "caaacaaaaaattt") => 9)

source: 101

  • Laurent Petit

    For the record, a link to the version I wrote 2 years ago:

    https://gist.github.com/828413

    Doesn’t use indexes, doesn’t (I think) timeout or blow the stack.

    Cheers

  • Christian Schuhegger

    This is an implementation of the levenshtein allison algorithm as defined here: http://www.csse.monash.edu.au/~lloyd/tildeFP/Haskell/1998/Edit01/

    https://gist.github.com/cs224/889354/

  • http://adumont.fr/blog tony

    Thanks Laurent btw :D

  • http://adumont.fr/blog tony

    Thanks for the link, I’ll check when I’ll have time.

  • http://javadeveloper.asia/grails-fuzzy-string-matching-with-mysql-and-levenshtein-distance/ Levenshtein

    is there a way to make levenshtein faster? or perhaps is there a faster alternative algorithm?

  • grdvnl

    Your solution could still blow up for larger strings, since it builds up the recursion tree. You should also try a memoized version similar to this:

    “`
    (defn edit-distance-mem [a b]
    (let [f (fn [fm s1 s2]
    (let [cost? (fn [p q]
    (if (= p q) 0 1))]
    (cond
    (and (= 0 (count s1)) (= 0 (count s2))) 0
    (zero? (count s1)) (count s2)
    (zero? (count s2)) (count s1)
    :else (min (inc (fm fm s1 (rest s2)))
    (inc (fm fm (rest s1) s2))
    (+ (cost? (first s1) (first s2))
    (fm fm (rest s1) (rest s2)))
    ))))]
    (f (memoize f) a b)))
    “`

  • http://adumont.fr/blog tony

    Hi, Thanks for your solution. Cheers