Given two sequences x and y, calculate the Levenshtein distance of x and y, i. e. the minimum number of edits needed to transform x into y. The allowed edits are:

- insert a single item
- delete a single item
- replace a single item with another item

I did 3 implementations for this one. They all passed the tests but only the last one did pass without timeout on 4clojure.com.

## first implementation

I liked this one. I reversed the input and use the destructuring of clojure with the head and tail of the list.

(defn lv [a b] (letfn [(L [[f & r :as x] [h & t :as y]] (let [l (count x) m (count y)] (cond (= 0 l) m (= 0 m) l (= f h) (L r t) :else (+ 1 (min (L x t) (L r y) (L r t))))))] (L (-> a vec rseq) (-> b vec rseq))))

Unfortunately this timeout.

## Second Implementation

I initially thought of the timeout of the previous implementation because of the count call. So I tried to act on this but failure, tests ok, timeout on 4clojure.com.

(defn lv [a b] (letfn [(L [[f & r :as x] l [h & t :as y] m] (let [l- (- l 1) m- (- m 1)] (cond (= 0 l) m (= 0 m) l (= f h) (L r l- t m-) :else (+ 1 (min (L x l t m-) (L r l- y m) (L r l- t m-))))))] (L (-> a reverse vec) (count a) (-> b reverse vec) (count b))))

## Last

Finally, I worked directly with the string as vector and worked with indexes and length.

(defn lv [a b] (let [x (vec a) y (vec b)] (letfn [(L [l m] (let [l- (- l 1) m- (- m 1)] (cond (= 0 l) m (= 0 m) l (= (x l-) (y m-)) (L l- m-) :else (+ 1 (min (L l m-) (L l- m) (L l- m-))))))] (L (count x) (count y))))) (fact (lv "kitten" "sitting") => 3 (lv "closure" "clojure") => 1 (lv "clojure" "closure") => 1 (lv "xyx" "xyyyx") => 2 (lv "" "123456") => 6 (lv "Clojure" "Clojure") => 0 (lv "" "") => 0 (lv [] [] ) => 0 (lv [1 2 3 4] [0 2 3 4 5]) => 2 (lv '(:a :b :c :d) '(:a :d)) => 2 (lv "ttttattttctg" "tcaaccctaccat") => 10 (lv "gaattctaatctc" "caaacaaaaaattt") => 9)

source: 101

For the record, a link to the version I wrote 2 years ago:

https://gist.github.com/828413

Doesn’t use indexes, doesn’t (I think) timeout or blow the stack.

Cheers

This is an implementation of the levenshtein allison algorithm as defined here: http://www.csse.monash.edu.au/~lloyd/tildeFP/Haskell/1998/Edit01/

https://gist.github.com/cs224/889354/

Thanks Laurent btw :D

Thanks for the link, I’ll check when I’ll have time.

is there a way to make levenshtein faster? or perhaps is there a faster alternative algorithm?

Your solution could still blow up for larger strings, since it builds up the recursion tree. You should also try a memoized version similar to this:

“`

(defn edit-distance-mem [a b]

(let [f (fn [fm s1 s2]

(let [cost? (fn [p q]

(if (= p q) 0 1))]

(cond

(and (= 0 (count s1)) (= 0 (count s2))) 0

(zero? (count s1)) (count s2)

(zero? (count s2)) (count s1)

:else (min (inc (fm fm s1 (rest s2)))

(inc (fm fm (rest s1) s2))

(+ (cost? (first s1) (first s2))

(fm fm (rest s1) (rest s2)))

))))]

(f (memoize f) a b)))

“`

Hi, Thanks for your solution. Cheers