levenshtein too slow, how to speed it up

nlogn 2010. 9. 20. 17:28


[Source: http://kerbtier.ch/2008/12/30/levenshtein-to-slow-how-to-speed-it-up]

you don’t need 40′000 levenshtein calls

one attempt to make it faster is to rule out some strings by comparing lengths first. if your maximum distance for two strings to match is 10, you can discard every string whose character count differs from the tested string’s by more than 10. this is already much faster, and for the 40′000 strings in my use case almost fast enough, but the faster the better.

in this version i still have to compare one string’s length against the lengths of 40′000 other strings. this can be sped up by grouping strings of equal length into sets and storing those sets in an array indexed by length, so i don’t have to compare against all strings. if l is the length of the string i want to test, i only need to test it against the strings in the sets at indexes l-10 through l+10.

with the same technique i could probably rule out even more strings by counting something other than characters, for example the number of words or the number of vowels in each string. these approaches could be combined, which would probably speed it up a bit more. but statistically the result would probably be about the same as when using the total length of the string.
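a minimal sketch of the length-bucket idea in Python, assuming plain Python strings and the threshold of 10 from the text as the default. the function names (`levenshtein`, `vowel_count`, `build_length_index`, `find_matches`) are hypothetical, not taken from the original post. the pruning is safe because every single insertion, deletion, or substitution changes a string’s length by at most one, so the length difference is a lower bound on the levenshtein distance and no real match is skipped. the vowel-count check is one of the extra filters the text only speculates about; it is included here as an assumption and is safe for the same reason (one edit changes the vowel count by at most one).

```python
# Hypothetical sketch of length-bucketed Levenshtein matching; not the original author's code.

VOWELS = set("aeiouAEIOU")


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic program, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so each row stays short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def vowel_count(s: str) -> int:
    """Number of ASCII vowels in s (assumed secondary feature)."""
    return sum(c in VOWELS for c in s)


def build_length_index(strings):
    """Array of buckets: buckets[n] holds (string, vowel count) for every string of length n."""
    max_len = max((len(s) for s in strings), default=0)
    buckets = [[] for _ in range(max_len + 1)]
    for s in strings:
        buckets[len(s)].append((s, vowel_count(s)))
    return buckets


def find_matches(query, buckets, max_dist=10):
    """Return (string, distance) pairs with levenshtein(query, string) <= max_dist."""
    n = len(query)
    qv = vowel_count(query)
    matches = []
    # Only buckets for lengths n - max_dist .. n + max_dist can contain a match.
    for length in range(max(0, n - max_dist), min(len(buckets) - 1, n + max_dist) + 1):
        for candidate, cv in buckets[length]:
            # Cheap second lower bound: vowel counts also change by at most one per edit.
            if abs(cv - qv) > max_dist:
                continue
            d = levenshtein(query, candidate)
            if d <= max_dist:
                matches.append((candidate, d))
    return matches


if __name__ == "__main__":
    words = ["kitten", "sitting", "mitten", "a much longer string that will never match"]
    buckets = build_length_index(words)
    print(find_matches("kitten", buckets, max_dist=3))
    # [('kitten', 0), ('mitten', 1), ('sitting', 3)]
```

with `max_dist=10` a lookup visits at most 21 buckets instead of every string; how much work that saves depends on how the string lengths in your data are distributed.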
