Ryan Chen
2/13/17
Bitap Algorithm
There are lots of algorithms out there but the algorithm I’ve decide to choose is Bitap Algorithm because it talked about strings at the beginning and it looks interesting. This algorithm is also known as shift-or, shift-and or Baeza-Yates-Gonnet. The algorithm was written by Udi Manber, Sun Wu, and Burra Gopal. It was first invented by Balint Domolki in 1964 and extended by R. K. Shyamasundar in 1977 then Manber and Wu reinvented in the context of fuzzy string searching in 1991. In 1996 Baeza-Yates and Navarro improve the algorithm.
The way this algorithm work is they are string searchings that tells you weather if the text has a substring that is approximately equal to the pattern. There are two types of searching called fuzzy or exact searching. Exacting string searching uses zeros and ones. If a bit indicates a zero that means there is a match. If it indicates a one that means there are no matches. The algorithm can be written with the intuitive semantics for 0 and 1, but we must introduce another instruction into the inner loop to set R=1. In this implementation, left-shifting a value shifts in zeros on the right, which is precisely the behavior we need.
Fuzzy searching is a little different than exact searching. To use fuzzy string searching for the bitap algorithm, you need to extend the bit array R into a second dimension. Rather than of having one array R that changes over the length of the text, we now have k distinct arrays R1k. Array Ri holds a representation of the prefixes of pattern that match any suffix of the current string with i or fewer errors.An error may be an insertion, deletion, or substitution using this context.
I think what makes these algorithms special and unique is that it's a string matching and there are two different ways to do this algorithm by exact searching or fuzzy searching. I personally like exact searching because it made more sense to me when I read it. It uses ones and zeros and we talked about that kind of stuff in class so it makes more sense.
I found that Jaro-Winkler distance that was created by Winkler in 1990 is similar to the Bitap Algorithm because it measures a similarity between two strings. But, it is only best suited for short strings such as someone's name. It uses ones and zeros just like exact string matching one indicating an exact match and zero indicating no similarity.
There are some algorithms that can be done manually for example like sorting out a deck of cards. For the Bitap algorithm I didn’t see any sources saying you can do it manually but I assume you will be able to sort out words similarity to have it match but it will take a very long time. The reading in the textbook relate to this algorithm in (Chapter 5) where it talks about matching socks, traveling salesman problem, and the possibilities of a certain move. The textbook reading did give me a better understanding of how algorithms work by giving examples like the shortest time possible to match your socks(P.77-78) because before the reading I had no idea what algorithms were. I also learned computers can’t solve every problem using algorithms like the traveling salesman problem(P.87-89).
Bitap Algorithm helps you match a text that is approximately equal(Wikipedia). You can use it in a real world situation for example if you want to find how many students in a school has the same name or a most common name you would use Bitap algorithm. Another example you can use this algorithm in a real world is to find out what is the letter that is being used the most to name a word so you can encode it to save data.
"Bitap Algorithm." Wikipedia. Wikimedia Foundation, n.d. Web. 13 Feb. 2017.
No comments:
Post a Comment