Gurgaon, , India
Javascript, HTML, jQuery
0 टिप्पणी करें | 81 लोगो ने देखा है | 25 दिसम्बर 12  | Sunil Kumar Yadav
Figure A Showing Soundex Implementation in C#
Today i encountered an issue in my Projectrsquos Database and the issue was in my database same table have many redundant data and some of them are same entries but have spelling mistakes so this algo saved my life thatrsquos why i am sharing it with u guys. Some example of redundant values in database is Name Zaheer or Zahir or Zaheeeeer so this algo find the similarity between strings like this. Soundex introduction Soundex is an algorithm that converts words to an encoded string based on the way the word is pronounced. This allows you to compare words based on pronunciation instead of binary matches. For example, Zach and Zack are pronounced exactly the same way. However, the string ldquoZachrdquo does not equal the string ldquoZackrdquo which means that a normal search would not mark them as a match. After running ldquoZachrdquo and ldquoZackrdquo through Soundex you will see that they both have the same encoding Zach encodes as ldquoZ200Prime Zack also encodes as ldquoZ200Prime This works by breaking the string down into individual characters and assigning each character a value. The rules for the algorithm are shown below The first letter of the encoding will always be the first letter of the string Ignore the letters a, e, h, I, o, u, w, and y except in rule 1 Use the table below to assign the remaining letters their encoding 1 should be assigned to b, f, p, and v 2 should be assigned to c, g, j, k, q, s, x, and z 3 should be assigned to d and t 4 should be assigned to l 5 should be assigned to m and n 6 should be assigned to r Any letters with the same encoding that appear subsequently to each other should be ignored. For instance the sequence ldquoBBrdquo would produce the encoding ldquo1Prime. The sequence ldquoBABrdquo would also produce the encoding ldquo1Prime since A is an ignored character. The total length of the encoding must always be four. If you have more than four characters in your encoding, trim the extra characters. If you have less than four characters pad the encoding with zeroes to bring the length to four. Below are a few examples produced using this algorithm Donald ndash D543 Zach ndash Z200 Campbel ndash C514 Cammmppppbbbeeelll ndash C514 David ndash D130 Another thing to note is that Soundex is case insensitive. So ldquoZACHrdquo and ldquozAcHrdquo both produce the same encoding. Implementing Soundex in C The code shown in Figure A demonstrates one way to implement Soundex in C. While there are surely other methods, this one seems to be a good balance between readability and performance. Figure A The Soundex Function The above code loops through the data supplied and determines which encoding, if any, should be applied to the current character. It then pads the encoding with zeroes if necessary and returns the encoding. Note This algo is not 100 right but it works with almost 90 correctness in my case.

    • इस ब्लॉग के लिए सामाजिक शेयर

पोर्फोलिओ और ब्लॉग
Sunil Kumar Yadav विभिन्न कंपनियों का अनुसरण करता है, ये कंपनियां और नियोक्ता Sunil के फिर से शुरू देख सकते हैं
सबसे अच्छा नौकरी के अवसर पाने के लिए अपना फिर से शुरू करें अपलोड करें

मुफ्त रजिस्टर करें!