Shingles algorithm: compare two texts and get back a numerical value that reflects how similar they are.
How do search engines determine the similarity of texts?
One approach is the shingles algorithm, a simple duplicate-content check that shows whether two texts are similar.
How does the shingles algorithm work?
Each text is split into words, and the resulting pieces (shingles) are compared. When the split is by single words, it does not matter if words or sentences have simply been rearranged. The text can be split one word at a time or several at a time, i.e. into shingles made of several consecutive words.
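The idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `shingles` and `similarity` are hypothetical, and the score used here is the Jaccard ratio (shared shingles divided by all distinct shingles), which is a common choice but not necessarily the measure this service uses.

```python
def shingles(text, size=1):
    """Return the set of word shingles: runs of `size` consecutive words."""
    words = text.split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=1):
    """Jaccard ratio of the two shingle sets, from 0.0 to 1.0 (an assumed measure)."""
    sa, sb = shingles(a, size), shingles(b, size)
    union = sa | sb
    if not union:
        return 0.0  # nothing to compare; defined as 0.0 here to avoid dividing by zero
    return len(sa & sb) / len(union)


# With one-word shingles, rearranging the words does not change the score.
print(similarity("the quick brown fox", "fox brown quick the", size=1))
# With two-word shingles, word order starts to matter.
print(similarity("the quick brown fox", "fox brown quick the", size=2))
```

Longer shingles make the check stricter: a rewritten copy has to keep whole runs of words intact to count as similar.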
This service makes it possible to check content for uniqueness after a document has been changed.
To compare, you need the original text and the altered (rewritten) copy.
Before comparison, the text goes through minimal cleaning and normalization:
- Strip HTML tags, such as <strong>, from the string
- Convert the string to lowercase
- Strip commas, periods, apostrophes, newline characters, double spaces, and slashes
- Remove stop words
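The cleaning steps listed above might look roughly like this in Python. It is a minimal sketch, not the service's actual code: the function name `clean` and the tiny `STOP_WORDS` set are hypothetical, and a real stop-word list would be much longer.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is"}


def clean(text, stop_words=STOP_WORDS):
    """Normalize a text before it is split into shingles."""
    text = re.sub(r"<[^>]+>", " ", text)    # strip HTML tags such as <strong>
    text = text.lower()                     # make the string lowercase
    text = text.replace("'", "")            # drop apostrophes without splitting the word
    # Commas, periods, slashes, newlines and repeated blanks become one space.
    text = re.sub(r"[,./\\\s]+", " ", text)
    words = [w for w in text.split() if w not in stop_words]  # remove stop words
    return " ".join(words)


print(clean("<strong>The Quick</strong>, brown fox.\nIt's  a/b"))
```

Running both texts through the same cleaning function before comparison keeps formatting differences from lowering the similarity score.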