Google users have long asked how Google detects duplicate pages among the billions of pages created every day. Martin Splitt, a developer at Google, has shared some notes on how canonical pages are detected. He explained how Google removes duplicate or fraudulent pages from its search results.
He also explained that Google weighs at least twenty different signals to identify a canonical page, and that it uses machine learning in this process.
Martin explains that Google first collects signals for all newly created pages. The next step is duplicate detection.
Duplicates are detected and clustered together, so that every page in a cluster is known to be a duplicate of the others. The remaining task is to identify the leader page of each cluster.
Martin also described how the detection works. The content of each page is first reduced to a hash, or checksum, and the checksums are then compared. A checksum is a compact digest of the content, which Martin likened to a fingerprint. Comparing these digests is easier than comparing the full content.
This process can catch both exact-duplicate and near-duplicate pages. The checksums are compared to identify and eliminate similar content.
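The checksum step can be illustrated with a minimal Python sketch. It is not Google's implementation: the `fingerprint` and `cluster_duplicates` helpers, the whitespace and case normalization, the choice of SHA-256, and the example URLs are all assumptions made for illustration. An exact hash like this only groups pages that are identical after normalization; catching near-duplicates would need a similarity-preserving fingerprint, which is not shown here.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Reduce page text to a short checksum (hypothetical normalization)."""
    # Lowercase and collapse whitespace so trivial formatting differences
    # do not change the checksum.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cluster_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose page text produces the same checksum."""
    clusters = defaultdict(list)
    for url, text in pages.items():
        clusters[fingerprint(text)].append(url)
    # Keep only groups with more than one member, i.e. actual duplicates.
    return [sorted(urls) for urls in clusters.values() if len(urls) > 1]


# Illustrative input: two copies of the same page and one unrelated page.
pages = {
    "http://example.com/a": "Hello   World",
    "https://example.com/a": "hello world",
    "https://example.com/b": "Something else",
}
duplicate_clusters = cluster_duplicates(pages)
```

Comparing fixed-length checksums instead of full documents is what makes the comparison cheap: each page is hashed once, and grouping becomes a dictionary lookup.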
Choosing a single page from each cluster is not easy. Sometimes it is hard even for humans to decide which page deserves to appear in search results. This is where the signals come in. They include whether the URL uses HTTPS, whether the page is listed in a sitemap, whether redirects are present, and so on.
Machine learning is quite important in this step. Applying all these signals to the clusters correctly and working out how much weight each one should carry is very hard for humans. Manually adjusting signal weights is a nightmare for developers, and doing it by hand takes a long time.
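A weighted-signal selection of the leader page might look like the sketch below. Everything in it is hypothetical: the `Page` fields, the weight values, and the `score` and `pick_canonical` helpers are invented for illustration, and only three of the signals are modeled. According to Martin, the weights come from machine learning, whereas here they are fixed by hand purely to show the mechanics.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights; the article names the signals but not their values.
WEIGHTS = {
    "is_https": 2.0,
    "in_sitemap": 1.0,
    "is_redirect_target": 3.0,
}


@dataclass
class Page:
    url: str
    is_https: bool
    in_sitemap: bool
    is_redirect_target: bool


def score(page: Page) -> float:
    """Sum the weights of every signal that is present on the page."""
    return sum(w for name, w in WEIGHTS.items() if getattr(page, name))


def pick_canonical(cluster: List[Page]) -> Page:
    """Return the highest-scoring page in a cluster of duplicates."""
    return max(cluster, key=score)


# Illustrative cluster: the same content served over HTTP and HTTPS.
cluster = [
    Page("http://example.com/a", False, False, False),
    Page("https://example.com/a", True, True, False),
]
canonical = pick_canonical(cluster)
```

With only three signals the weights are easy to reason about. With twenty or more, changing one weight can shift the winner in many clusters at once, which is why tuning them by hand becomes impractical.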
Martin also said that users do not like to see the same content repeated in search results. Storage space is also not unlimited. That is why canonicalization is necessary.