Martin Splitt explains how Google selects a canonical page
Google users have asked how the search engine detects duplicate pages among the billions of pages created every day. Martin Splitt, a Developer Advocate at Google, has shared some notes on how canonical pages are detected. He has explained how Google keeps duplicate or fraudulent pages out of its search results.
He has also shared how Google weighs at least twenty different signals to identify a canonical page. Google also uses machine learning to perform this process.
Martin explains that Google first collects the signals for all newly created pages. In the next step, it detects the duplicate pages.
The duplicates are clustered together, so that every page in a cluster is known to be a duplicate of the others. Google then has to identify the leader page of each cluster, which becomes the canonical page.
Martin has also described how this detection works. The content is first reduced to a hash, or checksum, and the checksums are then compared. A checksum is a compact digest computed from the content; Martin likened it to a fingerprint. Comparing these digests is easier than comparing the whole content.
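The checksum comparison can be illustrated with a minimal sketch. It is not Google's implementation: the choice of SHA-256, the whitespace and case normalization, the function names `checksum` and `cluster_duplicates`, and the example URLs are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def checksum(text: str) -> str:
    """Reduce page text to a fixed-size fingerprint (assumed here: SHA-256)."""
    # Collapse whitespace and lowercase so trivial formatting changes
    # do not produce a different fingerprint.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cluster_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose content has the same checksum."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        groups[checksum(text)].append(url)
    # Only groups with more than one URL are duplicate clusters.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical pages: two copies of one article and one unrelated page.
pages = {
    "http://example.com/post": "Hello   World, this is a post.",
    "https://example.com/post": "hello world, this is a post.",
    "https://example.com/other": "A completely different page.",
}
clusters = cluster_duplicates(pages)
```

Comparing short fixed-size digests instead of full documents is what makes the fingerprint analogy apt: two pages land in the same cluster as soon as their digests match.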
This scanning process can catch both exact-duplicate and near-duplicate pages. The checksums are compared in order to find and eliminate similar content.
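The article does not say how near-duplicates are caught, and an exact hash of the whole page would miss them, since changing one word changes the hash. One common approach, swapped in here purely as an assumption, is to compare overlapping word sequences ("shingles") with Jaccard similarity. The function names, the shingle size `k`, and the sample sentences below are hypothetical.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than k words: treat it as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Similarity in [0, 1]: shared shingles divided by all shingles."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are considered identical
    return len(sa & sb) / len(sa | sb)


original = "the quick brown fox jumps over the lazy dog"
edited = "the quick brown fox jumps over the sleepy dog"
unrelated = "storage space in a search index is limited"

near_score = jaccard(original, edited)
far_score = jaccard(original, unrelated)
```

Pages whose similarity exceeds some chosen threshold could then be placed in the same cluster, even though their exact checksums differ.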
Choosing one page from a cluster is not so easy. Sometimes it is hard even for humans to decide which page deserves to appear in the search results. This is where the signals come in. They include whether the URL uses HTTPS, whether the page is listed in a sitemap, the presence of redirects, and so on.
Machine learning is quite important in this step. Applying all these signals correctly to the clusters and working out the signal weights is very hard for humans. Manually adjusting the signal weights is a nightmare for the developers, and doing it by hand takes a long time.
Martin has also said that users don't like to see the same thing repeated in search results. Also, storage space is not unlimited. That's why Google has to do this canonicalization.
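To make the idea of weighted signals concrete, here is a small sketch that scores each page in a cluster and picks the highest-scoring one as the canonical. Only the three signals named above are used. The weights, the `PageSignals` class, and the example URLs are invented for illustration; in the process Martin describes, the weights would be learned by a model rather than set by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageSignals:
    url: str
    is_https: bool
    in_sitemap: bool
    is_redirect_target: bool


# Hypothetical hand-picked weights; a real system would learn these.
WEIGHTS = {
    "is_https": 1.0,
    "in_sitemap": 0.5,
    "is_redirect_target": 2.0,
}


def score(page: PageSignals) -> float:
    """Sum the weights of every signal that is present on the page."""
    return sum(w for name, w in WEIGHTS.items() if getattr(page, name))


def pick_canonical(cluster: list[PageSignals]) -> PageSignals:
    """Return the page with the highest weighted score in the cluster."""
    return max(cluster, key=score)


cluster = [
    PageSignals("http://example.com/a", False, False, False),
    PageSignals("https://example.com/a", True, True, False),
    PageSignals("https://example.com/a?ref=x", True, False, False),
]
canonical = pick_canonical(cluster)
```

Even with three signals, small changes to the weights can flip the winner, which hints at why tuning around twenty of them by hand would be so laborious.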