Crawling vs Indexing: The Ratio Must be Close

The information we see on the internet does not appear there at random. Search engines use well-defined methods to bring the information you want to your web browser in an appropriate form. Crawling and indexing play a vital role in this process.

These two terms are the basics of SEO, and web search as a whole depends on the two processes behind them. It is worth being clear about what each one means.

Crawling

Crawling is the visit Google makes to track web pages. For this purpose, Google uses a spider (also called a crawler), the best known of which is Googlebot. To crawl means to follow a path, and what is followed are the links on your website. When a bot arrives at a page, it looks for further linked pages and crawls those as well, which is why providing a sitemap helps. In this way, information about the pages, links, images, CSS, scripts, and other elements of the site is gathered and then used as evaluation metrics.
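The link-following idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not how Googlebot actually works. The three-page site, the `example.com` URLs, and the `crawl` and `LinkExtractor` names are all made up for the example, and the pages are held in memory so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl. fetch(url) returns the page HTML, or None if missing."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links against the page
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


# Hypothetical site kept in a dict (assumption, for illustration only).
SITE = {
    "https://example.com/": '<a href="/about">About</a> <a href="/books">Books</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/books": '<a href="/about">About</a>',
}

crawled = crawl("https://example.com/", SITE.get)
```

Each page is visited once, and every newly discovered link is queued for a later visit. A page that no other page links to would never be reached this way, which is the gap a sitemap is meant to fill.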

Indexing

Just as a book has an index, Google maintains an index as well. Indexing means that the results gathered from crawling are added to that index; it is the process that adds your pages to Google Search. Crawling comes first, and the index is then built from what was crawled.
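To make the book-index analogy concrete, here is a toy sketch of an index that maps each word to the pages containing it. It is a deliberately simplified assumption about what an index looks like, not a description of Google's actual index. The page paths and the `build_index` name are hypothetical.

```python
from collections import defaultdict


def build_index(pages):
    """Map every word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


# Hypothetical crawl results: page path -> extracted text.
pages = {
    "/crime": "crime fiction books",
    "/poetry": "poetry books",
}

index = build_index(pages)
matches = sorted(index["books"])  # pages that mention "books"
```

A page that was crawled but never added to such an index cannot be returned for any query, which is why the gap between crawled and indexed pages matters.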

Index to Crawl ratio

For better results, the number of indexed pages should be as close as possible to the number of crawled pages. Let us consider an example.

Example

We will compare the search results of two book publishing sites. Xsit publishing has more than 8000 pages, while another publishing site, Oldcastle Books, has a total of 134 pages. When it comes to search results, Oldcastle Books takes the lead: 129 of its 134 pages are displayed. For Xsit publishing the picture is the opposite. Despite having far more pages in total, only 885 of them are retrieved for search results. So the ratio of indexed to crawled pages is far from 1:1 for Xsit, while for Oldcastle Books it is quite close to 1:1. Why were the rest of the pages not brought into the search results?
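The comparison can be worked out with a small calculation. The sketch below uses the figures from the example and assumes that each site's total page count equals its number of crawled pages. Since Xsit is only described as having "more than 8000" pages, 8000 is used as a lower bound, so its real ratio is at most the value computed here. The function name is made up for the illustration.

```python
def index_to_crawl_ratio(indexed_pages, crawled_pages):
    """Return the share of crawled pages that made it into the index."""
    if crawled_pages <= 0:
        raise ValueError("crawled_pages must be positive")
    return indexed_pages / crawled_pages


# (indexed, crawled) pairs taken from the example above.
# 8000 is a lower bound for Xsit, so its ratio is an upper bound.
sites = {
    "Oldcastle Books": (129, 134),
    "Xsit publishing": (885, 8000),
}

ratios = {
    name: index_to_crawl_ratio(indexed, crawled)
    for name, (indexed, crawled) in sites.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%}")
```

Under these assumptions Oldcastle Books has roughly 96% of its pages indexed, while Xsit publishing has about 11% at most.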

 

 

Reasons for difference in crawled and indexed pages

Duplicate content

Search engines apply a number of evaluation metrics when retrieving pages from the index for search results. These include the uniqueness of the site’s content and the proper use of meta tags, titles, and page descriptions. If duplication is found, it has a negative impact, and search engines may penalize the site.

Factors of duplication

  • Doorway Pages

The purpose of such pages is to get a higher ranking. They put multiple near-identical pages into the search results, all of which lead to the same destination, and they may send users through many pages that are not useful.

  • Different formatting

Several copies of the same page exist, and the only difference between them is their format and design.

  • Capitalization errors

Capitalization can also cause duplication: URLs that differ only in letter case may be treated as different pages even though they serve the same content.
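Capitalization duplicates in particular are easy to look for in a list of crawled URLs. The sketch below groups URLs that differ only in letter case. It assumes the server treats paths case-insensitively, which is not true of every server, so a match is only a hint that two URLs may serve the same content. The URLs and function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalized_key(url):
    """Lowercase the host and path so case variants share one key.

    Lowercasing the path is an assumption: URL paths are case-sensitive
    in general, so this only flags candidates for manual review.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path.lower().rstrip("/") or "/"
    return host + path


def find_case_duplicates(urls):
    """Return groups of URLs that collapse to the same normalized key."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalized_key(url)].append(url)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical crawl output.
urls = [
    "https://example.com/Books",
    "https://example.com/books",
    "https://example.com/about",
]

duplicates = find_case_duplicates(urls)
```

Here the two `/books` variants end up in one group, while `/about` is left out because it has no case variant.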

Flawed identity

The domain you choose identifies you uniquely from everyone else. Be specific about which form of the domain you use: WWW or non-WWW. You may choose either of the two, but you must specify the one you choose. If you do not, Google may consider your WWW and non-WWW versions to be two different sites, which can result in much lower rankings. If a site has specified that its domain is the WWW version, anyone who types the non-WWW version is automatically redirected to the actual site.

How to specify domain name

  • Set your domain to WWW or non-WWW in Google Search Console: On the server side, you can enforce the choice with an .htaccess file in your cPanel file manager.
  • Redirect notification: Set up a 301 (permanent) redirect to signal that the domain has moved, so that users and search engines are sent to the correct destination.
  • Use robots.txt: Use robots.txt to keep crawlers away from pages you do not want in search results. However, make sure the pages you do want indexed are not blocked from crawling.
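The redirect and robots.txt steps can be checked with a short script. The sketch below is not an .htaccess configuration. It only models the decision a server would make ("should this request get a 301 to the preferred host?") and uses Python's standard `urllib.robotparser` to confirm which URLs a robots.txt file blocks. The host names, the robots.txt rules, and the `canonical_redirect` helper are assumptions made for the illustration.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def strip_www(host):
    """Return the host without a leading 'www.' label."""
    return host[4:] if host.startswith("www.") else host


def canonical_redirect(url, preferred_host):
    """Return (301, new_url) if url uses the non-preferred host, else None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    same_site = strip_www(host) == strip_www(preferred_host)
    if same_site and host != preferred_host:
        return 301, urlunsplit(parts._replace(netloc=preferred_host))
    return None


# Hypothetical robots.txt that keeps crawlers out of a drafts folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

redirect = canonical_redirect("https://example.com/books?id=1", "www.example.com")
blocked = not robots.can_fetch("Googlebot", "https://www.example.com/drafts/notes.html")
allowed = robots.can_fetch("Googlebot", "https://www.example.com/books")
```

Running the same check over every page you expect to rank is a quick way to confirm that none of them is accidentally excluded from crawling.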

Conclusion

Efforts to close the gap between crawled and indexed pages can help improve your ranking on search engines. Bear in mind that this ratio is only one of the evaluation metrics for better ranking. The best results come when you address every factor within your control.

Image Credit: Techliva
