What is a crawl and how does it work in SharePoint 2013 – On-Premises Environment

In broad terms, SharePoint Search consists of three main functional components:
  1. Crawling (Gathering): Collecting content to be processed
  2. Indexing: Organizing the processed content into a structured, searchable index
  3. Query Processing: Retrieving a relevant result set for a given user query
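
To make the three stages concrete, here is a minimal sketch in Python. It is a toy model, not SharePoint code: the `DOCUMENTS` dictionary and the `crawl`, `build_index`, and `query` functions are hypothetical names chosen for illustration, and the "index" is a plain in-memory inverted index.

```python
from collections import defaultdict

# Hypothetical in-memory "content source": document id -> text.
DOCUMENTS = {
    "doc1": "SharePoint search crawls content",
    "doc2": "The index organizes crawled content",
    "doc3": "Queries return relevant results",
}

def crawl(source):
    """Crawling (gathering): yield (doc_id, text) pairs from the content source."""
    for doc_id, text in source.items():
        yield doc_id, text

def build_index(crawled):
    """Indexing: map each lowercase term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in crawled:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def query(index, terms):
    """Query processing: return the documents that contain every query term."""
    matches = [index.get(term.lower(), set()) for term in terms.split()]
    return sorted(set.intersection(*matches)) if matches else []

index = build_index(crawl(DOCUMENTS))
print(query(index, "content"))          # ['doc1', 'doc2']
print(query(index, "crawled content"))  # ['doc2']
```

The real product adds ranking, security trimming, and content processing between these stages; the sketch only shows how the output of one stage feeds the next.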

Types of Crawl in SharePoint 2013

  1.    Full Crawl
  2.    Incremental Crawl
  3.    Continuous Crawl


Advantages and Disadvantages

Continuous Crawl
  Advantages: Runs in parallel and keeps the index as current as possible.
  Disadvantages: Works only for SharePoint content. It does not work with non-SharePoint content.

Incremental / Full Crawl
  Advantages: Works for both SharePoint and non-SharePoint content.
  Disadvantages: Runs sequentially. A second crawl cannot start until the first cycle completes; it waits until the first one ends.
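
The difference between sequential and parallel crawling can be sketched with a small scheduling model. This is a simplification of the behavior described above, not the actual SharePoint scheduler: the function names and the interval and duration values are assumptions chosen for illustration.

```python
def sequential_starts(interval, duration, count):
    """Start times when each crawl must wait for the previous one to finish
    (the incremental / full crawl behavior described above)."""
    starts, previous_end = [], 0
    for i in range(count):
        scheduled = i * interval
        start = max(scheduled, previous_end)  # wait if the previous crawl is still running
        starts.append(start)
        previous_end = start + duration
    return starts

def parallel_starts(interval, count):
    """Start times when a new crawl may begin while earlier ones are still running
    (the continuous crawl behavior described above)."""
    return [i * interval for i in range(count)]

# Assumed numbers: a crawl is due every 15 minutes, but each crawl takes 40 minutes.
print(sequential_starts(15, 40, 4))  # [0, 40, 80, 120]
print(parallel_starts(15, 4))        # [0, 15, 30, 45]
```

With these assumed numbers, the sequential crawls fall further behind their schedule on every cycle, while the parallel crawls always start on time. That is why a continuous crawl can keep the index fresher when a single pass takes longer than the crawl interval.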
   

Full Crawl:
A full crawl crawls all of the content under a content source. The content can be SharePoint or non-SharePoint content.

Incremental Crawl:
An incremental crawl crawls only the content that has been added or modified since the last successful crawl.
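
The two definitions can be sketched as follows. This is an illustration only: the item URLs and dates are hypothetical, and comparing a last-modified timestamp against the last successful crawl time is an assumption made to keep the example short, not a description of how the SharePoint crawler actually detects changes.

```python
from datetime import datetime

# Hypothetical items in a content source: (url, last_modified).
ITEMS = [
    ("http://intranet/docs/a.docx", datetime(2024, 1, 10)),
    ("http://intranet/docs/b.docx", datetime(2024, 1, 20)),
    ("http://intranet/docs/c.docx", datetime(2024, 1, 16)),
]

def full_crawl(items):
    """A full crawl processes every item under the content source."""
    return [url for url, _ in items]

def incremental_crawl(items, last_successful_crawl):
    """An incremental crawl processes only items added or modified
    after the last successful crawl."""
    return [url for url, modified in items if modified > last_successful_crawl]

last_crawl = datetime(2024, 1, 15)  # assumed time of the last successful crawl
print(full_crawl(ITEMS))                     # all three URLs
print(incremental_crawl(ITEMS, last_crawl))  # only b.docx and c.docx
```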

Comparison between Full and Incremental Crawl


  1. Compared with incremental crawls, full crawls consume more memory and CPU cycles on the index server.
  2. Full crawls consume more memory and CPU cycles on the Web Front End servers when crawling content in your farm.
  3. Full crawls use more network bandwidth than incremental crawls.

There are some scenarios where an incremental crawl is not enough and you need to run a full crawl.

Why do we need a Full Crawl?

  1. Software updates or service packs have been installed on servers in the farm.
  2. An SSP administrator (a Search service application administrator in SharePoint 2013) has added a new managed property.
  3. Crawl rules have been added, deleted, or modified.
  4. A corrupted index needs to be repaired. In this case, the system may attempt a full crawl on its own, depending on the severity of the corruption.
  5. A full crawl of the site has never been done.
  6. Security changes that were made on a file share after its last full crawl need to be detected.
  7. Incremental crawls are failing consecutively. In rare cases, if an incremental crawl fails one hundred consecutive times at any level in a repository, the index server removes the affected content from the index.
  8. ASPX pages on Windows SharePoint Services 3.0 or Office SharePoint Server 2007 sites need to be reindexed. The crawler cannot detect when ASPX pages on these sites have changed. Because of this, incremental crawls do not reindex views or home pages when individual list items are deleted.
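
The checklist above can be expressed as a small decision helper. This is a sketch under stated assumptions, not a SharePoint API: `ContentSourceState`, its fields, and `failure_limit` are hypothetical names, and the default `failure_limit` of 3 is an arbitrary choice for illustration. Only the threshold of 100 consecutive failures comes from the text.

```python
from dataclasses import dataclass

REMOVAL_THRESHOLD = 100  # from the text: content is removed after 100 consecutive failures

@dataclass
class ContentSourceState:
    """Hypothetical snapshot of the conditions listed above."""
    software_updated: bool = False
    managed_property_added: bool = False
    crawl_rules_changed: bool = False
    index_corrupted: bool = False
    full_crawl_completed: bool = True
    file_share_security_changed: bool = False
    consecutive_incremental_failures: int = 0

def full_crawl_reasons(state, failure_limit=3):
    """Return the reasons (if any) why a full crawl should be run."""
    reasons = []
    if state.software_updated:
        reasons.append("software update or service pack installed")
    if state.managed_property_added:
        reasons.append("new managed property added")
    if state.crawl_rules_changed:
        reasons.append("crawl rules changed")
    if state.index_corrupted:
        reasons.append("index corrupted")
    if not state.full_crawl_completed:
        reasons.append("full crawl never done")
    if state.file_share_security_changed:
        reasons.append("file share security changed")
    if state.consecutive_incremental_failures >= failure_limit:
        reasons.append("incremental crawl failing consecutively")
    return reasons

def content_at_removal_threshold(state):
    """True once the failure count reaches the removal threshold from the text."""
    return state.consecutive_incremental_failures >= REMOVAL_THRESHOLD

state = ContentSourceState(crawl_rules_changed=True, consecutive_incremental_failures=5)
print(full_crawl_reasons(state))
# ['crawl rules changed', 'incremental crawl failing consecutively']
print(content_at_removal_threshold(state))  # False
```

An empty list from `full_crawl_reasons` means none of the listed conditions apply, so an incremental crawl should be enough.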


Full Crawl
  Crawls all items
  Can be scheduled
  Can be stopped and paused
  When required:
    The content access account has changed
    New managed properties have been added
    Content enrichment web service code has been changed
    A new IFilter has been added

Incremental Crawl
  Crawls content modified since the last crawl
  Can be scheduled
  Can be stopped and paused
  When required:
    To crawl recently modified content

Continuous Crawl
  Keeps the index as current as possible
  Cannot be scheduled
  Cannot be stopped or paused (once started, a continuous crawl can only be disabled)
  When required:
    Content changes frequently (multiple instances can run in parallel)
    Only for SharePoint content sources
    E-commerce sites in cross-site publishing mode
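
As a rough summary of the comparison above, the choice of crawl type can be sketched as a simple rule. The function name and its three parameters are hypothetical, and real deployments weigh more factors (server load, schedules, content source size) than this sketch does.

```python
def recommend_crawl(is_sharepoint_source, needs_full_reindex, changes_frequently):
    """Pick a crawl type from the properties summarized above."""
    if needs_full_reindex:
        # e.g. a new managed property, a new IFilter, or a changed content access account
        return "full"
    if is_sharepoint_source and changes_frequently:
        # continuous crawls apply only to SharePoint content sources
        return "continuous"
    return "incremental"

print(recommend_crawl(True, False, True))   # continuous
print(recommend_crawl(False, False, True))  # incremental (a file share cannot use continuous crawl)
print(recommend_crawl(True, True, True))    # full
```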


Note: Avoid pausing content source crawls often, and avoid pausing several content source crawls at once, because every paused crawl consumes memory on the index server.



[Figure: Incremental Crawl Cycle]

[Figure: Continuous Crawl Cycle]
