What Is a Crawl and How Does It Work in SharePoint 2013 – On-Premises Environment
In broad terms, SharePoint Search consists of three main functional components:
- Crawling (Gathering): Collecting content to be processed
- Indexing: Organizing the processed content into a structured/searchable index
- Query Processing: Retrieving a relevant result set relative to a given user query
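The following toy Python model shows how the three components hand off to each other. It is only a conceptual sketch. SharePoint 2013 has its own crawl, content processing, indexing and query processing components, and none of the functions below correspond to a SharePoint API. The content source is assumed to be a plain dictionary of hypothetical URLs mapped to text, and the "index" is a simple inverted index (term to set of URLs).

```python
from collections import defaultdict


def crawl(content_source):
    """Crawling (gathering): collect the documents to be processed.

    Assumption: the content source is a dict of URL -> text; a real
    crawler would fetch content through a connector or protocol handler.
    """
    return list(content_source.items())


def build_index(documents):
    """Indexing: organize crawled content into an inverted index
    mapping each lower-cased term to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in documents:
        for term in text.lower().split():
            index[term].add(url)
    return index


def query(index, user_query):
    """Query processing: return the URLs that contain every query term."""
    terms = user_query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    source = {
        "http://intranet/hr/policy.aspx": "Holiday policy for all staff",
        "http://intranet/it/vpn.aspx": "VPN setup guide for staff laptops",
    }
    idx = build_index(crawl(source))
    print(sorted(query(idx, "staff policy")))
```

Here "staff" matches both pages but "policy" matches only the HR page, so the query returns just http://intranet/hr/policy.aspx.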
Types of Crawl in SharePoint 2013
- Full Crawl
- Incremental Crawl
- Continuous Crawl
Crawl type | Advantages | Disadvantages
Continuous crawl | Runs in parallel mode and keeps the index as current as possible. | Works only for SharePoint content sources; it doesn't work with non-SharePoint content.
Incremental / Full crawl | Works for both SharePoint and non-SharePoint content. | Runs in sequential mode: a second crawl cycle cannot start until the first one has completed, so it waits for the first to end.
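The difference between sequential and parallel crawling is easiest to see on a timeline. The Python sketch below is a simplified model, not SharePoint code: crawl durations are hypothetical values in minutes, the function names are made up for illustration, and the sequential function follows the description above (the next cycle waits for the previous one to finish). The 15-minute interval used in the usage note matches the default continuous crawl interval in SharePoint 2013.

```python
def sequential_schedule(durations, interval):
    """Incremental/full crawls (simplified model): a new cycle cannot
    start until the previous one has finished, even if its scheduled
    start time has already passed. Returns (start, end) pairs."""
    crawls = []
    previous_end = 0
    for i, duration in enumerate(durations):
        scheduled = i * interval
        start = max(scheduled, previous_end)  # wait for the previous cycle
        end = start + duration
        crawls.append((start, end))
        previous_end = end
    return crawls


def parallel_schedule(durations, interval):
    """Continuous crawls (simplified model): a new crawl starts every
    `interval` minutes whether or not earlier crawls are still running.
    Returns (start, end) pairs."""
    return [(i * interval, i * interval + d) for i, d in enumerate(durations)]


if __name__ == "__main__":
    durations = [40, 10, 10]  # hypothetical: one long crawl, then two short ones
    print(sequential_schedule(durations, 15))
    print(parallel_schedule(durations, 15))
```

With durations [40, 10, 10] and a 15-minute interval, sequential_schedule returns [(0, 40), (40, 50), (50, 60)], while parallel_schedule returns [(0, 40), (15, 25), (30, 40)]. One long crawl delays every later incremental cycle, but continuous crawls keep starting on time.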
Full Crawl:
A full crawl crawls all the content under a content source, whether that content is SharePoint or non-SharePoint content.
Incremental Crawl:
An incremental crawl crawls only the content that has been added or modified since the last successful crawl.
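To make the difference concrete, here is a minimal Python sketch of which items each crawl type would pick up. It is an assumption-laden simplification: the Item class, URLs and dates are hypothetical, and the real SharePoint crawler relies on mechanisms such as the SharePoint change log rather than a bare timestamp comparison.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Item:
    """A hypothetical crawlable item: its URL and last-modified time."""
    url: str
    last_modified: datetime


def full_crawl(items):
    """Full crawl: process every item under the content source."""
    return [item.url for item in items]


def incremental_crawl(items, last_successful_crawl):
    """Incremental crawl: process only items added or modified after the
    last successful crawl (simplified to a timestamp comparison)."""
    return [item.url for item in items
            if item.last_modified > last_successful_crawl]


if __name__ == "__main__":
    last_crawl = datetime(2013, 6, 1, 12, 0)
    source = [
        Item("http://intranet/docs/a.docx", datetime(2013, 5, 30, 9, 0)),
        Item("http://intranet/docs/b.docx", datetime(2013, 6, 1, 14, 30)),
        Item("file://fileserver/reports/c.pdf", datetime(2013, 6, 2, 8, 15)),
    ]
    print(full_crawl(source))                     # all three items
    print(incremental_crawl(source, last_crawl))  # b.docx and c.pdf only
```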
Comparison between Full and Incremental Crawl
- Compared with incremental crawls, full crawls consume more memory and CPU cycles on the servers that host the index.
- Full crawls consume more memory and CPU cycles on the Web Front End servers when crawling content in your farm.
- Full crawls use more network bandwidth than incremental crawls.
In some scenarios an incremental crawl is not enough, and you need to run a full crawl.
Why do we need a Full Crawl?
- Software updates or service packs have been installed on servers in the farm.
- An administrator has added a new managed property.
- Crawl rules have been added, deleted, or modified.
- A full crawl is required to repair a corrupted index. In this case, the system may attempt a full crawl on its own, depending on the severity of the corruption.
- A full crawl of the site has never been done.
- To detect security changes that were made on file shares after the last full crawl of the file share.
- Incremental crawls are failing repeatedly. In rare cases, if an incremental crawl fails one hundred consecutive times at any level in a repository, the index server removes the affected content from the index.
- To reindex ASPX pages on Windows SharePoint Services 3.0 or Office SharePoint Server 2007 sites. The crawler cannot discover when ASPX pages on Windows SharePoint Services 3.0 or MOSS sites have changed. Because of this, incremental crawls do not reindex views or home pages when individual list items are deleted.
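The checklist above can be read as a simple decision rule. The Python sketch below encodes it under explicit assumptions: the FarmEvent names and choose_crawl_type function are hypothetical and not part of any SharePoint API, and because the article gives no number for "failing repeatedly", the threshold of 3 consecutive failures is an arbitrary illustrative choice (the 100-failure figure above is when content gets removed, not when you should react).

```python
from enum import Enum, auto


class FarmEvent(Enum):
    """Hypothetical names for the situations listed above."""
    SOFTWARE_UPDATE_INSTALLED = auto()
    MANAGED_PROPERTY_ADDED = auto()
    CRAWL_RULES_CHANGED = auto()
    INDEX_CORRUPTED = auto()
    NEVER_FULLY_CRAWLED = auto()
    FILE_SHARE_SECURITY_CHANGED = auto()
    LEGACY_ASPX_PAGES_CHANGED = auto()
    CONTENT_MODIFIED = auto()  # ordinary edits: an incremental crawl is enough


# Every event except ordinary content edits calls for a full crawl.
FULL_CRAWL_TRIGGERS = frozenset(FarmEvent) - {FarmEvent.CONTENT_MODIFIED}

# Assumption: arbitrary illustrative threshold for "failing repeatedly".
FAILURE_THRESHOLD = 3


def choose_crawl_type(events, consecutive_incremental_failures=0):
    """Return "full" if any event requires a full crawl or incremental
    crawls keep failing; otherwise return "incremental"."""
    if FULL_CRAWL_TRIGGERS & set(events):
        return "full"
    if consecutive_incremental_failures >= FAILURE_THRESHOLD:
        return "full"
    return "incremental"


if __name__ == "__main__":
    print(choose_crawl_type({FarmEvent.CONTENT_MODIFIED}))
    print(choose_crawl_type({FarmEvent.CRAWL_RULES_CHANGED}))
    print(choose_crawl_type(set(), consecutive_incremental_failures=4))
```

The three demo calls print "incremental", "full" and "full" respectively.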
Property | Full Crawl | Incremental Crawl | Continuous Crawl
What it crawls | All items in the content source | Only content added or modified since the last crawl | Keeps the index as current as possible
Content sources | SharePoint and non-SharePoint | SharePoint and non-SharePoint | SharePoint content sources only
Scheduling | Can be scheduled | Can be scheduled | Cannot be scheduled
Stop / pause | Can be stopped and paused | Can be stopped and paused | Cannot be stopped or paused; once started, a continuous crawl can only be disabled
When required | The content access account has changed; new managed properties have been added; Content Enrichment web service code has been changed; a new IFilter has been added | To pick up recently modified content | Content changes frequently (multiple instances can run in parallel); e-commerce sites in cross-site publishing mode
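The "Scheduling" and "Stop / pause" rows amount to a small set of rules about which operations each crawl type allows. The Python sketch below models them as a state machine. The class, method and state names are hypothetical illustrations only, not the SharePoint object model or PowerShell cmdlets.

```python
class CrawlControlError(Exception):
    """Raised when an operation is not allowed for this crawl type."""


class ContentSourceCrawl:
    """Hypothetical model of the operations each crawl type allows."""

    def __init__(self, crawl_type):
        if crawl_type not in ("full", "incremental", "continuous"):
            raise ValueError(f"unknown crawl type: {crawl_type}")
        self.crawl_type = crawl_type
        self.state = "idle"

    @property
    def can_be_scheduled(self):
        # Full and incremental crawls can be scheduled; continuous cannot.
        return self.crawl_type != "continuous"

    def start(self):
        self.state = "running"

    def pause(self):
        if self.crawl_type == "continuous":
            raise CrawlControlError("a continuous crawl cannot be paused")
        if self.state != "running":
            raise CrawlControlError("only a running crawl can be paused")
        # Each paused crawl keeps holding memory on the index server.
        self.state = "paused"

    def resume(self):
        if self.state != "paused":
            raise CrawlControlError("only a paused crawl can be resumed")
        self.state = "running"

    def stop(self):
        if self.crawl_type == "continuous":
            raise CrawlControlError(
                "a continuous crawl cannot be stopped; disable it instead")
        self.state = "idle"

    def disable(self):
        if self.crawl_type != "continuous":
            raise CrawlControlError("only continuous crawls are disabled")
        self.state = "disabled"


if __name__ == "__main__":
    nightly = ContentSourceCrawl("incremental")
    nightly.start()
    nightly.pause()
    print(nightly.state)  # paused
    live = ContentSourceCrawl("continuous")
    live.start()
    try:
        live.pause()
    except CrawlControlError as err:
        print(err)  # a continuous crawl cannot be paused
    live.disable()
    print(live.state)  # disabled
```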
Note: Avoid pausing content source crawls frequently or pausing several content source crawls at once, because every paused crawl consumes memory on the index server.
Figure: Incremental crawl cycle
Figure: Continuous crawl cycle