Tuesday, January 29, 2008

MOSS Indexing

Joel Olson has a great blog post on the anatomy of MOSS indexing. Here is an excerpt from his post. When a full crawl is started:
  • The indexer communicates with a web service on the WFE, sitedata.asmx
  • It enumerates the URLs and gathers metadata
  • Using those URLs, it issues HTTP GETs through the WFE to retrieve the page content and subsequent documents and lists from the content database (a rough sketch of this pattern follows the list)
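The sketch below illustrates this two-phase pattern (enumerate via the Site Data web service, then fetch over HTTP) in Python. It is illustrative only, not MOSS's actual crawler code: the /_vti_bin/SiteData.asmx path is the standard endpoint, but the EnumerateFolder envelope, the response element names, the server URL, and the credentials are simplified assumptions.

import requests
from requests_ntlm import HttpNtlmAuth  # third-party package; MOSS sites typically require NTLM auth
from xml.etree import ElementTree as ET

SITE = "http://moss-wfe/sites/team"                     # hypothetical WFE site URL
SITEDATA = SITE + "/_vti_bin/SiteData.asmx"             # standard Site Data web service path
AUTH = HttpNtlmAuth("DOMAIN\\crawl_account", "secret")  # hypothetical crawl account

def enumerate_folder(folder_url):
    # Ask the Site Data web service for the items under a folder.
    # The envelope and element names below are simplified assumptions, not the exact WSDL.
    envelope = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        '<soap:Body>'
        '<EnumerateFolder xmlns="http://schemas.microsoft.com/sharepoint/soap/">'
        '<strFolderUrl>%s</strFolderUrl>'
        '</EnumerateFolder>'
        '</soap:Body>'
        '</soap:Envelope>' % folder_url
    )
    resp = requests.post(
        SITEDATA,
        data=envelope,
        auth=AUTH,
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": "http://schemas.microsoft.com/sharepoint/soap/EnumerateFolder",
        },
    )
    resp.raise_for_status()
    # Collect anything that looks like a URL element in the response (assumed element names).
    tree = ET.fromstring(resp.content)
    return [el.text for el in tree.iter() if el.tag.endswith("Url") and el.text]

def full_crawl(start_folder):
    # Phase 1: enumerate URLs and metadata; phase 2: plain HTTP GETs for the content itself.
    for url in enumerate_folder(start_folder):
        page = requests.get(url, auth=AUTH)
        print(url, len(page.content), "bytes")

full_crawl(SITE)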
When an incremental crawl is started:
  • The indexer communicates with the WFE web service sitedata.asmx, which reads the change log through the WSS object model
  • The service enumerates the changes and returns changed/added/deleted URLs and metadata to the indexer
  • The indexer issues HTTP GETs to index the relevant content (see the sketch after this list)
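The difference from a full crawl is that only the items reported by the change log are fetched, and deleted items are removed from the index without any GET at all. A minimal sketch of that decision logic follows; the change-record shape, field names, and token format are made-up examples of what the change log conceptually hands back, not the real Site Data response.

import requests

# Hypothetical shape of a change-log response: an opaque token plus one record per changed item.
changes = {
    "last_change_token": "1;0;6f0e5d42-8c1a-4f3b-9d2e-0a1b2c3d4e5f;633370000000000000;42",
    "items": [
        {"url": "http://moss-wfe/sites/team/Docs/spec.docx", "change": "Update"},
        {"url": "http://moss-wfe/sites/team/Docs/old.docx",  "change": "Delete"},
        {"url": "http://moss-wfe/sites/team/Pages/new.aspx", "change": "Add"},
    ],
}

to_remove = []
for item in changes["items"]:
    if item["change"] == "Delete":
        # Deletes never trigger a GET; the URL is simply dropped from the index.
        to_remove.append(item["url"])
    else:
        # Only added/updated URLs are fetched, which is why incremental crawls are so much faster.
        resp = requests.get(item["url"])
        print("re-indexed", item["url"], resp.status_code)

print("remove from index:", to_remove)

# The token is persisted and handed back to the change log on the next incremental crawl,
# so only changes newer than it are returned.
next_start_token = changes["last_change_token"]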
I can also add a few tips:
  • An incremental crawl will delete from the index any content it can no longer find (even if it was previously indexed). Be careful when crawl rules are added or deleted, and check that all your pages are working properly, or they will be dropped from the index
  • An incremental crawl will index .aspx files and all other document types, but it will NOT detect changes to existing .aspx files
  • An incremental crawl is very fast compared to a full crawl.
  • You can easily run into memory exceptions (especially 'Error in Site Data Web Service: Out of Memory Exception'). This might be because of the number of documents in the library. The only way out may be to reduce the number of documents or migrate to 64-bit (I cannot confirm this; I am still having a similar problem). You should also try adding crawler impact rules and changing the interval between each request (a throttling sketch follows this list).
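Crawler impact rules are configured in Central Administration, but what they boil down to is limiting how many documents are requested at once and how long the crawler waits between requests. Here is a generic Python illustration of that "interval between each request" idea; the URLs and the one-second delay are arbitrary examples, not MOSS settings.

import time
import requests

URLS = [
    "http://moss-wfe/sites/team/Docs/a.docx",      # hypothetical document URLs
    "http://moss-wfe/sites/team/Docs/b.docx",
    "http://moss-wfe/sites/team/Pages/home.aspx",
]

REQUEST_INTERVAL_SECONDS = 1.0  # comparable to an impact rule's "wait N seconds between requests"

for url in URLS:
    resp = requests.get(url)
    print(url, resp.status_code)
    # Spacing the GETs out keeps pressure off the Site Data web service and the WFE,
    # which is the point of the last tip above.
    time.sleep(REQUEST_INTERVAL_SECONDS)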
