Searching in WSS 3.0 and MOSS 2007

Excerpt from Beginning SharePoint 2007 Administration: Windows SharePoint Services 3.0 and Microsoft Office SharePoint Server 2007 

by Göran Husman

Search engines are among the greatest time savers. Just look at how often you use MSN Search or Google, to mention just a couple of them. On the Internet, searching is absolutely critical, since you have no idea where information is stored, and there may be new sources one minute from now. That is why you search all the time. This is not really that different from the way you use your internal network. True, the volume of information is much smaller in your network, and you know where at least some of it is stored, since you created it. Still, it does not take much activity within an organization to create so much information that the average user loses track of where things are stored. So, users start looking around to find the file, document, or whatever they are looking for. After some minutes, they find it. The question then becomes: Is this the latest version, or is there a newer version somewhere? And when they get what they’re looking for, they most likely want to be notified if that document gets updated later on. What you need is a solution that helps you:

  • Find information regardless of where it is stored.
  • Make sure that it is the latest version.
  • Send a notification to you when this information is updated.

SharePoint has solutions for the first bulleted item, but there is a great difference between what MOSS is offering and the WSS environment, as the following section will cover in more detail. The second bullet is covered by the built-in version management of documents and list content in both MOSS and WSS, and the third bulleted item is the alert feature, also a built-in feature in MOSS and WSS. So, let’s focus on the search functionality.

Searching in WSS 3.0

WSS 3.0 has a basic search feature that will allow users to search for content. This is a big change from previous WSS versions, which required WSS to be configured to run on SQL Server 2000, using its Full-Text Indexing engine. Another important change is that WSS 3.0 will search in subsites, while the previous version of WSS only searched in the current site. The following list summarizes the search functionality in WSS, when combined with SQL Server:

  • Finds information of any type, stored in the current site, or a subsite.
  • Provides free-text searching in documents, files, and all list content.

SharePoint will create a special Windows SharePoint Services Search service (MSSearch.exe) in Windows 2003. This service must be running before WSS can use it for indexing and searching. The steps to configure WSS to use this search feature are described in the following Try It Out.

Try It Out: Activate Search Indexing in WSS

  1. Log on as an administrator.
  2. Start SharePoint’s Central Administration tool. Switch to the Operations page, then click Services on Server.
  3. Make sure that the correct server is selected, and then start the Windows SharePoint Services Search service, if it’s not already started. Fill in this information in the Web form:

TIP

If WSS was installed using SQL Express, the Search service may not be listed. If this is the case, select All on the View menu on the toolbar.

    1. Service Account: Enter the service account and its password for the search service; be sure to include the domain name, for example, filobit\sp_service.
    2. Content Access Account: Enter the default user account to be used by the search service when searching content sources. You can later configure other user accounts for specific content sources.
    3. Search Database: Enter the SQL server name, plus the database name for the index. Use the default database and name unless you have a good reason not to.
    4. Indexing Schedule: Enter how often the index process will run. The default is every five minutes.
    5. Click OK to save and close the page.

TIP

When you create a new Web application, make sure to select a search server.

  4. When the process is done, all documents and lists are also indexed and ready to be searched within five minutes. Open any WSS team site, and use the search field at the top-right corner of the page. Type a text string that you know exists in any of the lists or inside any documents stored in this team site.

Understanding the Search Feature in WSS

When searching is activated in WSS, there is nothing more to configure. The search engine in WSS is fast and stable; its behavior is controlled by stored procedures in SQL Server. You may find tips on how to optimize these stored procedures, but before you do that you must understand that this will violate the conditions for getting help from Microsoft’s support team! It may also create problems when you install the next service pack or upgrade to the next release.

The objects indexed by Full-Text Indexing are these:

  • List items: Such as individual names in a Contact list.
  • Documents: Documents of these types: .doc, .xls, .ppt, .txt, and .html.

  • Lists: Such as Announcements, and Events.

There are also objects that will not be indexed and therefore not searchable:

  • Nontext columns in lists — for example, Lookup fields, currency, Yes/No.
  • Attachments to list items.
  • Survey lists.
  • Hidden lists.

The process of reindexing new or modified information is automatic in WSS; the default schedule is to run an incremental indexing process every five minutes. As soon as this process is done, users can search for the new or modified content.
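The idea behind an incremental indexing run can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how WSS detects changes internally: the function name, the item names, and the timestamps are all hypothetical, and the only value taken from the text is the five-minute default schedule.

```python
from datetime import datetime, timedelta

CRAWL_INTERVAL = timedelta(minutes=5)  # the default schedule named in the text

def items_to_reindex(items, last_crawl):
    """Return the names of items created or modified since the last crawl."""
    return sorted(name for name, modified in items.items() if modified > last_crawl)

# Hypothetical site content: item name -> last-modified timestamp.
last_crawl = datetime(2007, 3, 1, 9, 0)
items = {
    "Contacts/Anna": datetime(2007, 3, 1, 8, 30),  # unchanged since last crawl
    "Docs/plan.doc": datetime(2007, 3, 1, 9, 2),   # modified after last crawl
    "Docs/new.txt": datetime(2007, 3, 1, 9, 4),    # created after last crawl
}

changed = items_to_reindex(items, last_crawl)
next_crawl = last_crawl + CRAWL_INTERVAL
```

Only the two changed items are picked up, which is why an incremental run is so much cheaper than a full one.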

The search field in the top-right corner of the Web page (unless moved) is visible in all team sites. Enter the string you are searching for and press Enter or click the icon to the right of the search field. Note that if you enter more than one text string, it will match any object with either or both of the strings; this is called a Boolean OR search. The search engine uses a type of search called FREETEXT; this type of search uses a feature called stemming. For example, if you search for the word Run, it will also match Running and Ran. Because stemming works on complete words, you must enter the complete word. You cannot search for Admin and find Administrator, for example, since Admin is an abbreviation, not a complete word.
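The matching behavior described above can be illustrated with a toy sketch in Python. This is not how SQL Server implements FREETEXT; the stem table, the sample documents, and the function names are hypothetical and exist only to show the OR semantics and why an abbreviation such as Admin does not match Administrator.

```python
# Toy illustration only: a tiny hand-written stem table stands in for
# the real linguistic stemmer (hypothetical data, not SQL Server's).
STEMS = {
    "run": "run", "running": "run", "ran": "run",
    "administrator": "administrator", "administrators": "administrator",
}

def stem(word):
    """Map a word to its base form; unknown words map to themselves."""
    word = word.lower()
    return STEMS.get(word, word)

def matches(query, text):
    """Return True if ANY query word (after stemming) occurs in the text."""
    text_stems = {stem(w) for w in text.split()}
    return any(stem(q) in text_stems for q in query.split())

# Hypothetical documents in a team site.
docs = {
    "doc1": "She ran the weekly report",
    "doc2": "Contact the administrator",
    "doc3": "Budget for next year",
}

def search(query):
    """Return the names of all documents matching the query."""
    return sorted(name for name, text in docs.items() if matches(query, text))
```

With this sketch, searching for Run finds doc1 through the stem of "ran", searching for "running budget" finds both doc1 and doc3 because either word is enough, and searching for Admin finds nothing.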

TIP

Stemming only works with certain languages, such as English and German.

All of these constraints and behaviors are due to the way the stored procedures are defined. If you absolutely must change this, be sure to make a backup of the original stored procedure, and make notes of what you did and why, so that later on anyone can restore or remove this customization, if necessary.

TIP

You may find tips on the Internet about how to enhance the search functionality in WSS, but remember the warnings above about modifying stored procedures in SQL, since Microsoft will not support your system!

Indexing New File Types

You can have the MS Search service in WSS index more file types than it does by default. The most common request is for Adobe’s PDF files. What MS Search needs to index any file type is a program that can open that file and read its text. Such a program is called an index filter, or IFilter for short. So, to index PDF files you need an IFilter for PDF. The good news is that this IFilter is free to download from Adobe’s Web site:

www.adobe.com/support/downloads/detail.jsp?ftpID=2611

Note that this IFilter is regularly updated; make sure you get the latest version. After you have downloaded this program, install it on the SQL server, if you are using separate WSS and SQL servers.

TIP

This is only true if you are running a pure WSS environment! If you are running a MOSS server, this IFilter must be installed on all SharePoint servers with the Index role.

After the installation, Microsoft recommends in the following knowledge-base article that all existing PDF files must be reloaded, or updated, in order to be indexed:

http://support.microsoft.com/kb/927675/en-us

But in many cases it will actually be enough to force a full update in order to index these existing PDF files. 

Searching in MOSS 2007

This is one of the strongest features in Office SharePoint Server! It has its own search and index engine, completely independent of the Full-Text Indexing service in SQL Server. In fact, you can activate them both. However, that would be a waste of resources, since the MOSS search and indexing feature works in any Web site, including both MOSS sites and WSS sites. The search features in MOSS are summarized here:

  • Search everywhere in SharePoint — any MOSS site, any team site, and any workspace site.
  • Can search almost any content source outside SharePoint — file servers, MS Exchange servers, Lotus Notes, and other Web servers, including any public Web site on the Internet.
  • With MOSS Enterprise Edition you can use the Business Data Catalog feature to search in external databases and applications, such as Oracle, SAP, and Navision.
  • Search all MS Office file types by default, plus all neutral file formats, such as TXT, HTML, and so on.
  • Can be extended to search any file type. All that is needed is an IFilter for each file type.
  • You can control which file types are to be indexed, even if there is an IFilter installed for them.
  • The user profile properties will be indexed. You can search for a user with a specific property.
  • You can set the schedule for full and incremental indexing. You can also force a full indexing anytime.
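The relationship between installed IFilters and the list of file types to index can be pictured with a small Python sketch. Real IFilters are components installed on the server, not Python functions; the names FILTERS, INDEXED_TYPES, and extract_text are hypothetical and only model the two checks a file must pass before its text reaches the index.

```python
import os

def read_plain_text(data):
    """Placeholder filter: decode bytes as text (a real HTML filter would also strip markup)."""
    return data.decode("utf-8", errors="replace")

# Hypothetical stand-ins for installed IFilters: extension -> text extractor.
FILTERS = {".txt": read_plain_text, ".html": read_plain_text}

# File types the administrator has chosen to index (hypothetical setting).
INDEXED_TYPES = {".txt", ".pdf"}

def extract_text(filename, data):
    """Return the text to index, or None if the file is skipped."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in INDEXED_TYPES:
        return None  # type excluded, even though a filter may exist
    reader = FILTERS.get(ext)
    if reader is None:
        return None  # type included, but no filter is installed for it
    return reader(data)
```

In this sketch a .html file is skipped although a filter exists, because the type is not selected for indexing, and a .pdf file is skipped although it is selected, because no filter for it has been installed.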

This indexing and search feature is activated by default for all information stored in SharePoint, both MOSS sites and WSS team sites; there is no special configuration needed to activate this. Since this feature is much more advanced than the full-text search in SQL Server, there is also a lot more configuration you can do; however, this also requires more management. You, as an administrator, must understand how this feature works in MOSS and what you can do to optimize it. This is especially true when a problem arises, such as when the search results are not as expected, or when a content source isn’t indexed. The following section will tell you all you need to know for your everyday work as an administrator. To extend and adjust this very important feature, you might want to read the additional coverage in Chapter 8, "Advanced Configurations," of the book, Beginning SharePoint 2007 Administration: Windows SharePoint Services 3.0 and Microsoft Office SharePoint Server 2007 (Wrox, 2007, ISBN: 978-0-470-12529-8).

The Basics

There are two MOSS services engaged in this feature:

  • Indexing: Responsible for crawling content sources and building index files.
  • Searching: Responsible for finding all information matching the search query by searching the index files.

This is important: All searching is performed against the index files; if they don’t contain what the user is looking for, there will not be a match. So, the index files are critical to the success of the search feature of MOSS. In fact, practically all configuration and management is related to the indexing service. The search functionality can be described in its simplest form as a Web page where the user defines his or her search query.

The index role can be configured to run on its own MOSS server, or run together with all the other roles, such as the Web service, Excel Services and Forms Services. It performs its indexing tasks following this general workflow:

  1. SharePoint stores all configuration settings for the indexing in its database.
  2. When activated, the index will look in SharePoint’s databases to see what content sources to index, and what type of indexing to perform, such as a full or incremental indexing.
  3. The index service will start a program called the Gatherer, which is a program that will try to open the content that should be indexed.
  4. For each information type, the Gatherer will need an Index Filter, or IFilter, that knows how to read text inside this particular type of information. For example, to read a MS Word file, an IFilter for .DOC is needed.

  5. The Gatherer will receive a stream of Unicode characters from the IFilter. It will now use a small program called a Word Breaker; its job is to convert the stream of Unicode characters into words.
  6. However, some words are not interesting to store in the index, such as "the", "a", and numbers; the Gatherer will now compare each word found against a list of Noise Words. This is a text file that contains all words that will be removed from the stream of words.
  7. The remaining words are stored in an index file, together with a link to the source. If that word already exists, only the source will be added, so one word can point to multiple sources.
  8. If the source was information stored in SharePoint, or a file in the file system, the index will also store the security settings for this source. This will prevent a user from getting search results that he or she is not allowed to open.
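The workflow above can be condensed into a minimal Python sketch. It is a simplified model under stated assumptions, not the actual MOSS implementation: the noise-word list, the source URLs, the user names, and all function names are hypothetical, and the word breaker simply splits on anything that is not a letter or digit.

```python
from collections import defaultdict

# Hypothetical noise-word list; the real list is a text file on the server.
NOISE_WORDS = {"the", "a", "an", "and", "of", "for"}

def break_words(text):
    """Rough stand-in for a Word Breaker: split on non-alphanumeric characters."""
    word, words = "", []
    for ch in text.lower():
        if ch.isalnum():
            word += ch
        elif word:
            words.append(word)
            word = ""
    if word:
        words.append(word)
    return words

def build_index(sources):
    """sources: {url: (text, allowed_users)} -> (index, acl)."""
    index = defaultdict(set)  # word -> set of source URLs containing it
    acl = {}                  # source URL -> users allowed to open it
    for url, (text, allowed) in sources.items():
        acl[url] = set(allowed)
        for word in break_words(text):
            if word in NOISE_WORDS or word.isdigit():
                continue      # noise words and numbers are dropped
            index[word].add(url)
    return index, acl

def search(index, acl, word, user):
    """Look up one word and return only the sources this user may open."""
    hits = index.get(word.lower(), set())
    return sorted(url for url in hits if user in acl[url])

# Hypothetical content sources with their text and security settings.
sources = {
    "http://srv1/sites/it/plan.doc": ("The budget plan for 2007", {"anna", "bob"}),
    "http://srv1/sites/hr/pay.doc": ("Budget and salaries", {"anna"}),
}
index, acl = build_index(sources)
```

Here the word "budget" is stored once and points to both sources, yet a search by bob returns only the document he is allowed to open, which mirrors steps 7 and 8.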

Pretty straightforward, if you think about it. But the underlying process is a bit more complex. Fortunately you do not need to dive into these details, unless you have a very good reason to. By default, MOSS will create a single index file. This index file is not stored on the SQL server, as the other information stored in SharePoint is; instead, it is stored in the file system on the server configured to run the Index role in the SharePoint farm. This index file is stored in separate folders in the following location (assuming that you have used the default installation folder):

<Drive:>\Program Files\Microsoft Office Servers\12.0\DATA\Office Server\Applications\<Application GUID>

The Application GUID is a unique hexadecimal string that identifies a specific SSP instance, such as ae0cd4fe-ed29-418f-aa0f-eecfd7956b4f. If you have more than one SSP instance created on the same server, you can check the following registry key to see exactly what portal this Application GUID is pointing to:

HKEY_LOCAL_MACHINE\Software\Microsoft\Office Server\12.0\Search\Applications\<GUID>\CatalogNames

The property DisplayName will tell you what SSP instance this is. The number of files and folders stored in each index folder may surprise you, but indexing is a complex process and it shows here. You do not need to configure these files, since everything is managed by SharePoint’s administration pages.
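If you want to locate the index folder for a given SSP instance from a script, the path can be assembled as in the following Python sketch. It assumes the default installation folder, with the layout Program Files, Microsoft Office Servers, 12.0, DATA, Office Server, Applications; the function name index_folder is hypothetical, and the sketch only builds the path string without touching the file system or the registry.

```python
import uuid
from pathlib import PureWindowsPath

def index_folder(drive, application_guid):
    """Build the default index folder path for one SSP instance."""
    # uuid.UUID raises ValueError if the string is not a well-formed GUID.
    guid = str(uuid.UUID(application_guid))
    return PureWindowsPath(
        f"{drive}:\\", "Program Files", "Microsoft Office Servers",
        "12.0", "DATA", "Office Server", "Applications", guid,
    )

# The example GUID from the text; the drive letter is an assumption.
folder = index_folder("C", "ae0cd4fe-ed29-418f-aa0f-eecfd7956b4f")
```

Validating the GUID first means a typo is caught before you go looking for a folder that does not exist.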

The Gatherer process keeps a log of all its activities. These log files are also stored in this folder structure, but the easiest way to view these log entries is to use SharePoint’s administrative Web pages.

For more information on MOSS 2007 search configuration and advanced settings including crawl settings, scopes, authoritative pages, errors and warnings, forcing updates, managing indexing schedules, controlling what files to index, adding new file types with other IFilters, see Chapter 8, "Advanced Configurations," of the book, Beginning SharePoint 2007 Administration: Windows SharePoint Services 3.0 and Microsoft Office SharePoint Server 2007 (Wrox, 2007, ISBN: 978-0-470-12529-8).

This article is excerpted from Chapter 8, "Advanced Configurations," of the book Beginning SharePoint 2007 Administration: Windows SharePoint Services 3.0 and Microsoft Office SharePoint Server 2007 (Wrox, 2007, ISBN: 978-0-470-12529-8), by Göran Husman. In 1993 Göran became one of the first MS Certified Trainers (MCT) in Sweden, and he has regularly conducted MS courses ever since. He is also certified by MS as an MCP (with the number 2888) and an MCSE. His great engagement in e-mail systems earned him status as Sweden’s first MS Exchange MVP (Most Valuable Professional) from Microsoft. He switched focus to MS SharePoint in 2003, and in January 2006 Microsoft awarded him status as Sweden’s first SharePoint Portal Server MVP, which was renewed in January 2007. He is also a frequent speaker at conferences and seminars. Today Göran divides his time between consulting contracts, training, leading his company Human Data, and from time to time, writing books. Oh, and he is also the proud father of six great kids from the ages of 6 to 28, which may be his greatest achievement in life.
