Redirecting from osssearchresults.aspx

February 11, 2008

In SharePoint MOSS 2007, when you turn on custom scopes in the site collection’s “Search Settings”, most search results are displayed at /SearchCenter/Pages/results.aspx. The exception is contextual search (This Site, This List: Documents, etc.), which always displays its results in the OSSSearchResults.aspx page. You cannot customize that page through web parts (as you can with /SearchCenter/Pages/results.aspx). This trick seems to work very well (original post here):

Open OSSSearchResults.aspx in the layouts folder and add this block:

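<script type="text/javascript">
// Return the decoded value of the named query-string parameter, or "" if absent.
function getURLParam(strParamName) {
  var strReturn = "";
  var strHref = window.location.href;
  if (strHref.indexOf("?") > -1) {
    // Skip the leading "?" and keep the original case of the values.
    var strQueryString = strHref.substr(strHref.indexOf("?") + 1);
    var aQueryString = strQueryString.split("&");
    for (var iParam = 0; iParam < aQueryString.length; iParam++) {
      var aParam = aQueryString[iParam].split("=");
      // Match the parameter name exactly, ignoring case.
      if (aParam[0].toLowerCase() == strParamName.toLowerCase()) {
        strReturn = aParam.length > 1 ? aParam[1] : "";
        break;
      }
    }
  }
  return decodeURIComponent(strReturn.replace(/\+/g, " "));
}
// Replace "http://your site url" with the address of your own site.
var urlstring = 'http://your site url/custom-search.aspx?k=' + encodeURIComponent(getURLParam('k')) +
  '&cs=' + encodeURIComponent(getURLParam('cs')) + '&u=' + encodeURIComponent(getURLParam('u'));

location.replace(urlstring);
</script>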

I added it right after the stylesheet block. I guess you can remove some of the unused code blocks, since the rest of this page will no longer be used.
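The query-string handling can also be checked outside the browser. The following is a minimal sketch, not the code from the original post: `getParam` and `buildRedirectUrl` are hypothetical helper names, and the target page is passed in rather than hard-coded. It forwards the same three parameters the script uses (k, cs and u).

```javascript
// Hypothetical helper: read one query-string parameter from a full URL.
// Names are matched case-insensitively; values keep their original case.
function getParam(href, name) {
  var q = href.indexOf("?");
  if (q === -1) return "";
  // Drop any "#fragment" before splitting into name=value pairs.
  var pairs = href.substring(q + 1).split("#")[0].split("&");
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf("=");
    var key = eq === -1 ? pairs[i] : pairs[i].substring(0, eq);
    if (key.toLowerCase() === name.toLowerCase()) {
      var raw = eq === -1 ? "" : pairs[i].substring(eq + 1);
      return decodeURIComponent(raw.replace(/\+/g, " "));
    }
  }
  return "";
}

// Hypothetical helper: build the custom results URL, re-encoding each value.
function buildRedirectUrl(href, targetPage) {
  var names = ["k", "cs", "u"];
  var parts = [];
  for (var i = 0; i < names.length; i++) {
    parts.push(names[i] + "=" + encodeURIComponent(getParam(href, names[i])));
  }
  return targetPage + "?" + parts.join("&");
}
```

In the page itself this would be called as `location.replace(buildRedirectUrl(window.location.href, targetPage))`, with `targetPage` set to your own custom results page. Re-encoding each value keeps keywords that contain spaces or `&` from breaking the new URL.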


Add support for PDF indexing – and more about iFilters

December 21, 2007

Out of the box, SharePoint only indexes (crawls) DOC, XLS, PPT, TXT and HTM files. Here are the steps to install Acrobat PDF IFilter 6.0 on a SharePoint 2007 server so that PDF file content is indexed by Search and the correct icon is shown in the web UI.

First download Adobe PDF IFilter 6.0.

  1. Run the Adobe PDF IFilter 6.0 Setup program. Note: if you have SQL on a different server then you need to install the iFilter on the SQL Server not on the IIS server. (But I also installed on SSP server just in case.)
  2. The following steps are on the web server.
  3. Copy the ICPDF.gif file to “\12\Template\Images”. (I found this icpdf.gif through a web search.)
  4. Edit the file \12\TEMPLATE\XML\DOCICON.XML to add an entry for the .pdf extension.
    1. <Mapping Key="pdf" Value="icpdf.gif"/>
  5. Do an iisreset or recycle the appropriate application pool
  6. Add the .pdf file type to the index list:
    1. Go to SSP->Search Settings and next to File Type, add a new file type pdf
  7. Perform a Full Crawl
  8. My experience indicates that a server reboot is necessary.

An RTF filter is available for SharePoint 2003, but the download link was removed from the MS site. Hopefully MS will release an updated version for MOSS 2007.

Some other useful links:

  • The Microsoft SharePoint team recently released more filters to support more file types (Office 2007 types, Visio, zip, etc.): http://blogs.msdn.com/sharepoint/archive/2007/12/20/ms-filter-pack-released.aspx
  • An iFilter shop where you can buy more filters: http://www.ifiltershop.com/products.html

Reference:

  • http://support.microsoft.com/kb/555209
  • http://weblogs.asp.net/bugmandan/archive/2007/03/09/sharepoint-2007-acrobat-pdf-ifilter-6-0.aspx

Updated 7/17/2008:

Thanks Francois for this tip:

It is important to note that the regular iFilter doesn’t support the 64-bit version of SharePoint, and a special iFilter needs to be installed. The following link sheds more light on the iFilter from Foxit: http://naveedullah.wordpress.com/2007/05/12/64-bit-pdf-ifilter-for-moss-now-available/


10 Things To Optimize your SharePoint Server Indexing

December 21, 2007

Good post from Joel Oleson’s blog – I found it (and the links inside) very useful.

Quote below:

1) Put your Search db and transaction logs on separate disks, both on the fastest, most optimized disks with fast optimized spindles for writing (dedicated disks are essential)
2) Optimize your temp db: grow it, give it space, you can even split it into multiple dbs and ensure it is on the most optimized disks (dedicated disks are essential). Don’t forget to optimize the transaction logs of the temp db either!
3) Optimize the network between the servers you are indexing and the index server (GB NIC speed is preferred within a farm)
4) Consider topology changes to optimize network throughput and eliminate double hops (Index server crawling a separate front end (shared by user traffic) to pull changes. Adding the WFE role to your Index server and adding applicable host files is a great way to optimize the indexing and optimize your traffic at the same time.
5) Increase the RAM on your x64 SQL Servers (8 GB is really a good place to start, 16 GB or more is looking better and better.)
6) Defragment your databases, and applicable drives (if fragmented) and run relevant dbcc consistency checks – Refer to KB on SharePoint Safe DBCC commands
7) Increase the # of crawl threads (you’ll have to watch this, it is the easiest way to speed up your crawls, but watch the box it is “attacking” it can be heavy handed.)
8) Reduce the maximum index file size (optional)
9) Remove any unused, single threaded and poor performing ifilters
10) Reduce the amount of full indexes, run incremental crawls on a schedule where they can complete, and remove non essentials such as every 5 minute incremental jobs these will simply cause unnecessary churn.

Bonus: Install the public update or the service pack when it comes out (includes a few SharePoint Indexing related fixes).

More on disk optimization on a post I did a while back… Also there’s a great paper that just got posted on storage and performance optimization. It is a MUST READ. Performance recommendations for storage planning and monitoring.

Getting crawled and you don’t want to? Here’s a recent KB on how to configure the robots.txt in your SharePoint deployment. There is some more info in this post from the field. It is very easy to have 50% of traffic as the crawl account. Optimizing the indexing by reducing even authentication traffic is a big deal. Use accounts that are in the same domain and where the DCs are fast and local if using NTLM. Kerberos might end up being slightly faster, but does add complexity.


Error event ID 6398 and 6482 – about security rights of OSearch service

December 18, 2007

I have a huge number of the following errors in the “Application” folder of the event log:

Event ID 6398
The Execute method of job definition Microsoft.Office.Server.Search.Administration.IndexingScheduleJobDefinition (ID b94e8106-b5f9-4c2d-ad98-2871bcc4c669) threw an exception. More information is included below.
Retrieving the COM class factory for component with CLSID {3D42CCB1-4665-4620-92A3-478F47389230} failed due to the following error: 80070005.

And this:

Event ID 6482
Application Server Administration job failed for service instance Microsoft.Office.Server.Search.Administration.SearchServiceInstance (7d8b475a-6dda-47e8-8ab7-dbd171926b39).
Reason: Retrieving the COM class factory for component with CLSID {3D42CCB1-4665-4620-92A3-478F47389230} failed due to the following error: 8007000e.

Actually, in the “System” folder there are 2 events revealing more information about it. It turned out that I needed to grant Local Activation rights to the account used by “Office SharePoint Server Search”.

In Component Services->DCOM Config->OSearch->Properties->Security, I added Network Service (which may not be needed) and the account that runs the “Office SharePoint Server Search” service, and gave them “Local Activation” rights (in the “Launch and Activation Permissions” group). The error messages no longer appear.

This is not limited to this service or to SharePoint. You can search the registry for the CLSID, and it might turn out to be a different DCOM object. For example, the other day I got the same error with CLSID {61738644-F196-11D0-9953-00C04FD919C1}, and it turned out to be the IIS WAMREG admin Service.


Load testing on indexing BDC data

December 12, 2007

The BDC source here is an MS SQL database. In most of the tests, the SharePoint indexing server, the SharePoint SQL Server and the BDC data source SQL Server were on the same server (a very powerful one).

Initially the speed was very slow (7.5K per hour), and it did not seem to degrade as the crawled records accumulated all the way to 1+ million. Later it turned out that the source DB was not properly indexed, so the source DB was the bottleneck (CPU was constantly at 90%+).

After the index was added to the source DB, the speed rose to 160K per hour. But as the crawled records accumulated, the speed dropped to 40K per hour (at 2 million records).

bdc-load-test.gif

The space it uses seems to be about 5 times that of the original SQL database.

Small note: it takes about 18 minutes to get the ID list for every 1 million records.
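To put the measured rates in perspective, here is a rough back-of-the-envelope sketch. It assumes the crawl rate stays constant for the whole run, which the measurements above show is not really the case, so treat the results as ballpark figures only. `crawlHours` is a hypothetical helper name.

```javascript
// Hours needed to crawl `records` items at a steady `ratePerHour`.
// Assumption: the rate does not change during the crawl.
function crawlHours(records, ratePerHour) {
  return records / ratePerHour;
}

var oneMillion = 1000000;
// Rates taken from the test results: unindexed source DB, indexed source DB,
// and the slower rate observed once 2 million records had been crawled.
var rates = [7500, 160000, 40000];
for (var i = 0; i < rates.length; i++) {
  console.log(rates[i] + " per hour: " +
              crawlHours(oneMillion, rates[i]).toFixed(2) + " hours per million records");
}
```

At these rates, one million records take roughly 133 hours against the unindexed source DB, 6.25 hours after the index was added, and 25 hours at the slower 40K-per-hour rate.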


Findings on SharePoint Search – BDC – 2

November 8, 2007

On the search results page, when you click through to the “View Profile” page, it does call the database to retrieve the data, so the authentication mode is important here. If you specify “PassThrough”, make sure the “Default content access account” (seen on the Search Settings page) has access to that DB. Or change to “RevertToSelf”, and the identity of the application pool will be used. Please note this is the SSP application pool, not the Central Admin application pool. However, for some unknown reason, the “Default content access account” still needs access to the database; otherwise the crawl will fail even though the “View Profile” URL can retrieve the data from the database. Also, don’t forget to give the “Default content access account” (the account used to crawl) appropriate rights to the BDC application and entity.

Please note that there might be a delay of a few minutes after permissions are changed on the BDC application and entity. If the crawl still shows an “Access denied by BDC” error, wait a few minutes.

You can delete and re-import the BDC application definition at any time; there is no need to re-crawl as long as the BDC name is not changed.

Incremental crawl: it’s said that if you add a LastModifiedDate column to the record, the SharePoint indexing service will use it as a time stamp and incremental crawling is enabled. However, in my various tests, full and incremental crawls took virtually the same time. Either this doesn’t work or, in my test case, most of the time was spent retrieving the data rather than indexing it.

Some good articles to read:


Findings on SharePoint Search – BDC – 1

November 7, 2007

Steps to add BDC to SharePoint search:

  1. Define the BDC XML file (you can try this GUI editor, but it doesn’t help that much; you still have to read through the XML file and understand it).
  2. Import BDC Application file.
  3. Create a new Content Source and choose the BDC as the source
  4. Start Full Crawl
  5. Create a Scope to use this Content Source
  6. Configure (if needed) the web site to use this scope (by clicking on the Scope display group name)

After a content source is deleted, its index items are NOT deleted immediately. Instead, the SharePoint “Gather” deletes them one by one in the background, and a warning message is generated for each deleted item! It seems no full crawl can be done before this process is finished (at least no BDC crawl). It takes more than a day to remove 1.5 M records, at about 600 records a minute. Slooooooow! If you just want to erase all indexed content, you can click the “Reset all crawled content” link on the search settings page. That is very fast.

There seems to be no way to purge the humongous crawl log. The MS site says it will be deleted after 5 days (to be confirmed).

On the search page (search web part), if you choose to show the Scope drop-down, the search is limited to the selected scope. However, if the scope drop-down is not shown, it seems to always search a hard-coded “All Sites” scope! (Yes, I tried deleting that scope and got an error message saying ‘Scope does not exist’.) Even worse, the search web service (search.asmx) cannot even specify scopes.
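As a quick sanity check on the “more than a day” figure, the arithmetic can be sketched as follows. This assumes the deletion rate stays at the observed 600 records a minute; `minutesToDelete` is a hypothetical helper name.

```javascript
// Minutes needed to delete `items` index entries at `itemsPerMinute`.
// Assumption: the background deletion rate is constant.
function minutesToDelete(items, itemsPerMinute) {
  return items / itemsPerMinute;
}

var minutes = minutesToDelete(1500000, 600); // figures quoted in the post
var hours = minutes / 60;
console.log(minutes + " minutes, about " + hours.toFixed(1) + " hours");
```

That works out to 2,500 minutes, a little under 42 hours, which matches the observation that clearing 1.5 M records takes well over a day.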

Update: got solution for UI search page (original post):

“You need to add a new search results page to your search center (or edit an existing results page). Set the Scope property in the “Search core results” web part (in the “Miscellaneous” section) on the search results page to your dedicated scope.”

However, this doesn’t exactly solve the problem. When the URL contains s=[scope name], it still shows results from the other scope (this is reasonable). And it doesn’t help with web service calls.

Update: solution for web service search.asmx (link here):

SELECT URL, Title, Description FROM SCOPE() WHERE "scope"='All Sites' AND FREETEXT('gallery hinges') AND SITE = "http://supportdesk" AND NOT CONTAINS('brass')
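A query like this is usually assembled from a scope name and the user’s keywords. Below is a minimal sketch of that assembly. All function names are hypothetical, doubling single quotes to escape them is an assumption about the SQL syntax, and the QueryPacket layout (namespace, `type="MSSQLFT"`) is written from memory and should be verified against the Query web service documentation before use.

```javascript
// Wrap a value in a '...' SQL literal, doubling any single quotes inside it.
// Assumption: the search SQL syntax escapes quotes this way.
function sqlLiteral(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Hypothetical helper: full-text SQL query restricted to one scope.
function buildScopeQuery(scopeName, freeText) {
  return 'SELECT URL, Title, Description FROM SCOPE() ' +
         'WHERE "scope"=' + sqlLiteral(scopeName) +
         ' AND FREETEXT(' + sqlLiteral(freeText) + ')';
}

// Escape text for use inside an XML element.
function xmlEscape(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap the SQL in a QueryPacket for search.asmx.
// The packet layout here is an assumption to verify, not a confirmed schema.
function buildQueryPacket(sql) {
  return '<QueryPacket xmlns="urn:Microsoft.Search.Query">' +
         '<Query><Context>' +
         '<QueryText language="en-US" type="MSSQLFT">' + xmlEscape(sql) + '</QueryText>' +
         '</Context></Query></QueryPacket>';
}

console.log(buildQueryPacket(buildScopeQuery("All Sites", "gallery hinges")));
```

Keeping the SQL builder separate from the packet wrapper makes it easy to test the scope filter on its own before sending anything to the web service.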

