Load testing on indexing BDC data

December 12, 2007

The BDC source here are in a MS SQL database. In most of the test the SharePoint indexing server, SharePoint SQL and the BDC data source SQL are in same server (a very powerful one).

Initially the speed was very slow (7.5K per hour) and seems it didn”t downgrade as the crawled records accumulated all the way to 1+ million. Later it turned out to be that the source DB was not properly indexed. So the source DB was the bottleneck (CPU was constantly 90%+).

After the index was added to source DB, the speed became 160K per hour. But as the crawled records went up, the speed was slowed down to 40K per hour (with 2 million records).

bdc-load-test.gif

The space it uses seems to be about 5 times of the original SQL database.

Small note: it will take about 18 minutes to get ID list of the every 1 million records.


Business Data Catalog Links

December 6, 2007

One authentication mode:


Findings on SharePoint Search – BDC – 2

November 8, 2007

In search result page, when you click on the “View Profile” page, it does call the database to retrieve the data. So Authentication mode is important here. If you specify “PassThrough”, make sure the “Default content access account” (seem in Search Settings page) has access to that DB. Or, change to “RevertToSelf” and the identify of the application pool will be used – please note this is the SSP application pool, not the central admin application pool. However, for unknown reason, The “Default content access account” still need access to the database, otherwise the crawling will fail even the “View Profile” URL can retrieve the data from database. Also, don’t forget to give “Default content access account” (the account used to crawl) appropriate rights to the BDC application and entity.

Please note that there might be a delay of a few minutes after the permission is changed on BDC application and entity. If the crawling still has “Access denied b BDC” error, wait for a few minutes.

You can delete and re-import the BDC application definition at the anytime – no need to re-crawl as long as the BDC name is not changed.

Incremental Crawl – it’s said that if you add a LastModifiedDate column to the record, SharePoint indexing service will use that as a time stamp and incremental crawling is enabled. However in my various tests, it took virtually same time in full and incremental crawling. Either this doesn’t work, or in my test case, most of the time was spent on retrieving the data instead of indexing them.

Some good articles to read:


Findings on SharePoint Search – BDC – 1

November 7, 2007

Steps to add BDC to Sharepoint search:

  1. Define BDC XML file (you can try this GUI editor but it doesn’t help that much. You’d better still have to read the XML file through and understand it.)
  2. Import BDC Application file.
  3. Create a new Content Type and choose the BDC as the source
  4. Start Full Crawl
  5. Create a Scope to use this Content Type
  6. Configure (if needed) the web site to use this scope (by clicking on the Scope display group name)

After a content source is deleted, the index items of the content source will NOT be deleted immediately. Instead, the SharePoint “Gather” will delete them one by one at the background, and a warning message will be generated for each deleted items! It seems no full crawl can be done before this process is finished (at least no BDC crawling). It will take more than a day to remove 1.5 M records – aboutn 600 records a minute. Slooooooow! If one just want to erase all indexed content, one can click on “Reset all crawled content” link on search setting page. That is very fast.

It seems no way to purge the humongous crawl log. MS site says it will deleted after 5 days – to be confirmed.

In Search page (search web part), if choose to show Scope drop down, the search will be limited to the selected scope. However if scope drop down is not showed, it seems always search on a hard-coded “All Sites” scope! (Yes I tried to delete that scope and I got error message saying ‘Scope does not exist’). Even worse, search web service (search.asmx) can not even specify scopes.

Update: got solution for UI search page (original post):

“You need to add a new search results page to your search center (or edit an existing results page). Set the Scope property in the “Search core results” web part (in the “Miscellaneous” section) on the search results page to your dedicated scope.”

However this doesn’t exactly solve the problem. When the URL contains s=[scope name], it still shows the result from other scope (this is reasonable). And it doesn’t help on web service call.

Update: solution for web service search.asmx (link here):

SELECT URL, Title, Description FROM SCOPE() WHERE “scope”=’All Sites’ AND FREETEXT(‘gallery hinges’) AND SITE = “http://supportdesk” AND NOT CONTAINS(‘brass’)


Follow

Get every new post delivered to your Inbox.