About

Limitations and how it all works

Crawl log

The crawl log retrieves up to 2,000 crawl log entries for you to inspect and filter by content source and status. The limit is set to a fixed number because the APIs do not handle paging correctly. If you don't see the entry you are looking for in the first 100 results, increase the page size, up to a maximum of 2,000.
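
For illustration, a minimal sketch (in Python) of that approach: instead of paging, grow the page size until the entry shows up or the 2,000 cap is reached. The fetch_crawl_log helper and the "Url" field on each entry are assumptions standing in for the real crawl log call, which is not shown here.

    def find_crawl_log_entries(url_fragment, max_page_size=2000):
        """Grow the page size instead of paging, since paging is unreliable."""
        page_size = 100
        while True:
            entries = fetch_crawl_log(page_size)  # hypothetical helper wrapping the crawl log call
            hits = [e for e in entries if url_fragment.lower() in e.get("Url", "").lower()]
            if hits or len(entries) < page_size or page_size >= max_page_size:
                return hits
            page_size = min(page_size * 2, max_page_size)  # 100 -> 200 -> ... -> 2000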

Filtering on a specific user profile in the crawl log requires that the profile is among the top 10,000 entries fetched behind the scenes. This is because a bug in the API prevents URL prefix searches from being run directly, so the filtering has to happen in this application instead.
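
A minimal sketch of that in-application filtering, assuming the entries have already been fetched (up to 10,000 of them) and that each entry exposes its URL in a "Url" field - both assumptions, since the entry shape is not spelled out here:

    def filter_entries_for_profile(entries, profile_url_prefix):
        """Client-side substitute for the URL prefix search the API cannot do.
        Only finds the profile if it is among the fetched entries."""
        prefix = profile_url_prefix.lower()
        return [e for e in entries if e.get("Url", "").lower().startswith(prefix)]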

How re-indexing works in general

There is no way to force an incremental or full crawl in SharePoint Online like you can on-premises. Your only option is to mark items to be picked up on the next crawl cycle, which means it can take anywhere from minutes to hours before you see an item re-indexed. This is where the crawl log is useful, as it shows when an item was last picked up by the crawler.

To trigger items for re-indexing, the application requires an extra helper add-in (Puzzlepart Search Toolbox Helper Service). This add-in has to be installed manually, as marking user profiles or files for re-indexing on the next crawl cycle requires full control permissions in SharePoint.

For files and list items, the property bag of the site or list has to be updated, which mimics the behavior of clicking the "Re-index" button under advanced settings in the SharePoint UI.
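
As an illustration, a minimal sketch of that property bag trick in Python. The vti_searchversion key is the value the UI "Re-index" button bumps, but the two helpers for reading and writing the property bag are hypothetical stand-ins for what the helper add-in does with full control:

    SEARCH_VERSION_KEY = "vti_searchversion"

    def mark_for_reindex(target_url):
        """Bump the search version on a site (or a list's root folder) so the
        next crawl treats everything under it as changed."""
        current = get_property_bag_value(target_url, SEARCH_VERSION_KEY) or 0    # hypothetical helper
        set_property_bag_value(target_url, SEARCH_VERSION_KEY, int(current) + 1) # hypothetical helper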

For user profiles, the profile needs to get a new saved timestamp for it to be re-indexed. The API to update user profiles also requires full control, in addition to user profile access, in order to work.

Re-Index of user profiles

The application uses search to retrieve all existing user profiles. Next, a file is generated that updates the Department field of each user profile with the exact same value that is already there. This ensures no data is changed, but each user profile gets a new timestamp, which ensures re-indexing on the next user profile incremental crawl in SharePoint Online.
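
A minimal sketch of the batch file generation, assuming the search step has already produced (account, department) pairs; the JSON layout follows the bulk import format referenced below (a "value" array keyed by an identifier property), but the exact field names are an assumption:

    import json

    def build_department_batch(profiles, path="reindex-profiles.json"):
        """Write a bulk-import file that sets Department to its current value,
        so nothing changes except the profile's last modified timestamp."""
        payload = {
            "value": [
                {"IdName": account, "Department": department or ""}
                for account, department in profiles
            ]
        }
        with open(path, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=2)
        return path

    # Example: build_department_batch([("anne@contoso.com", "Sales")])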

The flow is as follows:

  • Batch file is uploaded to SharePoint Online (Status=Submitted)
  • A timer job in SharePoint Online picks up the job (Status=Queued)
  • Each user profile gets a new last modified date - equivalent to each user saving their own profile (Status=Processing)
  • The job completes with or without errors (Status=Succeeded/Error)
  • The SharePoint Online user profile crawler will index the updated profiles

See https://dev.office.com/blogs/introducing-bulk-upa-custom-profile-properties-update-api for more information on the user profile bulk API.
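
A minimal sketch of following that flow from the client side, assuming a hypothetical get_import_job_status helper that returns one of the status values listed above (the real call against the bulk API is not shown here):

    import time

    TERMINAL_STATES = {"Succeeded", "Error"}

    def wait_for_import_job(job_id, poll_seconds=60):
        """Poll until the SharePoint Online timer job finishes the batch."""
        while True:
            status = get_import_job_status(job_id)  # hypothetical helper
            print(f"Job {job_id}: {status}")
            if status in TERMINAL_STATES:
                return status
            time.sleep(poll_seconds)  # still Submitted/Queued/Processing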

Re-Index of content

The application will iterate over all site collections and automate the task of clicking the "Reindex site" button for each one. Triggering re-indexing of all content can have an adverse effect on index latency and should only be performed if your search schema changes affect all content.
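
A minimal sketch of that iteration, assuming a hypothetical get_all_site_collection_urls helper (for example backed by a search query on contentclass:STS_Site) and the mark_for_reindex sketch shown earlier; it keeps going past per-site failures so one broken site collection does not stop the run:

    def reindex_all_content():
        """Mark every site collection for re-indexing on the next crawl."""
        failures = []
        for url in get_all_site_collection_urls():  # hypothetical helper
            try:
                mark_for_reindex(url)  # bumps vti_searchversion, see sketch above
            except Exception as exc:
                failures.append((url, str(exc)))
        return failures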

If the affected content exists in a single list or library, trigger re-indexing on that list/library only. If the affected content exists in a site or site collection, re-index only that site/site collection. Re-indexing the root site of a site collection will also process all sub-sites.

See support.office.com for more information on how to re-index content in lists, libraries and sites.

Terms and disclaimer

SharePoint Online Search Toolbox and Search Toolbox Helper Service work within the available APIs provided by Microsoft. No information about your tenant, users or data is stored or used in any way outside the application itself. The only information recorded is the status of the last re-index operation performed. The application itself and the helper service require broad permissions, but we guarantee that no inappropriate actions are taken regarding your tenant or information.

We also offer companies an option to purchase the application code and install it themselves to avoid any security concerns. Contact sales@puzzlepart.com for more information.