Migrating from legacy
Introduction
With this new version of the DocSearch UI, we wanted to go further and provide better tooling for you to create and maintain your config file, as well as some extra Algolia features that you have been requesting for a long time!
What's new?
Scraper
The DocSearch infrastructure now leverages the Algolia Crawler. We've teamed up with our friends on the Crawler team and created a new DocSearch helper that extracts records the same way our beloved DocSearch scraper did!
The best part is that you no longer need to install any tooling on your side to maintain or update your index!
We now provide a web interface that will allow you to:
- Start, schedule and monitor your crawls
- Edit your config file from our live editor
- Test your results directly with DocSearch v3
Algolia application and credentials
We've received a lot of requests asking for:
- A way to manage team members
- A way to browse and see how Algolia records are indexed
- A way to see and subscribe to other Algolia features
They are now all available, in your own Algolia application, for free :D
Config file key mapping
Below are the keys that can be found in legacy DocSearch configs and their translation to an Algolia Crawler config. More detailed documentation of the Algolia Crawler can be found in the official documentation.
legacy | current | description |
---|---|---|
start_urls | startUrls | Now accepts URLs only; see helpers.docsearch to handle custom variables |
page_rank | pageRank | Can be added to the recordProps in helpers.docsearch; should be passed as a string |
js_render | renderJavaScript | Unchanged |
js_wait | renderJavaScript.waitTime | See documentation of renderJavaScript |
index_name | removed, see actions | Handled directly in the actions |
sitemap_urls | sitemaps | Unchanged |
stop_urls | exclusionPatterns | Supports micromatch |
selectors_exclude | removed | Should be handled in the recordExtractor and helpers.docsearch |
custom_settings | initialIndexSettings | Unchanged |
scrape_start_urls | removed | Can be handled with exclusionPatterns |
strip_chars | removed | # characters are removed automatically from anchor links; edge cases should be handled in the recordExtractor and helpers.docsearch |
conversation_id | removed | Not needed anymore |
nb_hits | removed | Not needed anymore |
sitemap_alternate_links | removed | Not needed anymore |
stop_content | removed | Should be handled in the recordExtractor and helpers.docsearch |
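To make the table easier to apply, here is a minimal sketch of how a legacy config could be sorted into renamed, manual, and dropped keys. The key names come from the table above; `mapLegacyConfig`, its return shape, and the assumption that legacy `start_urls` entries are either strings or objects with a `url` field are hypothetical and not part of any official migration tool. Values are copied as-is, so review them against the Crawler documentation before using them.

```javascript
// Sketch only: not an official migration tool. Key names follow the
// mapping table; everything else here is a hypothetical helper.

// Legacy keys that map one-to-one onto a Crawler key.
const RENAMED = new Map([
  ['sitemap_urls', 'sitemaps'],
  ['stop_urls', 'exclusionPatterns'],
  ['custom_settings', 'initialIndexSettings'],
  ['js_render', 'renderJavaScript'],
]);

// Legacy keys that have a new home but need a manual edit there.
const MANUAL = new Map([
  ['page_rank', 'pageRank in recordProps (helpers.docsearch), as a string'],
  ['js_wait', 'renderJavaScript.waitTime'],
  ['index_name', 'actions'],
  ['scrape_start_urls', 'exclusionPatterns'],
  ['selectors_exclude', 'recordExtractor and helpers.docsearch'],
  ['strip_chars', 'recordExtractor and helpers.docsearch'],
  ['stop_content', 'recordExtractor and helpers.docsearch'],
]);

// Legacy keys that are no longer needed.
const DROPPED = new Set(['conversation_id', 'nb_hits', 'sitemap_alternate_links']);

function mapLegacyConfig(legacy) {
  const config = {};
  const manual = [];
  const dropped = [];

  for (const [key, value] of Object.entries(legacy)) {
    if (key === 'start_urls') {
      // startUrls accepts URLs only. Assumption: legacy entries are
      // either plain strings or objects carrying a `url` field.
      config.startUrls = value.map((entry) =>
        typeof entry === 'string' ? entry : entry.url
      );
    } else if (RENAMED.has(key)) {
      // Copied verbatim; the expected shape may differ in the Crawler.
      config[RENAMED.get(key)] = value;
    } else if (MANUAL.has(key)) {
      manual.push({ key, moveTo: MANUAL.get(key) });
    } else if (DROPPED.has(key)) {
      dropped.push(key);
    } else {
      // Keys the mapping table does not mention are flagged for review.
      manual.push({ key, moveTo: 'not covered by the mapping table' });
    }
  }

  return { config, manual, dropped };
}
```

For example, calling `mapLegacyConfig` with a legacy config that contains `start_urls`, `stop_urls`, `index_name`, and `nb_hits` puts `startUrls` and `exclusionPatterns` in `config`, flags `index_name` for manual handling in the actions, and lists `nb_hits` as dropped.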
FAQ
The migration seems to have started, but I haven't received any emails
Due to the large number of indices DocSearch has, we need to migrate configs in small incremental batches.
If you have not received a migration email yet, don't worry, your turn will come!
What do I need to do to migrate?
We've tried to make the migration as seamless as possible for you and took care of all the painful parts:
- Your existing config file will be migrated to an Algolia Crawler config
- Crawls will be started and scheduled
- Your Algolia application will be ready to go with a populated index!
All you need to do is update your frontend integration with the credentials you'll receive by email, as shown below:
JavaScript:

docsearch({
  container: '#docsearch',
  appId: 'YOUR_NEW_ALGOLIA_APP_ID',
  apiKey: 'YOUR_NEW_ALGOLIA_SEARCH_API_KEY',
  indexName: 'YOUR_INDEX_NAME', // it does not change
});

React:

<DocSearch
  appId="YOUR_NEW_ALGOLIA_APP_ID"
  apiKey="YOUR_NEW_ALGOLIA_SEARCH_API_KEY"
  indexName="YOUR_INDEX_NAME" // it does not change
/>
Why is the API key different in the dashboard?
Algolia apps come with a default search API key, which also allows you to list indices and settings and to search every index of your app. In the email, we provide a search-only API key, scoped to your production index, so you don't have to worry about exposing it in the frontend.
What should I do with my legacy config and credentials?
Your legacy config will be parsed into a Crawler config. If you have already received your access, please use the dedicated web interface to make any changes!
Your credentials will remain available, but once all the existing configs have been migrated, we will stop the daily crawl jobs.
Why do I see two Algolia apps in my dashboard?
We did not remove access to the legacy DocSearch application (BH4D9OD16A) to give you time to get familiar with our new infrastructure. BH4D9OD16A will remain available until the migration has been completed for all DocSearch users.
Please only refer to your new Algolia application if you already have access.
Are the docsearch-scraper and docsearch-configs repositories still maintained?
At the time you are reading this, the migration hasn't been completed, so yes, they are still maintained.
Once the migration has been completed:
- The docsearch-scraper will be archived and no longer maintained, in favor of our Algolia Crawler. You'll still be able to use our "run your own" solution if you want!
- The docsearch-configs repository will be archived and will host all of the existing and active legacy DocSearch config files, along with their parsed versions. You can get a preview on this branch.
I just applied, can I join the new infra?
We are still at an early stage of the migration, so our focus is on live configurations first.
We also plan to include new indices (fewer than 30 days of activity) in the migration batches during December 2021.