Meorca Web Crawler


A Brief Description of MeorcaBot

How to identify MeorcaBot
We assume you are here because you noticed traffic from a User-Agent that identified itself with the string:

Mozilla/5.0 (compatible; MeorcaBot +

If the IP address was also, then you have come to the right place to find out about MeorcaBot. If it was a different IP address, then someone else is hijacking our web crawler’s name.
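
As a rough illustration of the two-part check described above (User-Agent plus source IP), here is a minimal Python sketch. The function name looks_like_meorcabot and the constant EXPECTED_IP are hypothetical, and the address 192.0.2.10 is a reserved documentation placeholder, not MeorcaBot's real address; substitute the address you want to compare against.

```python
import ipaddress

# Placeholder only: 192.0.2.10 is a reserved documentation address,
# NOT the crawler's real IP. Replace it with the address to compare against.
EXPECTED_IP = "192.0.2.10"


def looks_like_meorcabot(user_agent: str, remote_ip: str,
                         expected_ip: str = EXPECTED_IP) -> bool:
    """Return True only if the User-Agent names MeorcaBot AND the IP matches."""
    claims_name = "meorcabot" in user_agent.lower()
    # Compare parsed addresses so equivalent spellings of an IP still match.
    same_ip = ipaddress.ip_address(remote_ip) == ipaddress.ip_address(expected_ip)
    return claims_name and same_ip
```

A request that carries the MeorcaBot name but arrives from any other address fails this check, which is the hijacking case mentioned above.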

Who runs MeorcaBot
MeorcaBot is run by Meorca, a company based in London, United Kingdom.

How MeorcaBot crawls a site
During this beta testing phase, MeorcaBot is run periodically. Each machine in a crawl runs about one fetcher process. Each fetcher keeps at most 100 to 300 connections open at any given time. In a typical situation, these connections would not all be to the same host.

How to change how MeorcaBot crawls your site
MeorcaBot understands robots.txt files (the file must be named robots.txt, not robot.txt). It also obeys X-Robots-Tag HTTP headers, the HTML meta tag values noindex and nofollow, and anchor rel="nofollow" directives. MeorcaBot further understands the Crawl-delay extension to the robots.txt standard, as well as the * and $ pattern syntax that Google and Bing support within Allow and Disallow lines. If you want to restrict MeorcaBot’s access to your site, the easiest way is to add a directive for it to follow in your robots.txt file. For example, in your document root you could put a robots.txt file with lines like:

User-agent: MeorcaBot
Disallow: /some_folder2/
Allow: /some_other_folder/
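
One way to sanity-check rules like these before publishing them is to run them through Python's standard urllib.robotparser module. This is only a sketch: it is not MeorcaBot's own parser, the example.com URLs and the Crawl-delay value of 5 are made up for illustration, and the standard-library parser does plain prefix matching, so it will not evaluate the * and $ pattern syntax.

```python
from urllib.robotparser import RobotFileParser

# Example rules; the Crawl-delay value is an illustrative assumption.
ROBOTS_TXT = """\
User-agent: MeorcaBot
Crawl-delay: 5
Disallow: /some_folder2/
Allow: /some_other_folder/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Is MeorcaBot allowed to fetch each path under these rules?
folder2_ok = parser.can_fetch("MeorcaBot", "https://example.com/some_folder2/page.html")
other_ok = parser.can_fetch("MeorcaBot", "https://example.com/some_other_folder/index.html")
delay = parser.crawl_delay("MeorcaBot")

print(folder2_ok, other_ok, delay)
```

Note that the rule lines are kept on consecutive lines directly under the User-agent line. Some parsers, including this one, treat a blank line as the end of a group of rules.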

If you have general robot directives using expressions like “User-Agent: *”, these will be understood by MeorcaBot as well. MeorcaBot caches the robots.txt file for 1 day: it uses the cached directives for 24 hours and only then requests the robots.txt file again. So if you change your robots.txt file, it may take up to a day before MeorcaBot notices the changes.
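
The caching behaviour described above can be pictured with a small time-to-live cache. This is a hedged sketch of the general idea, not MeorcaBot's actual code: the class name RobotsCache, the fetch callback, and the injectable clock are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple

ONE_DAY_SECONDS = 24 * 60 * 60


class RobotsCache:
    """Keep each host's robots.txt body for a fixed time before re-fetching."""

    def __init__(self, fetch: Callable[[str], str],
                 ttl: float = ONE_DAY_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch    # hypothetical callback: host -> robots.txt text
        self._ttl = ttl        # how long a cached copy stays valid, in seconds
        self._clock = clock    # injectable so the expiry logic can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, host: str) -> str:
        now = self._clock()
        cached = self._entries.get(host)
        # Serve the cached copy while it is younger than the TTL.
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        # Otherwise fetch a fresh copy and remember when it was fetched.
        body = self._fetch(host)
        self._entries[host] = (now, body)
        return body
```

With a design like this, an edit made to robots.txt shortly after a fetch is not seen until the cached copy expires, which is why changes can take up to a day to take effect.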

Contact Info
If you have any questions about MeorcaBot, please contact us.