Bot Detection
From Contao Community Documentation
(Detection of Search Engines, Spiders, Crawlers, Bots, Harvesters, ...)
Extension-Overview | |
---|---|
Name of the developer | Glen Langer (BugBuster) |
Developer Website | http://www.contao.glen-langer.de |
Version of the extension | 1.0.6 |
Compatibility with Contao Version | from 2.9 |
Compatibility with TYPOlight Version | 2.8 |
Link to Extension Repository | http://www.contao.org/extension-list/view/botdetection.en.html |
Donate to the developer | Cappuccino |
Link to Tracker | http://dev.typolight-forge.org/projects/botdetection/issues |
Forum
Questions about the Bot Detection module are answered in the forum.
Errors and feature requests can be reported in the issue tracker.
Installation
The module is installed through the Extension Repository in the Contao back end.
Manual installation is also possible: download the ZIP file from the Extension Repository, unzip it and transfer the files to the server.
This should create the directory "/system/modules/botdetection".
Then call /contao/install.php and perform the database update.
(In older TYPOlight installations the path is /typolight/install.php.)
Usage
The Bot Detection module provides three methods for detecting bots.
Fully reliable detection is not possible.
Two characteristics of a request are examined:
- User agent
- IP address
The module includes one method for the user agent, BD_CheckBotAgent, and one for the IP address, BD_CheckBotIP.
Both methods return only "true" or "false" and perform only a rough search for strings and substrings that identify the most important bots.
A third method, BD_CheckBotAgentAdvanced, uses an external configuration file for user agent detection. It returns the short name of the bot or "false".
Method BD_CheckBotAgent
The method BD_CheckBotAgent searches in two steps so that it finishes as quickly as possible.
Step 1 searches for substrings that appear in the names of most search engines and bots:
'bot' 'spider' 'spyder' 'crawl' 'slurp' 'robo' 'yahoo'
Step 2 then looks for further strings that usually correspond to the name of a specific search engine, such as:
'altavista' 'archiver' 'inktomi' 'twiceler' ...
The result is "true" or "false" ("true" means a search engine or bot was found).
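The two-step search described above can be illustrated with a minimal sketch. It is not the extension's own code: the function name looksLikeBot is hypothetical, the case-insensitive comparison is an assumption, and the second list contains only the four strings named above, while the module's real list is longer.

```php
<?php
// Hypothetical sketch of a two-step substring search; not taken from the module.
function looksLikeBot($userAgent)
{
    // Assumption: the comparison is case-insensitive.
    $agent = strtolower($userAgent);

    // Step 1: generic substrings found in the names of most bots.
    $generic = array('bot', 'spider', 'spyder', 'crawl', 'slurp', 'robo', 'yahoo');
    foreach ($generic as $needle) {
        if (strpos($agent, $needle) !== false) {
            return true; // stop early, no need for step 2
        }
    }

    // Step 2: names of specific search engines (only the examples from the text).
    $specific = array('altavista', 'archiver', 'inktomi', 'twiceler');
    foreach ($specific as $needle) {
        if (strpos($agent, $needle) !== false) {
            return true;
        }
    }

    return false;
}
```

Most bots are already caught by the short generic list, so the longer second list is searched only for the remaining user agents.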
Method BD_CheckBotIP
The bots from Google and MSN / Bing sometimes crawl with the user agent of a regular browser.
To uncover these "undercover" search engines, requests must be filtered by IP address.
For this purpose there is a configuration file in the config directory of the module: bot-ip-list.txt
It currently contains the IP address of a spider from Israel as well as network addresses for Google and MSN / Bing.
Additional IP addresses or networks can be entered in this file, but such changes are not update-safe.
Therefore, it is better to add them in localconfig.php as follows:
$GLOBALS['BOTDETECTION']['BOT_IP'][] = '192.168.1.2';
$GLOBALS['BOTDETECTION']['BOT_IP'][] = '192.168.0.0/24';
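The entries can be single addresses or networks in CIDR notation such as 192.168.0.0/24. How such a list might be matched against a visitor's IP address is sketched below. This is only an illustration under stated assumptions: the function name ipInList is hypothetical, only IPv4 is handled, a 64-bit PHP build is assumed, and the module's actual matching code may differ.

```php
<?php
// Hypothetical helper, not part of the extension: checks an IPv4 address
// against a list of single addresses and CIDR networks.
function ipInList($ip, array $entries)
{
    $ipLong = ip2long($ip);
    if ($ipLong === false) {
        return false; // not a valid IPv4 address
    }

    foreach ($entries as $entry) {
        if (strpos($entry, '/') === false) {
            // Single address: compare directly.
            if (ip2long($entry) === $ipLong) {
                return true;
            }
            continue;
        }

        // Network in CIDR notation, for example 192.168.0.0/24.
        list($net, $bits) = explode('/', $entry, 2);
        $netLong = ip2long($net);
        $bits = (int) $bits;
        if ($netLong === false || $bits < 0 || $bits > 32) {
            continue; // skip malformed entries
        }

        // Build the network mask (assumes 64-bit integers).
        $mask = (-1 << (32 - $bits)) & 0xFFFFFFFF;
        if (($ipLong & $mask) === ($netLong & $mask)) {
            return true;
        }
    }

    return false;
}
```

With the two example entries above, 192.168.1.2 matches the single address and 192.168.0.77 matches the /24 network, while 10.0.0.1 matches neither.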
Method BD_CheckBotAgentAdvanced
The method BD_CheckBotAgentAdvanced uses an external configuration file to detect the user agent.
The result is the short name of the bot or "false".
The external configuration file is generated from known user agent information about search engines / bots and is updated regularly.
Note
This external database distinguishes between the different types of search engines from one vendor.
For example, the return value is not "Google" but "Googlebot", "Googlebot-Image", "Googlebot-Mobile" and so on, depending on what was recognized.
Other vendors such as MSN and Yahoo also have several such names for their search engines.
Custom or unknown user agent identifiers can be entered in the file /system/config/localconfig.php:
$GLOBALS['BOTDETECTION']['BOT_AGENT'][] = array("unitbot","UniBot from FHTW");
$GLOBALS['BOTDETECTION']['BOT_AGENT'][] = array("myprivat","My privat bot");
The parameters are: short name in lower case, description.
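To show how entries of this form could be used, here is a minimal sketch of a lookup that returns the short name or false. It is not the module's implementation: the function name findBotShortName is hypothetical, and the assumption that the lower-case short name is matched as a substring of the user agent is not confirmed by this article.

```php
<?php
// Hypothetical sketch, not the extension's code: look up a user agent in a
// list of (short name, description) pairs like the BOT_AGENT entries.
function findBotShortName($userAgent, array $entries)
{
    $agent = strtolower($userAgent);

    foreach ($entries as $entry) {
        $shortName = $entry[0]; // first parameter: short name in lower case
        // Assumption: the short name is searched as a substring of the agent.
        if ($shortName !== '' && strpos($agent, $shortName) !== false) {
            return $shortName;
        }
    }

    return false; // no entry matched
}
```

Because the function returns either a string or false, callers should compare the result with === false instead of relying on a loose truth test.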
Demo Modules
The Bot Detection module ships with two demos. The module is included in the demo class via import:
$this->import('ModuleBotDetection');
Frontend Demo 1
Demo 1 tests the current IP address and user agent string with all three methods and displays the results.
For an example, see Demo 1 on the developer website.
Frontend Demo 2
Demo 2 provides a form for checking whether the module would recognize a given user agent string as a bot.
For this, both agent methods are called and the result is displayed.
For an example, see Demo 2 on the developer website.