Difference between revisions of "Bot Detection"

From Contao Community Documentation


Revision as of 19:25, 12 September 2010

I'm not a native English speaker. Please correct my mistakes.

Incomplete Article: This article is not finished yet and needs expansion.

Please expand it and remove this notice when it is finished.

No Bots!
Bot Detection is a helper class for other front end extensions that need to detect whether a request comes from a human or a machine.

(Detection of search engines, spiders, crawlers, bots, harvesters, ...)

Extension Overview
Name of the developer: Glen Langer (BugBuster)
Developer website: http://www.contao.glen-langer.de
Version of the extension: 1.0.6
Compatibility with Contao: from version 2.9
Compatibility with TYPOlight: version 2.8
Link to Extension Repository: http://www.contao.org/extension-list/view/botdetection.en.html
Donate to the developer: Cappuccino
Link to tracker: http://dev.typolight-forge.org/projects/botdetection/issues


Hint: Translation follows


Forum

Questions about the Bot Detection module are answered in the forum.
Errors and feature requests can be reported in the issue tracker.

Installation

The module is installed via the Extension Repository in the Contao back end.
Manual installation is also possible: download the ZIP file from the Extension Repository, unzip it, and transfer the files to the server.
This should create the directory "/system/modules/botdetection".
Then call /contao/install.php and perform the database update.
(Use /typolight/install.php in older TYPOlight installations.)

Usage

The Bot Detection module provides three methods for detection.
Fully reliable detection is, of course, not possible.
Detection relies on two pieces of information:

  • User agent
  • IP address

The module includes one method for the user agent, BD_CheckBotAgent, and one for the IP address, BD_CheckBotIP.
Both return only "true" or "false". BD_CheckBotAgent does a rough search for internal substrings to identify the most important bots, while BD_CheckBotIP uses an external file that defines the IP addresses and networks.

A third method, BD_CheckBotAgentAdvanced, was added in version 1.0.2. It detects the user agent by means of an external configuration file and returns the short name of the bot, or "false" if nothing was detected.

Method BD_CheckBotAgent

The method BD_CheckBotAgent searches in two steps so that it finishes as quickly as possible.
Step 1 searches for substrings that appear in the names of most search engines and bots:

'bot'
'spider'
'spyder'
'crawl'
'slurp'
'robo'
'yahoo'

If step 1 finds nothing, step 2 looks for further strings that usually correspond to the name of a search engine, such as:

'altavista'
'archiver'
'inktomi'
'twiceler'
...

The result is "true" or "false" ("true" = search engine / bot found).
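
The two steps described above can be sketched roughly as follows. This is only an illustration, not the extension's actual code: the function name sketchCheckBotAgent is hypothetical, the case-insensitive comparison is an assumption, and the step 2 list contains only the four names shown above (the real list is longer).

```php
<?php
// Hypothetical sketch, not the extension's code: a two-step substring
// search over a user agent string, using the substrings listed above.
function sketchCheckBotAgent(string $userAgent): bool
{
    // Assumption: the comparison ignores upper and lower case.
    $agent = strtolower($userAgent);

    // Step 1: generic substrings found in most bot names.
    $generic = ['bot', 'spider', 'spyder', 'crawl', 'slurp', 'robo', 'yahoo'];
    // Step 2: names of individual search engines (excerpt only).
    $names = ['altavista', 'archiver', 'inktomi', 'twiceler'];

    // Step 2 is only reached if step 1 found nothing.
    foreach ([$generic, $names] as $list) {
        foreach ($list as $needle) {
            if (strpos($agent, $needle) !== false) {
                return true; // search engine / bot found
            }
        }
    }
    return false;
}

var_dump(sketchCheckBotAgent('Mozilla/5.0 (compatible; Googlebot/2.1)'));
var_dump(sketchCheckBotAgent('Mozilla/5.0 (Windows NT 6.1) Firefox/3.6'));
```

Checking the short generic list first means that most bots are recognized after a handful of comparisons, which is presumably why the method is split into two steps.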

Method BD_CheckBotIP

The bots from Google and MSN / Bing sometimes crawl with the user agent of a regular browser, which distorts statistics.
To uncover these "undercover" search engines, requests must be filtered by IP address.
The method BD_CheckBotIP serves this purpose.

It uses a configuration file in the config directory of the module: bot-ip-list.txt
Its current content lists the IP address of a spider from Israel as well as network addresses for Google and MSN / Bing.
Additional IP addresses or networks can be entered in this file, but such changes are not update-safe.
Therefore, it is better to enter them in localconfig.php as follows:

$GLOBALS['BOTDETECTION']['BOT_IP'][] = '192.168.1.2';
$GLOBALS['BOTDETECTION']['BOT_IP'][] = '192.168.0.0/24';
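
To illustrate how entries in these two forms (a single address and a network in CIDR notation) could be matched against a visitor's IP address, here is a minimal sketch. The function names sketchIpMatches and sketchIsBotIp are hypothetical, the sketch handles IPv4 only, and the source does not state how the extension itself performs the comparison.

```php
<?php
// Hypothetical sketch: match an IPv4 address against an entry such as
// '192.168.1.2' (single address) or '192.168.0.0/24' (CIDR network).
function sketchIpMatches(string $ip, string $entry): bool
{
    $ipLong = ip2long($ip);
    if ($ipLong === false) {
        return false; // not a valid IPv4 address
    }
    if (strpos($entry, '/') === false) {
        return $ipLong === ip2long($entry); // single address
    }
    [$network, $bits] = explode('/', $entry, 2);
    $netLong = ip2long($network);
    if ($netLong === false || !ctype_digit($bits) || (int) $bits > 32) {
        return false; // malformed network entry
    }
    $bits = (int) $bits;
    // Build the netmask; a /0 prefix matches every address.
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($netLong & $mask);
}

// Returns true if the address matches any entry in the list.
function sketchIsBotIp(string $ip, array $entries): bool
{
    foreach ($entries as $entry) {
        if (sketchIpMatches($ip, $entry)) {
            return true;
        }
    }
    return false;
}

$list = ['192.168.1.2', '192.168.0.0/24'];
var_dump(sketchIsBotIp('192.168.0.77', $list));
var_dump(sketchIsBotIp('192.168.1.3', $list));
```

With the two example entries, any address from 192.168.0.0 to 192.168.0.255 matches the network entry, while in the 192.168.1.x range only 192.168.1.2 matches.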

Method BD_CheckBotAgentAdvanced

The method BD_CheckBotAgentAdvanced is controlled by an external configuration file for detecting user agents.
The result is the short name of the bot, or "false" if nothing was detected.

The external configuration file is generated from known user agent strings of search engines / bots and is updated regularly.

Note: This external database also distinguishes between the different types of search engines from one vendor.
For example, the return value is not simply "Google" but "Googlebot", "Googlebot-Image", "Googlebot-Mobile" and so on, depending on what was recognized.
Other vendors such as MSN and Yahoo, to name just a few, likewise have several names for their search engines.

Custom or unknown user agent identifiers can be entered in the file /system/config/localconfig.php:

$GLOBALS['BOTDETECTION']['BOT_AGENT'][] = array("unitbot","UniBot from FHTW");
$GLOBALS['BOTDETECTION']['BOT_AGENT'][] = array("myprivat","My privat bot");

The parameters are: short name in lower case, description.
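
As an illustration of how such (short name, description) pairs could drive a lookup that returns the short name or "false", consider the following sketch. The function name sketchCheckBotAgentAdvanced and the case-insensitive substring match are assumptions; the source does not describe how the extension compares the entries with the user agent.

```php
<?php
// Hypothetical sketch: return the short name of the first entry whose
// short name occurs in the user agent, or false if no entry matches.
function sketchCheckBotAgentAdvanced(string $userAgent, array $entries)
{
    // Assumption: matching is a case-insensitive substring search.
    $agent = strtolower($userAgent);
    foreach ($entries as [$shortName, $description]) {
        if ($shortName !== '' && strpos($agent, $shortName) !== false) {
            return $shortName;
        }
    }
    return false;
}

// Same pairs as in the localconfig.php example above.
$entries = [
    ['unitbot', 'UniBot from FHTW'],
    ['myprivat', 'My privat bot'],
];

var_dump(sketchCheckBotAgentAdvanced('UnitBot/1.0', $entries));
var_dump(sketchCheckBotAgentAdvanced('Mozilla/5.0 Firefox/3.6', $entries));
```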

Demo Module

Two demos are included with the Bot Detection module. The module is made available in the demo class via import.

$this->import('ModuleBotDetection');

Frontend Demo 1

Demo 1 tests the current IP address and user agent with all three methods and displays the results.
For an example, see Demo 1 on the developer website.

Frontend Demo 2

Demo 2 provides a form for checking whether the module would recognize a given user agent string as a bot.
To do so, it calls both agent methods and displays the result.
For an example, see Demo 2 on the developer website.
