Bot Detection

From Contao Community Documentation


Latest revision as of 02:39, 17 February 2015

I'm not a native English speaker. Please correct my mistakes.

Bot Detection is a helper class for other (frontend) extensions that need to detect whether a request comes from a human or a machine.

(Detection of search engines, spiders, crawlers, bots, harvesters, ...)

Extension Overview
Developer: Glen Langer (BugBuster)
Developer website: http://www.contao.glen-langer.de
Extension version: 1.7.2 / 3.4.0
Compatibility with Contao: from version 2.9
Compatibility with TYPOlight: version 2.8
Extension Repository: http://www.contao.org/extension-list/view/botdetection.en.html
Donate to the developer: Cappuccino
Issue tracker: https://github.com/BugBuster1701/botdetection/issues
Comment: Now on GitHub

Forum

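Questions about the Bot Detection module are answered in the forum (http://www.contao-community.org/viewforum.php?f=21).
Errors and feature requests can be reported in the issue tracker (https://github.com/BugBuster1701/botdetection/issues).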

Translations

Translations can be contributed on Transifex (https://www.transifex.com/projects/p/botdetection/).

Installation

The module is installed via the Extension Repository in the Contao back end.
Manual installation is also possible: download the ZIP file from the Extension Repository, unzip it, and transfer the files to the server.
This should create the directory "/system/modules/botdetection".
Then call /contao/install.php and perform the database update
(/typolight/install.php in older TYPOlight installations).

Using

The Bot Detection module provides three methods for detection.
Fully reliable detection is not possible.
Detection is based on two criteria:

  • User Agent
  • IP address (IPv4; IPv6 from version 1.4.0)

The module includes one method for the user agent, BD_CheckBotAgent, and one for IP detection, BD_CheckBotIP.
Both methods return only "true" or "false" and perform only a rough search on strings and substrings to identify the most important bots.

A third method, BD_CheckBotAgentAdvanced, uses an external configuration file for user agent detection. It returns the short name of the bot, or "false".
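
As a rough illustration of how the three results could be combined, the following sketch takes the three checks as callables and returns the short name of the bot, a generic label, or false. The function name sketch_describe_visitor, the label 'unknown bot', and the idea of passing the checks as argument-less callables are assumptions made for this example; they are not part of the module.

```php
<?php
// Hypothetical helper: combines the three kinds of result described above.
// $checkAgent / $checkIp are expected to return true or false,
// $checkAgentAdvanced a short name (string) or false.
function sketch_describe_visitor(callable $checkAgent, callable $checkIp, callable $checkAgentAdvanced)
{
    // Prefer the advanced check, because it can name the bot.
    $name = $checkAgentAdvanced();
    if ($name !== false) {
        return $name;
    }

    // Fall back to the two rough true/false checks.
    if ($checkAgent() || $checkIp()) {
        return 'unknown bot';
    }

    // Nothing matched: probably a human visitor.
    return false;
}
```

In a Contao module the callables would wrap the corresponding calls on the imported ModuleBotDetection object.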

Method BD_CheckBotAgent

The method BD_CheckBotAgent searches in two steps so that it completes as quickly as possible.
Step 1 searches for substrings that appear in the names of most search engines / bots:

'bot'
'spider'
'spyder'
'crawl'
'slurp'
'robo'
'yahoo'

Step 2 then looks for other strings that usually follow the name of the search engine, such as:

'altavista'
'archiver'
'inktomi'
'twiceler'
...

The result is "true" or "false". ("true" = search engine / bot found)
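
The two-step search described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name sketch_check_bot_agent is hypothetical, case-insensitive matching is an assumption, and the step 2 list contains only the four strings quoted above (the original list is longer).

```php
<?php
// Hypothetical re-implementation of the two-step idea for illustration only.
function sketch_check_bot_agent(string $userAgent): bool
{
    if ($userAgent === '') {
        return false;
    }

    // Step 1: generic substrings found in most bot names.
    $generic = ['bot', 'spider', 'spyder', 'crawl', 'slurp', 'robo', 'yahoo'];
    // Step 2: further strings; only the examples listed in this document.
    $specific = ['altavista', 'archiver', 'inktomi', 'twiceler'];

    // Searching the short generic list first lets most bots exit early.
    foreach ([$generic, $specific] as $needles) {
        foreach ($needles as $needle) {
            // stripos() is case-insensitive (an assumption for this sketch).
            if (stripos($userAgent, $needle) !== false) {
                return true;
            }
        }
    }

    return false;
}
```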

Method BD_CheckBotIP

The bots from Google and MSN / Bing sometimes crawl with the user agent of a regular browser.
To uncover these "undercover" search engines, requests must be filtered by IP address.

There are also configuration files in the config directory of the module: bot-ip-list.txt and bot-ip-list-ipv6.txt.
The current content includes the IP address of a spider from Israel as well as network addresses for Google and MSN / Bing.
Additional IP addresses or networks can be entered in these files, but such changes are not upgrade-safe.
It is therefore better to add them to localconfig.php as follows:

$GLOBALS['BOTDETECTION']['BOT_IP'][] = '192.168.1.2';
$GLOBALS['BOTDETECTION']['BOT_IP'][] = '192.168.0.0/24';

For IPv6, in the same way:

$GLOBALS['BOTDETECTION']['BOT_IPV6'][] = '2001:0db8::1';
$GLOBALS['BOTDETECTION']['BOT_IPV6'][] = '2001:0db8:85a3:0800::/56';
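
To show what entries such as '192.168.0.0/24' or '2001:0db8:85a3:0800::/56' mean in practice, here is a minimal sketch of matching an address against single hosts and CIDR networks. The function names sketch_ip_in_range and sketch_ip_in_list are hypothetical, and the module's own matching logic may differ; the sketch relies only on PHP's built-in inet_pton().

```php
<?php
// Hypothetical helper: does $ip match $range ("address" or "address/prefix")?
function sketch_ip_in_range(string $ip, string $range): bool
{
    $parts  = explode('/', $range, 2);
    $ipBin  = @inet_pton($ip);        // packed binary form, false if invalid
    $netBin = @inet_pton($parts[0]);

    // Reject invalid input and IPv4/IPv6 mismatches.
    if ($ipBin === false || $netBin === false || strlen($ipBin) !== strlen($netBin)) {
        return false;
    }

    // A bare address is treated as a single host (full-length prefix).
    $maxBits = strlen($netBin) * 8;
    $prefix  = isset($parts[1]) ? (int) $parts[1] : $maxBits;
    if ($prefix < 0 || $prefix > $maxBits) {
        return false;
    }

    // Compare the whole bytes covered by the prefix.
    $fullBytes = intdiv($prefix, 8);
    if (substr($ipBin, 0, $fullBytes) !== substr($netBin, 0, $fullBytes)) {
        return false;
    }

    // Compare the remaining bits of a partially covered byte, if any.
    $restBits = $prefix % 8;
    if ($restBits === 0) {
        return true;
    }
    $mask = (0xFF << (8 - $restBits)) & 0xFF;

    return (ord($ipBin[$fullBytes]) & $mask) === (ord($netBin[$fullBytes]) & $mask);
}

// Hypothetical helper: does $ip match any entry of a list like BOT_IP / BOT_IPV6?
function sketch_ip_in_list(string $ip, array $ranges): bool
{
    foreach ($ranges as $range) {
        if (sketch_ip_in_range($ip, $range)) {
            return true;
        }
    }

    return false;
}
```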

Method BD_CheckBotAgentAdvanced

The method BD_CheckBotAgentAdvanced uses an external configuration file to detect the user agent.
The result is the short name of the bot, or "false".

The external configuration file is generated from known user agent information of search engines / bots and is regularly updated.

Note
This external database differentiates between the different search engines of a single provider.
For example, the return value is not "Google", but "Googlebot", "Googlebot-Image", "Googlebot-Mobile", and so on, depending on what was recognized.
Other providers such as MSN and Yahoo likewise have multiple names for their search engines.

Custom or unknown user agent identifiers can be entered in the file /system/config/localconfig.php:

$GLOBALS['BOTDETECTION']['BOT_AGENT'][] = array("unitbot","UniBot from FHTW");
$GLOBALS['BOTDETECTION']['BOT_AGENT'][] = array("myprivat","My privat bot");

The parameters are: short name in lower case, description.
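
How such (short name, description) entries might be used can be sketched as below. This is an assumption-laden illustration: the function name sketch_check_bot_agent_advanced is hypothetical, and matching the lower-case short name as a case-insensitive substring of the user agent is a guess at the behaviour, not the module's documented algorithm.

```php
<?php
// Hypothetical lookup: returns the short name of the first matching entry, or false.
// Each entry is array(short name in lower case, description), as in localconfig.php.
function sketch_check_bot_agent_advanced(string $userAgent, array $entries)
{
    $haystack = strtolower($userAgent);

    foreach ($entries as $entry) {
        $shortName = $entry[0];
        // Assumption: the short name is searched as a substring of the user agent.
        if ($shortName !== '' && strpos($haystack, $shortName) !== false) {
            return $shortName;
        }
    }

    return false;
}

// The two example entries from this document.
$entries = [
    ['unitbot', 'UniBot from FHTW'],
    ['myprivat', 'My privat bot'],
];
```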

Demo Modules

The Bot Detection module ships with two frontend demo modules that show how to use it in your own modules.
The demo classes load the module via import:

$this->import('ModuleBotDetection');

Frontend Demo 1

Demo 1 tests the current IP address and user agent with all three methods and displays the results.
See the example on the developer website: Demo 1 (http://www.contao.glen-langer.de/BD_Frontend_Demo_1.html).

Frontend Demo 2

Demo 2 provides a form to check whether a given user agent string would be recognized as a bot by the module.
It calls the two agent methods and displays the results.
See the example on the developer website: Demo 2 (http://www.contao.glen-langer.de/BD_Frontend_Demo_2.html).


--BugBuster 21:42, 1 July 2011 (CEST)
