
How to block Tencent IP ranges/subnets, in order to block site crawling bots?



Fli
10-11-2024, 08:04 PM
The parasite bots from the Chinese company Tencent possibly do not respect my robots.txt crawl-delay (https://en.wikipedia.org/wiki/Robots.txt#Crawl-delay_directive), since they have been crawling the site more often than that, causing problems.

How can I reduce the performance impact of this? How can I block these Chinese parasite bots?

A) Enable Cloudflare and its Under Attack mode, which challenges all visitors, or visitors from selected countries or selected subnets, with a captcha.

B) Insert the IP ranges of the Tencent parasite company into a .htaccess file like this:


Order Deny,Allow
# Tencent ranges (based on https://www.whatismyip.com/asn/132203/) start:
Deny from 1.12.0.0/20
Deny from 1.12.34.0/23
Deny from 1.201.184.0/22
Deny from 1.201.188.0/23
Deny from 43.128.0.0/15
Deny from 43.130.0.0/16
Deny from 43.131.0.0/21
Deny from 43.131.8.0/23
Deny from 43.131.12.0/22
Deny from 43.131.16.0/20
Deny from 43.131.32.0/19
Deny from 43.131.224.0/19
Deny from 43.132.0.0/18
Deny from 43.132.68.0/24
Deny from 43.132.96.0/19
Deny from 43.132.128.0/17
Deny from 43.133.0.0/16
Deny from 43.134.0.0/16
Deny from 43.135.0.0/17
Deny from 43.135.128.0/18
Deny from 43.135.192.0/19
Deny from 43.152.64.0/20
Deny from 43.152.80.0/21
Deny from 43.152.90.0/23
Deny from 43.152.92.0/22
Deny from 43.152.96.0/20
Deny from 43.152.112.0/22
Deny from 43.152.192.0/18
Deny from 43.153.0.0/16
Deny from 43.154.0.0/15
Deny from 43.156.0.0/15
Deny from 43.158.0.0/17
Deny from 43.158.192.0/18
Deny from 43.159.0.0/18
Deny from 43.159.128.0/17
Deny from 43.160.0.0/17
Deny from 43.160.128.0/19
Deny from 43.160.192.0/18
Deny from 43.161.0.0/16
Deny from 43.162.0.0/15
Deny from 43.167.0.0/16
Deny from 45.40.216.0/21
Deny from 45.113.68.0/22
Deny from 45.146.112.0/23
Deny from 49.51.0.0/19
Deny from 49.51.32.0/20
Deny from 49.51.48.0/21
Deny from 49.51.62.0/23
Deny from 49.51.64.0/19
Deny from 49.51.96.0/21
Deny from 49.51.104.0/22
Deny from 49.51.108.0/23
Deny from 49.51.128.0/18
Deny from 49.51.192.0/19
Deny from 49.51.224.0/23
Deny from 49.51.228.0/22
Deny from 49.51.232.0/21
Deny from 49.51.240.0/20
Deny from 101.32.0.0/16
Deny from 101.33.0.0/23
Deny from 101.33.4.0/23
Deny from 101.33.30.0/23
Deny from 101.33.32.0/21
Deny from 101.33.41.0/24
Deny from 101.33.42.0/23
Deny from 101.33.44.0/22
Deny from 101.33.48.0/20
Deny from 101.33.64.0/18
Deny from 101.33.128.0/18
Deny from 103.7.28.0/22
Deny from 103.52.216.0/22
Deny from 103.238.16.0/23
Deny from 119.28.0.0/16
Deny from 119.29.29.0/24
Deny from 120.53.52.0/23
Deny from 120.88.56.0/23
Deny from 121.4.4.0/22
Deny from 124.156.0.0/19
Deny from 124.156.32.0/21
Deny from 124.156.40.0/24
Deny from 124.156.42.0/23
Deny from 124.156.44.0/22
Deny from 124.156.48.0/20
Deny from 124.156.64.0/18
Deny from 124.156.128.0/17
Deny from 129.226.0.0/16
Deny from 150.109.0.0/16
Deny from 156.240.88.0/22
Deny from 162.14.0.0/19
Deny from 162.14.32.0/21
Deny from 162.14.48.0/20
Deny from 162.62.10.0/23
Deny from 162.62.14.0/23
Deny from 162.62.42.0/23
Deny from 162.62.48.0/20
Deny from 162.62.64.0/20
Deny from 162.62.80.0/21
Deny from 162.62.96.0/19
Deny from 162.62.128.0/23
Deny from 162.62.132.0/22
Deny from 162.62.136.0/21
Deny from 162.62.144.0/20
Deny from 162.62.160.0/21
Deny from 162.62.168.0/22
Deny from 162.62.208.0/20
Deny from 162.62.224.0/20
Deny from 170.106.0.0/16
Deny from 182.254.116.0/24
Deny from 182.254.118.0/24
Deny from 203.205.128.0/23
Deny from 203.205.134.0/23
Deny from 203.205.136.0/21
Deny from 203.205.144.0/22
Deny from 203.205.155.0/24
Deny from 203.205.156.0/23
Deny from 203.205.159.0/24
Deny from 203.205.188.0/24
Deny from 203.205.191.0/24
Deny from 203.205.192.0/21
Deny from 203.205.218.0/23
Deny from 203.205.220.0/22
Deny from 203.205.224.0/24
Deny from 203.205.232.0/21
Deny from 203.205.240.0/24
Deny from 203.205.242.0/24
Deny from 203.205.248.0/21
Deny from 210.171.232.0/21
Deny from 210.180.74.0/23
Deny from 211.56.92.0/22
Deny from 211.152.128.0/22
Deny from 211.152.132.0/23
Deny from 211.152.154.0/23
Deny from 211.152.158.0/23
# Tencent ranges end

This will block their access and result in a 403 Forbidden error instead.
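
Note that Order/Deny is the older Apache 2.2 syntax; on Apache 2.4 it only works while mod_access_compat is loaded. A rough equivalent in the newer mod_authz_core syntax might look like the sketch below. It is untested on my side, shows only a few of the ranges (the rest of the list above would follow the same pattern), and assumes the server's AllowOverride setting permits AuthConfig directives in .htaccess:

```apache
# Sketch for Apache 2.4+ (mod_authz_core); do not mix with Order/Deny in the same file
<RequireAll>
    # Allow everyone by default ...
    Require all granted
    # ... except requests from the listed Tencent ranges
    Require not ip 1.12.0.0/20
    Require not ip 43.128.0.0/15
    Require not ip 43.153.0.0/16
    Require not ip 49.51.0.0/19
</RequireAll>
```

Blocked requests should get the same 403 Forbidden response as with the Deny lines.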

C) Some of the Tencent bot access log lines have the same referrer field in common: "https://google.com". Sample access log line:

43.153.5.20 - - [10/Oct/2024:15:41:20 +0200] "GET /subpage.php?abc&s=123456 HTTP/1.1" 500 818 "https://google.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"

Maybe it can be used to block matching requests with .htaccess mod_rewrite rules, in the same way as these rules, which block matching user agents:


# 8G:[USER AGENT]
<IfModule mod_rewrite.c>

# Enable the rewrite engine (can be skipped if it is already enabled earlier in the file)
RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} ([a-z0-9]{2000,}) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (&lt;|%0a|%0d|%27|%3c|%3e|%00|0x00|\\\x22) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (ahrefs|archiver|curl|libwww-perl|pycurl|scan) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (oppo\sa33|(c99|php|web)shell|site((.){0,2})copier) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (base64_decode|bin/bash|disconnect|eval|unserializ) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (acapbot|acoonbot|alexibot|asterias|attackbot|awario|backdor|becomebot|binlar|blackwidow|blekkobot|blex|blowfish|bullseye|bunnys|butterfly|careerbot|casper) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (checkpriv|cheesebot|cherrypick|chinaclaw|choppy|clshttp|cmsworld|copernic|copyrightcheck|cosmos|crescent|datacha|(\b)demon(\b)|diavol|discobot|dittospyder) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (dotbot|dotnetdotcom|dumbot|econtext|emailcollector|emailsiphon|emailwolf|eolasbot|eventures|extract|eyenetie|feedfinder|flaming|flashget|flicky|foobot|fuck) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (g00g1e|getright|gigabot|go-ahead-got|gozilla|grabnet|grafula|harvest|heritrix|httracks?|icarus6j|jetbot|jetcar|jikespider|kmccrew|leechftp|libweb|liebaofast) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (linkscan|linkwalker|loader|lwp-download|majestic|masscan|miner|mechanize|mj12bot|morfeus|moveoverbot|netmechanic|netspider|nicerspro|nikto|ninja|nominet|nutch) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (octopus|pagegrabber|petalbot|planetwork|postrank|proximic|purebot|queryn|queryseeker|radian6|radiation|realdownload|remoteview|rogerbot|scan|scooter|seekerspid) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (semalt|siclab|sindice|sistrix|sitebot|siteexplorer|sitesnagger|skygrid|smartdownload|snoopy|sosospider|spankbot|spbot|sqlmap|stackrambler|stripper|sucker|surftbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (sux0r|suzukacz|suzuran|takeout|teleport|telesoft|true_robots|turingos|turnit|vampire|vikspider|voideye|webleacher|webreaper|webstripper|webvac|webviewer|webwhacker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (winhttp|wwwoffle|woxbot|xaldon|xxxyy|yamanalab|yioopbot|youda|zeus|zmeu|zune|zyborg) [NC]

RewriteRule .* - [F]

</IfModule>
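
Since "https://google.com" appears in the referrer field of the log line and not in the user agent, it would need its own condition on %{HTTP_REFERER} instead of being added to the user-agent list. A minimal, untested sketch of such a rule follows. It assumes the bots always send exactly this string, and that genuine visitors arriving from Google send something different (as far as I know, browsers send a referrer with a trailing slash, such as "https://www.google.com/"), so only the exact value is matched:

```apache
<IfModule mod_rewrite.c>

# Enable the rewrite engine (can be skipped if it is already enabled earlier in the file)
RewriteEngine On

# Match only a referrer that is exactly "https://google.com" (no www, no trailing slash, no path)
RewriteCond %{HTTP_REFERER} ^https://google\.com$

# Answer matching requests with 403 Forbidden
RewriteRule .* - [F]

</IfModule>
```

Keeping this as a separate block means it does not interfere with the [OR] chain of the user-agent conditions above. If the bots change or drop the referrer, the rule simply stops matching.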

If anyone knows how to add that to the rules mentioned above, and whether it is a good idea, please comment.

If I am wrong about any of this, please kindly reply here so it can be fixed. Thank you.