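Sharing some User-Agent (UA) signatures of malicious crawlers (for reference only; BT Panel (宝塔) Nginx firewall)
If you use BT Panel, you can import these into the UA blacklist of the Nginx firewall (compiled by me, for reference only):
AhrefsBot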
AwarioBot
BLEXBot
Barkrowler
CensysInspect
Criteo
DataForSeoBot
DigExt
DnyzBot
DotBot
ExtLinksBot
Ezooms
FlightDeckReports
Go-http-client
Grapeshot
Heritrix
HttpClient
HubSpot
InternetMeasurement
Knowledge AI
Linguee Bot
MJ12bot
MauiBot
MegaIndex
RepoLookoutBot
SemrushBot
SurdotlyBot
Web-Crawler
WellKnownBot
Yellowbrandprotectionbot
ZoominfoBot
axios
fasthttp
github
libcurl
paloaltonetworks
python
seokicks
serpstatbot
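webprosbot

Knowledge AI
Linguee Bot
Strip everything after the space in these two entries. On batch import they get split into separate "AI" and "Bot" keywords, which blocks pretty much every spider.

Last edited by sevenfish on 2024-9-4 16:10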
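Good point. BT Panel's UA rules have one big problem: they match by keyword, not against the complete string. Anyone who doesn't know this and blocks a full UA string ends up hitting legitimate traffic through the ordinary keywords inside it.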
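To make the keyword-matching problem concrete, here is a minimal Python sketch. It is not BT Panel's actual code: `keyword_hits` and `split_on_spaces` are hypothetical helpers, and the case-insensitive substring match and the whitespace split on import are assumptions based only on what is reported in this thread.

```python
def split_on_spaces(entries: list[str]) -> list[str]:
    """Model an import that breaks each entry on whitespace (assumed behaviour)."""
    keywords: list[str] = []
    for entry in entries:
        for part in entry.split():
            if part not in keywords:
                keywords.append(part)
    return keywords


def keyword_hits(user_agent: str, blacklist: list[str]) -> list[str]:
    """Return every blacklist entry found anywhere in the UA.

    Assumption: matching is a case-insensitive substring test.
    """
    ua = user_agent.lower()
    return [kw for kw in blacklist if kw.lower() in ua]


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BAIDUSPIDER = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

intended = ["Knowledge AI", "Linguee Bot"]
imported = split_on_spaces(intended)

print(imported)                             # ['Knowledge', 'AI', 'Linguee', 'Bot']
print(keyword_hits(GOOGLEBOT, intended))    # []
print(keyword_hits(GOOGLEBOT, imported))    # ['Bot']
print(keyword_hits(BAIDUSPIDER, imported))  # ['AI']
```

Under these assumptions the two-word entries never match a normal search engine, but the fragments they break into do: "Bot" catches Googlebot and "AI" catches Baiduspider. That is why short, generic keywords are risky in a blacklist that matches by substring.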
BT Panel officially acknowledged this problem a while back; I don't know whether it has been fixed since.
And semrush, that one has to be blocked.
petalbot is pretty junky too.
"Semrush", "Dot", "Ahrefs", "MJ12", "Maui", "MegaIndex.ru", "BLEX",
"Google", "Bing", "Baiduspider", "Yandex", "Sogou", "Sosospider", "360Spider",
"Exa", "Pinterest", "FacebookExternalHit", "ExternalHit", "Twitter", "Alexa",
"Yahoo", "MSN", "Altavista", "Lycos", "Infoseek", "WebCrawler", "WiseNut",
"Giga", "Naver", "Daum", "Yeti/Naver", "YisouSpider", "EasouSpider",
"Pingdom", "Uptime", "Site24x7", "Siteimprove", "Screaming", "Seznam", "Imagesift", "Barkrowler", "ToutiaoSpider", "bytedance", "Blexbot", "webmeup"
This is my own list, for reference~~
========
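One more note: this list is not for blocking, it is for counting how many times each spider crawls the site~
Add GPTBot too, that AI crawler is junk as well.
Thanks for sharing.
Thanks for sharing.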
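For the crawl-counting use mentioned above, a rough Python sketch could look like the following. It is only an illustration, not the poster's actual script: `count_spiders` is a hypothetical helper, the keyword list is a short excerpt from the list in this thread, and the log lines are made-up samples assumed to be in the common Nginx "combined" format, where the UA is the last quoted field.

```python
import re
from collections import Counter

# Short excerpt of the keyword list above; extend as needed.
SPIDER_KEYWORDS = ["Semrush", "Ahrefs", "MJ12", "Google", "Bing", "Baiduspider", "Yandex"]

# Assumption: the User-Agent is the last double-quoted field on the line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_spiders(log_lines, keywords=SPIDER_KEYWORDS):
    """Count requests per spider keyword (case-insensitive substring match)."""
    counts = Counter()
    for line in log_lines:
        match = UA_FIELD.search(line)
        if match is None:
            continue  # line does not end with a quoted field
        ua = match.group(1).lower()
        for keyword in keywords:
            if keyword.lower() in ua:
                counts[keyword] += 1
                break  # attribute each request to the first matching keyword only
    return counts


# Made-up sample lines using documentation IP ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [04/Sep/2024:16:10:00 +0800] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"',
    '192.0.2.11 - - [04/Sep/2024:16:10:05 +0800] "GET /a HTTP/1.1" 200 812 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.11 - - [04/Sep/2024:16:10:09 +0800] "GET /b HTTP/1.1" 200 640 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [04/Sep/2024:16:11:00 +0800] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/120.0.0.0 Safari/537.36"',
]

if __name__ == "__main__":
    for name, hits in count_spiders(SAMPLE_LOG).most_common():
        print(name, hits)
```

Because each request is attributed to the first keyword that matches, the order of the list matters when two keywords can hit the same UA (for example "BLEX" and "Blexbot").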