左右不逢缘 posted the day before yesterday at 21:49

Some spiders do support JS, but beginners are advised not to buy this one, which targets static generation


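Some spiders do support JS, but I'd advise beginners not to buy this, since it targets statically generated pages. In short, things work much better in dynamic mode: on static pages, JS-based tracking is something most spiders simply cannot fully execute. With 帝国CMS (EmpireCMS) in dynamic mode, for example, calling a custom function is all it takes, and by comparison the accuracy is dozens of times higher.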
// Spider crawl statistics
function recordSpiderLog($tape) {
    global $empire, $dbtbpre, $class_r, $public_r, $memcache;
    $time = time();
    $ip = egetip();
    $useragent = strtolower($_SERVER['HTTP_USER_AGENT'] ?? '');
    if ($tape == "pc") {
        $url = RepPostStr(rtrim($public_r['add_pcurl'], '/') . $_SERVER['REQUEST_URI']);
    } else {
        $url = RepPostStr(rtrim($public_r['add_murl'], '/') . $_SERVER['REQUEST_URI']);
    }
    // Custom in-memory rate-limit function (15 requests per minute)
    $dupResult = memcacheRateLimit('spider_dup_', 15, 60, $url);
    if (!$dupResult['allow']) {
        return false;
    }
    $stats_setr = fetch_memcache0("select openstats,spiderstats,spiderkeywords from {$dbtbpre}ecmsextend_stats_set limit 1", 'Memcached', 'ecmsextendspider', 3600*24*1);
    if (empty($stats_setr['spiderstats'])) {
        return false;
    }
    $spiderkeywords = $stats_setr['spiderkeywords'] ?? '';
    $allBots = [];
    if (!empty($spiderkeywords)) {
        $lines = explode("\n", $spiderkeywords);
        foreach ($lines as $line) {
            $line = trim($line);
            if (empty($line) || strpos($line, '#') === 0) continue; // skip comments and blank lines
            $parts = explode('=', $line, 2);
            if (count($parts) === 2) {
                $botKeyword = strtolower(trim($parts[0])); // left of '=': User-Agent keyword
                $botName = trim($parts[1]);                // right of '=': display name
                $allBots[$botKeyword] = $botName;
            }
        }
    }
    $bot = '其它'; // "other"
    foreach ($allBots as $keyword => $botname) {
        if (strpos($useragent, $keyword) !== false) {
            $bot = $botname;
            break;
        }
    }
    // Only log these spiders: Baidu, 360, Sogou, Toutiao, Shenma, Bing
    $targetBots = ['百度', '360', '搜狗', '头条', '神马', '必应'];
    $isTargetBot = false;
    foreach ($targetBots as $target) {
        if (strpos($bot, $target) === 0) {
            $isTargetBot = true;
            break;
        }
    }
    if (!$isTargetBot) {
        return false;
    }
    $useragent = RepPostStr($useragent);
    $result = $empire->query("insert into {$dbtbpre}robots(id,robotsname,robotspage,oldurl,robotsip,riqi) values(null,'$bot','$useragent','$url','$ip','$time')");
}

This is N times more accurate than the function-wrapped JS approach.
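For anyone who wants to try the keyword-matching step on its own, here is a minimal standalone sketch. The function names parseSpiderKeywords and matchSpider are hypothetical (they are not part of 帝国CMS), and the "keyword=display name" config format is assumed from how the function above splits each line on '='. It is only an illustration of the idea, not the exact code the site runs.

```php
<?php
// Hypothetical helpers for illustration; these are NOT 帝国CMS (EmpireCMS) functions.

/**
 * Parse "keyword=display name" lines into a [lowercase keyword => name] map.
 * Blank lines and lines starting with '#' are skipped (assumed config format).
 */
function parseSpiderKeywords(string $config): array
{
    $bots = [];
    foreach (preg_split('/\r\n|\r|\n/', $config) as $line) {
        $line = trim($line);
        if ($line === '' || strpos($line, '#') === 0) {
            continue;
        }
        $parts = explode('=', $line, 2);
        if (count($parts) === 2) {
            $bots[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $bots;
}

/**
 * Return the display name of the first keyword found in the User-Agent,
 * or '其它' ("other") when no keyword matches.
 */
function matchSpider(string $userAgent, array $bots): string
{
    $ua = strtolower($userAgent);
    foreach ($bots as $keyword => $name) {
        // PHP turns numeric-looking array keys into ints, so cast back to string.
        $keyword = (string) $keyword;
        if ($keyword !== '' && strpos($ua, $keyword) !== false) {
            return $name;
        }
    }
    return '其它';
}

// Sample config (made up for this example).
$config = "# keyword=display name\nbaiduspider=百度\nsogou=搜狗\nbingbot=必应\n";
$bots = parseSpiderKeywords($config);

echo matchSpider('Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)', $bots), "\n";
```

Because the User-Agent is checked on the server for every dynamic request, this kind of matching does not depend on the spider executing any JS, which is the point being made above.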