Today a corporation responded to a Twitter gripe about robots.txt compliance.
Evidently the bot attributed to the corporation is not theirs; their real bot uses a different agent name. The corporation provided the full agent name, so it is easy to compare from the command line using ARP etc.
I replied asking about the fake bot impersonating the business.
The Users Online tool spots bots fairly efficiently, and I immediately updated its source code with the new agent and marked the old one as a fake bot.
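The idea of listing known agents and flagging impersonators can be illustrated with a minimal sketch. This is not the Users Online source code: the signature tables, the `ExampleCorp` agent names and the `classify_agent` function are all hypothetical. Fakes are checked first so that an impersonator is never credited as a legitimate crawler.

```python
# Hypothetical signature tables for illustration only; these are not the
# lists used by the Users Online tool.
KNOWN_BOTS = {
    "Googlebot": "Google",
    "bingbot": "Bing",
    "YandexBot": "Yandex",
    "PetalBot": "Huawei Petal",
    "ExampleCorpCrawler/2.0": "ExampleCorp (genuine agent, hypothetical)",
}

# Agent strings confirmed to be impersonations (hypothetical example).
FAKE_BOTS = {
    "ExampleCorpBot/1.0": "fake: not operated by ExampleCorp",
}


def classify_agent(user_agent: str) -> str:
    """Return 'fake', 'bot' or 'human' for a User-Agent header value."""
    ua = user_agent.lower()
    # Check known fakes first so an impersonator never counts as a real bot.
    for needle in FAKE_BOTS:
        if needle.lower() in ua:
            return "fake"
    # Case-insensitive substring match against legitimate crawler names.
    for needle in KNOWN_BOTS:
        if needle.lower() in ua:
            return "bot"
    return "human"


if __name__ == "__main__":
    print(classify_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
    print(classify_agent("Mozilla/5.0 (compatible; ExampleCorpBot/1.0)"))
    print(classify_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Since any client can send any User-Agent string, string matching only labels what a visitor claims to be; confirming a crawler's identity generally needs an additional check such as reverse DNS on the connecting IP address.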
Most crawlers I see are legit. Some SEO services scan other sites to see how their better-funded clients compare, and WordPress plugins offer miscellaneous services ad nauseam.
Yoast makes a duplicate database, which tends to overrun storage. JetPack is so bloated as to be flagrantly net negative. Google's plugin includes page speed, which suggests that faster page loading times are important.
As a content-rich site, Hardcore Games attracts search engine crawlers constantly. Google, Bing and Yandex seem to be the most common, while Huawei crawls heavily with its Petal bot. The Czech bot serves a small market, but the traffic is welcome. Petal serves a huge market in China, which is still underrepresented in gaming.
So far one fake has been proven and a few malicious users are now blocked. Most likely many more IP addresses will be blocked as time progresses.
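Blocking addresses one at a time does not scale well as the list grows. A minimal sketch, using Python's standard `ipaddress` module, shows checking a visitor against a blocklist that can hold single addresses and whole ranges. The `BLOCKLIST` entries come from the documentation-reserved ranges of RFC 5737 and RFC 3849, not real offenders, and `is_blocked` is a hypothetical helper rather than part of any plugin.

```python
import ipaddress

# Hypothetical blocklist built from documentation-reserved ranges,
# not real offenders. A /32 entry blocks a single IPv4 address.
BLOCKLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
    ipaddress.ip_network("2001:db8:bad::/48"),
]


def is_blocked(addr: str) -> bool:
    """Return True if addr falls inside any blocked network.

    Raises ValueError if addr is not a valid IPv4 or IPv6 address.
    """
    ip = ipaddress.ip_address(addr)
    # Membership across IP versions (v4 address, v6 network) is simply False.
    return any(ip in net for net in BLOCKLIST)


if __name__ == "__main__":
    for visitor in ["192.0.2.55", "198.51.100.18", "2001:db8:bad::1"]:
        print(visitor, is_blocked(visitor))
```

Storing networks instead of bare addresses lets one entry cover a whole range when a malicious user hops between addresses in the same block.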
Spambots are abundant. Over the years many forums have been trashed by spambots that fill the database with links to undesirable sites. Google already assigns spambot links a PageRank of zero, so they are useless. phpBB has struggled against spambots for years: early procedural CAPTCHA solutions were all defeated, and question and answer became the viable solution. Looking at what phpBB had to endure was very eye-opening.
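The question-and-answer approach works because a site-specific question is hard for a generic spambot script to answer. A minimal sketch of the general idea follows. It is not phpBB's implementation: the question bank, `pick_question` and `check_answer` are hypothetical, and answers are case-folded and whitespace-normalized so that humans are not rejected over trivial formatting.

```python
import random
import unicodedata

# Hypothetical question bank: each question maps to a set of accepted
# answers, already stored in normalized (case-folded, single-spaced) form.
QUESTIONS = {
    "What is the name of this website?": {"hardcore games"},
    "How many legs does a spider have?": {"8", "eight"},
}


def normalize(text: str) -> str:
    """Unicode-normalize, case-fold and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.casefold().split())


def pick_question(rng: random.Random) -> str:
    """Choose a question to show on the registration form."""
    return rng.choice(sorted(QUESTIONS))


def check_answer(question: str, answer: str) -> bool:
    """Accept the registration only if the answer matches an accepted one."""
    accepted = QUESTIONS.get(question)
    if accepted is None:
        # Unknown or tampered question text: reject.
        return False
    return normalize(answer) in accepted


if __name__ == "__main__":
    q = pick_question(random.Random())
    print(q)
    print(check_answer("What is the name of this website?", "  Hardcore  GAMES "))
```

Rotating questions and making them specific to the site's own content is what keeps a generic bot from reusing one scripted answer everywhere.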