OpenAI is reportedly scaling up its crawling infrastructure for the holiday shopping season. The folks at Merj noticed OpenAI adding a lot of new IP ranges for its bots and crawlers.
Cloudflare, a company that powers about 20% of the web, has announced it is now blocking AI crawlers by default. It is also offering a new model, named Pay Per Crawl, that lets AI services pay content creators to crawl their content.
Joost de Valk, the founder of Yoast, posted some interesting data on Twitter yesterday about crawlers: how much of the site's resources they consume, how active they are and whether there is any return on investment. The big one is that Bing crawled ~84…
eMarketer’s recent report on global e-commerce growth showed that worldwide online sales exceeded $1 trillion in 2012. The report further projects that global e-commerce will grow by an additional 19% in 2013, with the Asia-Pacific region surpassing North America in online sales. This reemphasizes the…
Twitter has unveiled its URL fetcher, which it named SpiderDuck…
Believe it or not, I am not a huge fan of placing robots.txt files on sites unless you specifically want to block content and sections from Google or other search engines. It has always felt redundant to tell a search engine it can crawl your site when it will do so anyway unless you tell it not to.
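To illustrate the default-allow behavior described above, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `can_crawl`, the `example.com` URLs, and the `/private/` rule are assumptions made up for this example, and real search engines may interpret robots.txt somewhat differently than this parser does.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses the text in memory
    instead of downloading a real robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# An empty robots.txt contains no rules at all.
EMPTY_ROBOTS = ""

# A robots.txt that blocks one section for every crawler (made-up path).
BLOCK_PRIVATE = """\
User-agent: *
Disallow: /private/
"""

PUBLIC_URL = "https://example.com/blog/post"
PRIVATE_URL = "https://example.com/private/report"

if __name__ == "__main__":
    # With no rules, crawling is allowed.
    print(can_crawl(EMPTY_ROBOTS, "Googlebot", PUBLIC_URL))
    # With a Disallow rule, only the blocked section is off limits.
    print(can_crawl(BLOCK_PRIVATE, "Googlebot", PUBLIC_URL))
    print(can_crawl(BLOCK_PRIVATE, "Googlebot", PRIVATE_URL))
```

In this sketch the file only changes the outcome once it contains a `Disallow` rule, which is the case the paragraph above says a robots.txt is actually worth adding for.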