Publishers are blocking the Internet Archive in a bid to outsmart AI scrapers, according to various major publications including The Guardian and The New York Times. These companies are concerned that AI companies' bots are scraping their articles indirectly, through the Internet Archive's collections.
The Internet Archive serves as a valuable resource for publishers, with its vast collections of records, academic texts, and other materials. With the advent of AI technology, however, some publishers have started blocking its access to their content, citing concerns that AI businesses are using the Internet Archive's API to scrape their articles.
The New York Times has taken steps to block a bot from accessing its content via the Wayback Machine, which provides unfettered access to the newspaper's archives without authorization. The Financial Times and the social forum Reddit have likewise moved to selectively restrict how the Internet Archive catalogs their material.
These publishers have attempted to sue AI businesses, including OpenAI and Microsoft, for using their content in large language models. While some media outlets have sought financial deals with AI companies, these arrangements seem to compensate publishing companies rather than writers.
The issue extends beyond journalism: creators in other fields, such as fiction writers, visual artists, and musicians, are also fighting AI tools over copyright and piracy issues.