Amazon’s cloud division has launched an investigation into the AI search startup Perplexity to determine whether it is violating Amazon Web Services rules by scraping websites that have explicitly forbidden such access. Perplexity, which is backed by prominent entities including the Jeff Bezos family fund and Nvidia and valued at $3 billion, has come under scrutiny for serving content scraped from websites that forbid crawling via the Robots Exclusion Protocol.
Ethical Concerns
Using a plain-text file called robots.txt to indicate which pages automated bots and crawlers should not access is a long-standing convention on the web. While not legally binding, the Robots Exclusion Protocol is respected by most companies that operate scrapers. AWS’s terms of service require customers to adhere to the protocol when crawling websites, and they strictly prohibit illegal activity.
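To illustrate the convention at issue, here is a minimal sketch of how a well-behaved crawler consults robots.txt before fetching a page, using Python’s standard-library robots.txt parser. The user-agent name and URLs are hypothetical, and this is not any company’s actual crawler code.

```python
from urllib.robotparser import RobotFileParser

# An illustrative robots.txt, like one a publisher might serve,
# telling a specific bot to stay out of /private/.
robots_txt = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A crawler identifying itself as ExampleBot must skip disallowed paths...
print(parser.can_fetch("ExampleBot", "https://example.com/private/page"))    # False
# ...but may fetch ordinary articles.
print(parser.can_fetch("ExampleBot", "https://example.com/articles/story"))  # True
```

The protocol is purely advisory: nothing technically prevents a scraper from ignoring the file, which is why compliance depends on the crawler honestly identifying itself and honoring the result of this check.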
Forbes published a report on June 11 accusing Perplexity of stealing one of its articles. A subsequent WIRED investigation revealed a pattern of scraping abuse and plagiarism by systems connected to Perplexity’s AI-powered search chatbot. Even after websites, including those owned by Condé Nast, blocked Perplexity’s declared crawler via robots.txt, the startup continued to access their content from an undisclosed IP address.
Perplexity’s activities include widespread crawling of news websites that explicitly prohibit bots from accessing their content. The startup’s server, traced to an Elastic Compute Cloud (EC2) instance hosted on AWS, was observed visiting multiple websites without permission, including those of The Guardian, Forbes, and The New York Times. This unauthorized scraping raises serious ethical concerns about data privacy and intellectual property rights.
Response from Perplexity CEO
In response to WIRED’s investigation, Perplexity CEO Aravind Srinivas initially dismissed the allegations, saying they reflected a misunderstanding of the company’s practices. Further inquiries, however, revealed that the IP address observed scraping websites belonged to a third-party company that performs web crawling and indexing services. Citing a nondisclosure agreement, Srinivas declined to name the third party or provide specific details about its activities.
The investigation into Perplexity AI underscores the importance of ethical standards in developing and deploying artificial intelligence technologies. Scrutiny of data scraping, plagiarism, and unauthorized access to websites is a reminder of the legal and ethical responsibilities that companies, especially in the tech industry, must uphold. It also falls to providers like Amazon Web Services to enforce their guidelines, deter misconduct, and protect the integrity of online content.