Microsoft Deletes the World's Largest Facial Recognition Database

Until April, Microsoft boasted of having the largest collection of faces that anyone could use to train facial-recognition algorithms. Since then, the once publicly available dataset has quietly disappeared.

直到四月，微軟都吹噓擁有最大的人臉數據庫，任何人都可以使用它來訓練面部識別算法。而那之後，曾經公開可用的數據集已經悄然消失。

As the Financial Times reports, Microsoft quietly deleted the dataset after the paper called attention to privacy and ethical issues, including use of the dataset by military researchers and Chinese surveillance firms.

正如英國《金融時報》報道的那樣，在該報引發了對隱私和道德問題的關注之後（包括軍事研究人員和中國監控公司使用該數據集），微軟悄然刪除了數據集。

Microsoft did not immediately respond to a request for comment from Fortune. But it told the Financial Times: “The site was intended for academic purposes. It was run by an employee that is no longer with Microsoft and has since been removed.”

微軟沒有立即回覆《財富》雜誌的評論請求。但它告訴英國《金融時報》：“該網站是爲了學術目的設立的。它由一名不再受僱於微軟的員工運營，並且已經被刪除。”

The now-deleted dataset contained more than 10 million faces culled from websites like Flickr, which host photographs uploaded under a Creative Commons license—meaning many can be used free of copyright concerns.

現已刪除的數據集中包含超過1000萬張面孔，這些面孔來自Flickr等網站，這些網站儲存的是根據知識共享許可上傳的照片，這意味着其中許多照片可以使用而無需擔心版權問題。

The name of the Microsoft dataset, MS Celeb, was chosen because many of the images it contains are of famous people who live public lives. Many of the other faces in the set, however, belong to people who are not celebrities, including journalists and privacy researchers, and who were not aware their images had been included.

這個微軟的數據集叫MS Celeb，之所以選擇這個名稱，是因爲它包含的許多圖像都是過着公開生活的名人。然而，該集中的許多其他面孔屬於不是名人的人——包括記者和隱私研究人員——並且他們不知道他們的圖像被包括在內。

Microsoft is hardly the only company to assemble large datasets by scraping photos from the open Internet. In January, IBM announced it was sharing a collection of 1 million faces in the name of promoting more diversity in artificial intelligence. Meanwhile, a website called Megapixels identifies several other massive collections as part of a bid to halt what it describes as a “growing crisis of authoritarian biometric surveillance.”

微軟並不是唯一一家通過從開放的互聯網上抓取照片來組裝大型數據集的公司。今年1月，IBM宣佈它正在以促進人工智能更多樣化的名義共享一個包含100萬張面孔的數據集。與此同時，一個名爲Megapixels的網站列出了另外幾個大型數據集，試圖遏止它所謂的“日益嚴重的威權式生物識別監控危機”。

While many of the facial recognition sets are culled from public websites like Flickr, that is not the only way companies obtain pictures of faces. As a recent Fortune investigation revealed, startups have been using photo collection apps to surreptitiously collect millions of faces, while other companies have been scanning public collections of mug shots.

雖然許多面部識別數據集都是從Flickr這樣的公共網站上蒐集而來，但這並不是公司獲取面部圖片的唯一方式。最近《財富》的一項調查顯示，創業公司一直在使用照片收集應用程序暗中收集數百萬張面孔，而其他公司則一直在掃描公開的嫌犯大頭照庫。

Translation: 能貓
