2017.10.25 Up

Posted by Nao Tokui

Using Deep Learning to Observe African Wildlife – Automatically identifying wild animals in camera-trap images with deep learning




The training data is the Snapshot Serengeti Dataset, a collection of 3.2 million photos taken by camera traps installed in Serengeti National Park, Tanzania. (The animals in the images were labeled through crowdsourcing.)


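I'll skip the details, but the paper reports that, compared with identifying the 3.2 million photos by hand, introducing deep learning saves more than 17,000 hours of labeling time while keeping the same level of accuracy! Deep learning will likely come to be used more and more widely for surveys across many research fields.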



Having accurate, detailed, and up-to-date information about wildlife location and behavior across broad geographic areas would revolutionize our ability to study, conserve, and manage species and ecosystems. Currently, such data are mostly gathered manually at great expense, and thus are sparsely and infrequently collected. Here we investigate the ability to automatically, accurately, and inexpensively collect such data, which could transform many fields of biology, ecology, and zoology into “big data” sciences. Motion sensor cameras called “camera traps” enable pictures of wildlife to be collected inexpensively, unobtrusively, and at high-volume. However, identifying the animals, animal attributes, and behaviors in these pictures remains an expensive, time-consuming, manual task often performed by researchers, hired technicians, or crowdsourced teams of human volunteers. In this paper, we demonstrate that such data can be automatically extracted by deep neural networks (aka deep learning), which is a cutting-edge type of artificial intelligence. In particular, we use the existing human-labeled, single-animal images from the Snapshot Serengeti dataset to train deep convolutional neural networks for identifying 48 species in 3.2 million images taken from Tanzania’s Serengeti National Park. In this paper we train neural networks that automatically identify animals with over 92% accuracy, and we expect that number to improve rapidly in years to come. More importantly, we can choose to have our system classify only the images it is highly confident about, allowing valuable human time to be focused only on challenging images. In this case, our system can automate animal identification for 96.9% of the data while still performing at the same 96.6% accuracy level of crowdsourced teams of human volunteers, saving approximately 8.2 years (at 40 hours per week) of human labeling effort (i.e. over 17,000 hours) on a 3.2-million-image dataset. Those efficiency gains immediately highlight the importance of using deep neural networks to automate data extraction from camera-trap images. The improvements in accuracy we expect in years to come suggest that this technology could enable the inexpensive, unobtrusive, high-volume and perhaps even real-time collection of information about vast numbers of animals in the wild.
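The idea of classifying only the images the system is highly confident about can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 0.8 threshold, and the toy probabilities below are all assumptions made for illustration. It simply takes per-class probabilities (such as a softmax output), automates the images whose top probability clears a threshold, and defers the rest to human labelers. A helper also checks the abstract's arithmetic, assuming 52 working weeks per year.

```python
def split_by_confidence(probabilities, threshold):
    """Split images into automated predictions and human-deferred cases.

    probabilities: list of per-class probability lists, one per image.
    Returns (automated, deferred), where automated is a list of
    (image_index, predicted_class) and deferred is a list of image indices.
    """
    automated, deferred = [], []
    for i, probs in enumerate(probabilities):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            automated.append((i, best))
        else:
            deferred.append(i)
    return automated, deferred


def coverage_and_accuracy(probabilities, labels, threshold):
    """Fraction of images automated, and accuracy on that automated subset."""
    automated, _ = split_by_confidence(probabilities, threshold)
    coverage = len(automated) / len(probabilities)
    if not automated:
        return coverage, None
    correct = sum(1 for i, pred in automated if pred == labels[i])
    return coverage, correct / len(automated)


def labeling_hours(years, hours_per_week=40, weeks_per_year=52):
    """Convert years of labeling effort to hours (52 weeks/year is assumed)."""
    return years * weeks_per_year * hours_per_week


# Toy data (hypothetical): 4 images, 3 classes.
probs = [
    [0.97, 0.02, 0.01],
    [0.10, 0.85, 0.05],
    [0.40, 0.35, 0.25],  # low confidence, goes to a human
    [0.05, 0.05, 0.90],
]
labels = [0, 1, 1, 2]

automated, deferred = split_by_confidence(probs, threshold=0.8)
coverage, accuracy = coverage_and_accuracy(probs, labels, threshold=0.8)
print(coverage, accuracy, deferred)
print(round(labeling_hours(8.2)))
```

Raising the threshold generally trades coverage for accuracy: fewer images are automated, but those that are tend to be classified more reliably. Under the 52-week assumption, 8.2 years at 40 hours per week comes to roughly 17,000 hours, in line with the figure quoted in the abstract.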