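A dataset of photographs covering more than 5,000 species of plants and animals. ImageNet, which is used to train and evaluate image recognition models, covers a broad range of categories, from man-made objects to many breeds of dog. This dataset is restricted to plants and animals, so it looks useful for applications that need to identify them. It was prepared for a Kaggle competition that tests whether models can correctly tell apart creatures that look alike, such as llamas and alpacas, or horses and donkeys (see the figure below).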
Existing image classification datasets used in computer vision tend to have an even number of images for each object category. In contrast, the natural world is heavily imbalanced, as some species are more abundant and easier to photograph than others. To encourage further progress in challenging real world conditions we present the iNaturalist Challenge 2017 dataset – an image classification benchmark consisting of 675,000 images with over 5,000 different species of plants and animals. It features many visually similar species, captured in a wide variety of situations, from all over the world. Images were collected with different camera types, have varying image quality, have been verified by multiple citizen scientists, and feature a large class imbalance.
We discuss the collection of the dataset and present baseline results for state-of-the-art computer vision classification models. Results show that current non-ensemble based methods achieve only 64% top one classification accuracy, illustrating the difficulty of the dataset. Finally, we report results from a competition that was held with the data.
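To make the "top one classification accuracy" figure concrete, here is a minimal sketch of how top-1 accuracy can be computed, alongside a per-class average that is often reported for imbalanced data. The function names and the toy llama/alpaca labels are assumptions for illustration only. They are not taken from the paper or the competition code, and the per-class average is not necessarily the metric the authors used.

```python
from collections import Counter, defaultdict


def top1_accuracy(y_true, y_pred):
    """Fraction of samples whose single best prediction matches the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_per_class_accuracy(y_true, y_pred):
    """Average of the per-class accuracies, so rare classes count equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    totals = Counter(y_true)   # number of samples per true class
    hits = defaultdict(int)    # number of correct predictions per true class
    for t, p in zip(y_true, y_pred):
        if t == p:
            hits[t] += 1
    return sum(hits[c] / n for c, n in totals.items()) / len(totals)


# Hypothetical, heavily imbalanced toy data: 8 llamas, 2 alpacas.
y_true = ["llama"] * 8 + ["alpaca"] * 2
# A degenerate classifier that always answers with the majority class.
y_pred = ["llama"] * 10

print(top1_accuracy(y_true, y_pred))
print(mean_per_class_accuracy(y_true, y_pred))
```

On this toy data the always-"llama" classifier still gets most samples right overall while failing every alpaca, which is why a single top-1 number can look better than it should when the classes are imbalanced.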