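Appendix Table 1
Models and Data
For convenience, we have collected datasets commonly used in deep learning, along with pretrained weights for several popular models, and placed them in object storage. You can use these files directly to start your own work, saving the time otherwise spent downloading data and improving productivity.
Datasets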
ImageNet
Name | Description | URL | Size |
---|---|---|---|
ILSVRC2017 Object localization dataset | CLS-LOC dataset | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/imagenet/ILSVRC2017_CLS-LOC.tar.gz | 155GB |
ILSVRC2017 Object detection dataset | DET dataset | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/imagenet/ILSVRC2017_DET.tar.gz | 55GB |
ILSVRC2017 Object detection test dataset | DET test dataset | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/imagenet/ILSVRC2017_DET_test_new.tar.gz | 428MB |
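Because the ImageNet archives above are large (up to 155GB), it can help to unpack a `.tar.gz` while it downloads instead of first saving the archive to disk. The following is a minimal sketch of that idea using only the Python standard library; the function name `stream_extract_tar_gz` and the target directory are illustrative assumptions, not part of the service.

```python
import tarfile
import urllib.request


def stream_extract_tar_gz(url, target_dir):
    """Extract a .tar.gz from `url` into `target_dir` without saving the archive.

    Hypothetical helper for illustration; it does not retry or resume.
    """
    kwargs = {}
    # Newer Python versions provide extraction filters; use the safer
    # "data" filter when the running interpreter supports it.
    if hasattr(tarfile, "data_filter"):
        kwargs["filter"] = "data"
    with urllib.request.urlopen(url) as resp:
        # Mode "r|gz" reads the gzip stream sequentially, so the HTTP
        # response can be consumed directly without seeking.
        with tarfile.open(fileobj=resp, mode="r|gz") as tar:
            tar.extractall(target_dir, **kwargs)
```

For example, `stream_extract_tar_gz("https://appcenter-deeplearning.sh1a.qingstor.com/dataset/imagenet/ILSVRC2017_DET_test_new.tar.gz", "data/imagenet")` would unpack the smallest archive in the table. The trade-off is that an interrupted transfer has to start over, so for the largest archives a resumable download tool may be the better choice.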
COCO
Name | URL | Image count / Size |
---|---|---|
2017 Train Images | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/train2017.zip | 118K/18GB |
2017 Val images | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/val2017.zip | 5K/1GB |
2017 Test images | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/test2017.zip | 41K/6GB |
2017 Unlabeled images | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/unlabeled2017.zip | 123K/19GB |
2017 Train/Val annotations | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/annotations_trainval2017.zip | 241MB |
2017 Stuff Train/Val annotations | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/stuff_annotations_trainval2017.zip | 401MB |
2017 Testing Image info | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/image_info_test2017.zip | 1MB |
2017 Unlabeled Image info | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/image_info_unlabeled2017.zip | 4MB |
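The COCO files above are ordinary `.zip` archives served over HTTPS, so they can be fetched and unpacked with the Python standard library. The sketch below is one possible way to do this; the helper names `download` and `extract_zip` and the local directory layout are assumptions made for illustration.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def download(url, dest_dir="."):
    """Stream `url` into `dest_dir` and return the local file path.

    Hypothetical helper: it skips the download when the file already
    exists and performs no integrity check.
    """
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    if dest.exists():
        return dest
    # Write to a temporary name first so an interrupted transfer is not
    # mistaken for a complete file on the next run.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out, length=1024 * 1024)
    tmp.replace(dest)
    return dest


def extract_zip(archive, target_dir):
    """Unpack `archive` into `target_dir` and return the member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

A typical call would be `extract_zip(download("https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/image_info_test2017.zip", "data/coco"), "data/coco")`, which starts with the smallest file in the table before moving on to the multi-gigabyte image archives.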
PASCAL VOC
OpenSLR
Name | Category | Summary | Files |
---|---|---|---|
Vystadial | Speech | English and Czech data, mirrored from the Vystadial project | data_voip_cs.tgz [1.5G] data_voip_en.tgz [2.7G] |
TED-LIUM | Speech | English speech recognition training corpus from TED talks, created by Laboratoire d’Informatique de l’Université du Maine (LIUM) (mirrored here) | TEDLIUM_release1.tar.gz [21G] |
THCHS-30 | Speech | A Free Chinese Speech Corpus Released by CSLT@Tsinghua University | data_thchs30.tgz [6.4G] test-noise.tgz [1.9G] resource.tgz [24M] |
Aishell | Speech | Mandarin data, provided by Beijing Shell Shell Technology Co.,Ltd | data_aishell.tgz [15G] resource_aishell.tgz [1.2M] |
Free ST Chinese Mandarin Corpus | Speech | A free Chinese Mandarin corpus by Surfingtech (www.surfing.ai), containing 102600 utterances from 855 speakers | ST-CMDS-20170001_1-OS.tar.gz [8.2G] |
VGGFace2
Chinese and English Wikipedia Corpora
Name | Description | URL | Size |
---|---|---|---|
zhwiki-latest-pages-articles.xml.bz2 | Latest Chinese Wikipedia corpus as of July 23, 2018 | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/wiki/zhwiki-latest-pages-articles.xml.bz2 | 1.5GB |
enwiki-latest-pages-articles.xml.bz2 | Latest English Wikipedia corpus as of July 23, 2018 | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/wiki/enwiki-latest-pages-articles.xml.bz2 | 14.2GB |
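The Wikipedia files are bz2-compressed XML dumps, which can be read as a stream without decompressing them to disk first. The following sketch iterates over page titles with the standard library; the function name `iter_titles` is hypothetical, and the code assumes the usual MediaWiki export layout of `<page>` elements containing a `<title>` child.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_titles(dump_path, limit=None):
    """Yield page titles from a bz2-compressed MediaWiki XML dump.

    Hypothetical helper; `limit` stops after that many titles.
    """
    count = 0
    with bz2.open(dump_path, "rb") as f:
        for _event, elem in ET.iterparse(f, events=("end",)):
            # Dump elements carry an XML namespace prefix such as
            # "{...}title"; keep only the local tag name.
            tag = elem.tag.rsplit("}", 1)[-1]
            if tag == "title":
                yield elem.text
                count += 1
                if limit is not None and count >= limit:
                    return
            elif tag == "page":
                # Drop the finished page's children to limit memory use.
                elem.clear()
```

Calling `list(iter_titles("zhwiki-latest-pages-articles.xml.bz2", limit=5))` on a downloaded dump is a quick way to check that the file is intact before running a full preprocessing job.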
Pretrained Models
TensorFlow-Slim image classification model library
All checkpoint URLs in the table below point to Shanhe (山河) object storage and can be downloaded directly.