start working on object-detection

Klesh Wong 2020-09-10 12:09:01 +08:00
parent 99a71e1c17
commit cad4b41579
3 changed files with 353 additions and 37 deletions

.gitignore vendored

@@ -2,3 +2,4 @@
/runs/
.vim/
.ipynb_checkpoints
/models/

File diff suppressed because one or more lines are too long


@@ -0,0 +1,52 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This tutorial fine-tunes a pre-trained [Mask R-CNN](https://arxiv.org/abs/1703.06870) model on the [Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/). The dataset contains 170 images with 345 pedestrians. Through this tutorial you will learn how to use the new features in torchvision to train a segmentation model on a custom dataset.\n",
"\n",
"\n",
"# Defining the Dataset\n",
"\n",
"Following the reference scripts for training object detection, instance segmentation and person keypoint detection models, adding support for a new custom dataset is straightforward. The new dataset must inherit from the `torch.utils.data.Dataset` class and implement the `__len__` and `__getitem__` methods.\n",
"\n",
"The only thing to note is that `__getitem__` must return data in the following format:\n",
"\n",
"* image: a `PIL` image of size `(H, W)`\n",
"* target: a `dict` containing the following keys:\n",
"  * `boxes (FloatTensor[N, 4])`: the coordinates of the `N` bounding boxes, each in `[x0, y0, x1, y1]` format\n",
"  * `labels (Int64Tensor[N])`: the label for each bounding box, where `0` represents the background class\n",
"  * `image_id (Int64Tensor[1])`: an image identifier, which must be unique across the whole dataset\n",
"  * `area (Tensor[N])`: the area of each bounding box, used with the COCO metric during evaluation to separate the scores for boxes of different sizes\n",
"  * `iscrowd (UInt8Tensor[N])`: instances with `iscrowd=True` will be ignored during evaluation\n",
"  * optionally `masks (UInt8Tensor[N, H, W])`: the segmentation mask for each object\n",
"  * optionally `keypoints (FloatTensor[N, K, 3])`: for each of the `N` objects, its `K` keypoints in `[x, y, visibility]` format, where `visibility=0` means the keypoint is not visible\n",
"\n",
"\n",
"data source: https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip"
]
}
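,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch of such a dataset class for the Penn-Fudan data. It assumes the zip above has been extracted to a `PennFudanPed/` folder containing `PNGImages/` and `PedMasks/` subfolders; the class name `PennFudanDataset` and the `transforms` argument are illustrative assumptions rather than final tutorial code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import numpy as np\n",
"import torch\n",
"from PIL import Image\n",
"\n",
"\n",
"class PennFudanDataset(torch.utils.data.Dataset):\n",
"    def __init__(self, root, transforms=None):\n",
"        self.root = root\n",
"        self.transforms = transforms\n",
"        # sort both lists so that images and masks stay aligned\n",
"        self.imgs = sorted(os.listdir(os.path.join(root, \"PNGImages\")))\n",
"        self.masks = sorted(os.listdir(os.path.join(root, \"PedMasks\")))\n",
"\n",
"    def __getitem__(self, idx):\n",
"        img = Image.open(os.path.join(self.root, \"PNGImages\", self.imgs[idx])).convert(\"RGB\")\n",
"        mask = np.array(Image.open(os.path.join(self.root, \"PedMasks\", self.masks[idx])))\n",
"        obj_ids = np.unique(mask)[1:]  # each pedestrian has its own mask value; 0 is background\n",
"        masks = mask == obj_ids[:, None, None]  # (N, H, W) boolean masks\n",
"        # derive [x0, y0, x1, y1] boxes from the masks\n",
"        boxes = []\n",
"        for m in masks:\n",
"            ys, xs = np.where(m)\n",
"            boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])\n",
"        boxes = torch.as_tensor(boxes, dtype=torch.float32)\n",
"        num_objs = len(obj_ids)\n",
"        target = {\n",
"            \"boxes\": boxes,\n",
"            \"labels\": torch.ones((num_objs,), dtype=torch.int64),  # single class: pedestrian\n",
"            \"image_id\": torch.tensor([idx]),\n",
"            \"area\": (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]),\n",
"            \"iscrowd\": torch.zeros((num_objs,), dtype=torch.int64),\n",
"            \"masks\": torch.as_tensor(masks, dtype=torch.uint8),\n",
"        }\n",
"        if self.transforms is not None:\n",
"            img, target = self.transforms(img, target)\n",
"        return img, target\n",
"\n",
"    def __len__(self):\n",
"        return len(self.imgs)"
]
}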
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}