CSV Dataset

  1. Dataset folder overview
  2. File contents
    1. photo
    2. xml
    3. classes.csv
    4. train_annotations.csv
    5. val_annotations.csv
    6. CSVdata.py

Dataset folder overview

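CSV
    photo                   # image directory
    xml                     # directory of the XML annotations for the images
    classes.csv             # generated class label file
    train_annotations.csv   # annotations for the training images
    val_annotations.csv     # annotations for the validation images
    CSVdata.py              # dataset generation script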

File contents

1. photo

00001.jpg
00002.jpg
………….jpg

2. xml

00001.xml
00002.xml
………….xml
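
Judging by the tags that CSVdata.py reads (filename, object, name, xmin, ymin, xmax, ymax), the XML files appear to follow the Pascal VOC layout. That is an assumption, as is the bndbox wrapper in the sample below. The following is a minimal sketch of reading one such annotation with the standard library; the sample document and the helper name parse_voc_boxes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation, shaped like the tags CSVdata.py looks for.
SAMPLE_XML = """
<annotation>
  <filename>00001.jpg</filename>
  <object>
    <name>cat</name>
    <bndbox>
      <xmin>449</xmin><ymin>25</ymin><xmax>907</xmax><ymax>595</ymax>
    </bndbox>
  </object>
</annotation>
"""


def parse_voc_boxes(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        # Assumes each object has a bndbox child holding the four coordinates.
        coords = tuple(
            int(obj.findtext(f"bndbox/{tag}"))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *coords))
    return filename, boxes
```

Calling parse_voc_boxes(SAMPLE_XML) yields the file name together with one tuple per annotated object, which is the same information the script later writes as one CSV row per object.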

3. classes.csv

Each row has the form class_name,id (the script writes no header row):
cat,0
dog,1
...,...
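
A training pipeline usually needs this file as a name-to-ID lookup. The sketch below shows one way to load it, assuming the two-column layout shown above with no header row; load_classes is a hypothetical helper, not part of the original script.

```python
import csv
import io


def load_classes(fp):
    """Read class_name,id rows into a dict, rejecting duplicate names or ids."""
    classes = {}
    for line_no, row in enumerate(csv.reader(fp), start=1):
        if not row:
            continue  # skip blank lines
        name, class_id = row[0], int(row[1])
        if name in classes:
            raise ValueError(f"line {line_no}: duplicate class name {name!r}")
        if class_id in classes.values():
            raise ValueError(f"line {line_no}: duplicate class id {class_id}")
        classes[name] = class_id
    return classes


# In-memory stand-in for classes.csv
sample = io.StringIO("cat,0\ndog,1\n")
classes = load_classes(sample)
```

With a real file, the same function can be used as `with open('classes.csv', newline='') as fp: classes = load_classes(fp)`.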

4. train_annotations.csv

CSV//photo//400000.jpg,449,25,907,595,cat
CSV//photo//400001.jpg,422,24,916,613,cat
CSV//photo//400002.jpg,464,42,893,607,dog
CSV//photo//400003.jpg,490,48,910,605,dog
...
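
Each row above reads as image path, xmin, ymin, xmax, ymax, class name, with one row per annotated object. As a rough sketch of how such a file might be consumed, the hypothetical helper load_annotations below groups the boxes by image and rejects malformed rows; the validation rules are assumptions, not something the original script enforces.

```python
import csv
import io
from collections import defaultdict


def load_annotations(fp):
    """Group rows of path,xmin,ymin,xmax,ymax,class_name by image path."""
    boxes = defaultdict(list)
    for line_no, row in enumerate(csv.reader(fp), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 6:
            raise ValueError(f"line {line_no}: expected 6 fields, got {len(row)}")
        path, label = row[0], row[5]
        xmin, ymin, xmax, ymax = (int(value) for value in row[1:5])
        if xmax <= xmin or ymax <= ymin:
            raise ValueError(f"line {line_no}: empty or inverted box")
        boxes[path].append((xmin, ymin, xmax, ymax, label))
    return dict(boxes)


# In-memory stand-in for train_annotations.csv
sample = io.StringIO(
    "CSV//photo//400000.jpg,449,25,907,595,cat\n"
    "CSV//photo//400002.jpg,464,42,893,607,dog\n"
)
annotations = load_annotations(sample)
```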

5. val_annotations.csv

CSV//photo//400004.jpg,449,25,907,595,cat
CSV//photo//400005.jpg,422,24,916,613,cat
CSV//photo//400006.jpg,464,42,893,607,dog
CSV//photo//400007.jpg,490,48,910,605,dog
...
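
The validation file uses the same row format as the training file, and the two are meant to cover different images. A quick sanity check along these lines can confirm that no image appears in both; image_paths and shared_images are hypothetical helper names used only for this sketch.

```python
import csv
import io


def image_paths(fp):
    """Collect the image path (first column) of every non-empty row."""
    return {row[0] for row in csv.reader(fp) if row}


def shared_images(train_fp, val_fp):
    """Return the image paths that occur in both annotation files."""
    return image_paths(train_fp) & image_paths(val_fp)


# In-memory stand-ins for the two annotation files
train = io.StringIO("CSV//photo//400000.jpg,449,25,907,595,cat\n")
val = io.StringIO("CSV//photo//400004.jpg,449,25,907,595,cat\n")
overlap = shared_images(train, val)  # empty set when the split is clean
```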

6. CSVdata.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''
* Description: build a CSV-format dataset
* Date: 2019-6-27
'''
import csv
import os
import xml.dom.minidom

# -------------------------- Configuration --------------------------
# Directory containing the XML annotation files
xmlpath = os.path.join(os.getcwd(), 'xml')
# Directory containing the images
photopath = os.path.join(os.getcwd(), 'photo')


def getxmlinfo(filename):
    '''Return the image file name and a flat list
    [label, xmin, ymin, xmax, ymax, ...] covering every object in the file.'''
    dom = xml.dom.minidom.parse(filename)
    root = dom.documentElement
    name = root.getElementsByTagName('filename')[0]
    label = []
    for obj in root.getElementsByTagName('object'):
        flag = obj.getElementsByTagName('name')[0]
        xmin = obj.getElementsByTagName('xmin')[0]
        ymin = obj.getElementsByTagName('ymin')[0]
        xmax = obj.getElementsByTagName('xmax')[0]
        ymax = obj.getElementsByTagName('ymax')[0]
        label += [flag.firstChild.data, xmin.firstChild.data, ymin.firstChild.data,
                  xmax.firstChild.data, ymax.firstChild.data]
    return name.firstChild.data, label


class CSVDate():
    def __init__(self, photopath='photo', xmlpath='xml'):
        self.csv = []
        self.photopath = photopath
        self.xmlpath = xmlpath
        self.test_size = 0.2  # fraction of rows written to val_annotations.csv
        self.label = []

        self.getcsv()
        print(self.csv)
        self.write_file()

    def getcsv(self):
        # Sort the listing so the train/val split is the same on every run
        for xmlfile in sorted(os.listdir(self.xmlpath)):
            if not xmlfile.endswith('.xml'):
                continue
            _, info = getxmlinfo(os.path.join(self.xmlpath, xmlfile))
            name = os.path.splitext(xmlfile)[0]
            # info holds five values per object: label, xmin, ymin, xmax, ymax
            for i in range(len(info) // 5):
                label, xmin, ymin, xmax, ymax = info[i*5:i*5 + 5]
                # Use the full class name, as in the classes.csv sample above
                if label not in self.label:
                    self.label.append(label)
                self.csv.append(['CSV//photo//' + name + '.jpg', xmin, ymin, xmax, ymax, label])

    def write_file(self):
        # The first test_size fraction of rows goes to validation, the rest to training
        n_val = int(len(self.csv) * self.test_size)
        with open('train_annotations.csv', 'w', newline='') as fp:
            csv_writer = csv.writer(fp, dialect='excel')
            csv_writer.writerows(self.csv[n_val:])
        with open('val_annotations.csv', 'w', newline='') as fp:
            csv_writer = csv.writer(fp, dialect='excel')
            csv_writer.writerows(self.csv[:n_val])

        # Class IDs are assigned in alphabetical order of the class names
        class_ = [[name, num] for num, name in enumerate(sorted(self.label))]
        with open('classes.csv', 'w', newline='') as fp:
            csv_writer = csv.writer(fp, dialect='excel')
            csv_writer.writerows(class_)


if __name__ == '__main__':
    CSVDate(photopath, xmlpath)
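
The script splits by row position: the first 20% of the collected rows become the validation set, so an image with several boxes can end up partly in each file. If a randomized, per-image split is preferred, one possible variant is sketched below. It is not part of the original script; split_rows and its seed parameter are hypothetical.

```python
import random


def split_rows(rows, test_size=0.2, seed=0):
    """Split annotation rows into (train, val) so that all rows of an
    image land in the same subset. rows[i][0] must be the image path."""
    images = sorted({row[0] for row in rows})
    random.Random(seed).shuffle(images)  # fixed seed keeps the split reproducible
    n_val = int(len(images) * test_size)
    val_images = set(images[:n_val])
    train = [row for row in rows if row[0] not in val_images]
    val = [row for row in rows if row[0] in val_images]
    return train, val
```

The two returned lists could be passed to csv.writer(...).writerows(...) in place of the slices used in write_file. Note that test_size here is a fraction of images rather than of rows.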

WeChat: 宏沉一笑
WeChat official account: 漫步之行

Signature: Smile every day
Name: 宏沉一笑
Email: whghcyx@outlook.com
Personal website: https://whg555.github.io



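Please credit the source when reposting. Readers are welcome to verify the sources cited in this article and to point out any errors or unclear wording, either in the comment section below or by email to whghcyx@outlook.com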

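Author: 宏沉一笑

Published: 2020-02-04, 14:28:49

Last updated: 2023-06-19, 13:58:36

Original link: https://whghcyx.gitee.io/2020/02/04/AI-2020-2-4-CSV%E6%95%B0%E6%8D%AE%E9%9B%86/

License: "Attribution-NonCommercial-ShareAlike 4.0". Please keep the original link and the author attribution when reposting.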
