You Only Look Once

1. YOLO
2. YOLO in practice (pretrained models)
3. Pretrained models (extended applications)
4. YOLO + Object Detection
5. Class assignment

1. YOLO

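YOLO (You Only Look Once) is a real-time object detection system. Its goal is to detect every object in an image and to output a bounding box and a class label for each one. YOLO is fast and accurate, which makes it well suited to real-time applications.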

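1.1. History of YOLO

YOLO was first proposed by Joseph Redmon et al. in 2015 and has been revised many times since, including YOLOv2, YOLOv3, YOLOv4, YOLOv5, YOLOv6, YOLOv7, YOLOv8, and later versions. The YOLO family has been very successful in object detection and is widely used in real-world applications.

The YOLO family developed as follows¹: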

YOLOv1 (2015) Joseph Redmon: You Only Look Once: Unified, Real-Time Object Detection
YOLOv2 (2017) Joseph Redmon: YOLO9000: Better, Faster, Stronger
YOLOv3 (2018) Joseph Redmon: YOLOv3: An Incremental Improvement
YOLOv4 (2020) Alexey Bochkovskiy, Chien-Yao Wang (Academia Sinica), Hong-Yuan Mark Liao: YOLOv4: Optimal Speed and Accuracy of Object Detection
In 2020 Joseph Redmon dropped a bombshell: fed up with YOLO being used for military applications and in ways that threaten personal privacy, he announced that he would stop doing computer vision research.
YOLOv5 (2020) Glenn Jocher
PP-YOLO (2020) Xiang Long et al.: PP-YOLO: An Effective and Efficient Implementation of Object Detector
YOLO-Z (2021) Aduen Benjumea et al.: YOLO-Z: Improving small object detection in YOLOv5 for autonomous vehicles
YOLO-ReT (2021) Prakhar Ganesh et al.: YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs
Scaled-YOLOv4 (2021) Chien-Yao Wang et al.: Scaled-YOLOv4: Scaling Cross Stage Partial Network
YOLOX (2021) Zheng Ge et al.: YOLOX: Exceeding YOLO Series in 2021
YOLOR (2021) Chien-Yao Wang et al.: You Only Learn One Representation: Unified Network for Multiple Tasks
YOLOS (2021) Yuxin Fang et al.: You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection
YOLOF (2021) Qiang Chen et al.: You Only Look One-level Feature
YOLOP (2022) Dong Wu et al.: YOLOP: You Only Look Once for Panoptic Driving Perception
YOLOv6 (2022) Meituan technical team
YOLOv7 (2022) Chien-Yao Wang (Academia Sinica), Alexey Bochkovskiy, Hong-Yuan Mark Liao (Director of the Institute of Information Science, Academia Sinica): YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
YOLOv8 (2023) Ultralytics
YOLOv9 (2024) Chien-Yao Wang (Academia Sinica), I-Hau Yeh (Department of Electronic Engineering, National Taipei University of Technology), Hong-Yuan Mark Liao (Director of the Institute of Information Science, Academia Sinica): YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Related article: "The world's fastest AI visual recognition comes from Taiwan! How did Hong-Yuan Mark Liao, director of the Institute of Information Science at Academia Sinica, help bring YOLO about?"
YOLOv10 (2024) Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, Guiguang Ding: YOLOv10: Real-Time End-to-End Object Detection
YOLOv11 (2024) Ultralytics; overview paper by Rahima Khanam, Muhammad Hussain: YOLOv11: An Overview of the Key Architectural Enhancements
YOLOv12 (2025) Yunjie Tian, Qixiang Ye, David Doermann: YOLOv12: Attention-Centric Real-Time Object Detectors

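1.2. How YOLO works

YOLO treats object detection as a regression problem: a single network maps image pixels directly to bounding boxes and class probabilities, which is what makes real-time detection possible. The network is a convolutional neural network (fully convolutional from YOLOv2 onward). It divides the whole image into an S×S grid, and each grid cell predicts B bounding boxes and C class probabilities.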
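To make the S×S grid concrete, here is a minimal sketch based on the original YOLOv1 paper, where each box carries five values (x, y, w, h, confidence), so the prediction tensor has shape S × S × (B·5 + C). The function names and the 448×448 image size in the example are illustrative assumptions, not part of any library.

```python
def yolo_v1_output_shape(S, B, C):
    """Shape of the YOLOv1 prediction tensor: S x S x (B*5 + C)."""
    return (S, S, B * 5 + C)


def responsible_cell(cx, cy, img_w, img_h, S):
    """Return (row, col) of the grid cell that contains the object center (pixels)."""
    col = min(int(cx * S / img_w), S - 1)
    row = min(int(cy * S / img_h), S - 1)
    return row, col


shape = yolo_v1_output_shape(S=7, B=2, C=20)
cell = responsible_cell(cx=300, cy=100, img_w=448, img_h=448, S=7)
print(shape)  # (7, 7, 30)
print(cell)   # (1, 4)
```

In YOLOv1 the cell that contains an object's center is the one responsible for predicting that object. Later versions use different heads, so treat this only as an illustration of the grid idea.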

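1.3. YOLOv8

At the time of writing, the latest YOLO release is YOLOv12, an object detection model with an attention-centric architecture that its authors report improves both speed and accuracy. To accommodate the Mac minis in classroom 403, which have no GPU and depressingly slow performance, we use the lightweight (nano) variant of YOLOv8 here. Its model size and compute cost are much smaller than those of YOLOv7, which makes it suitable for resource-constrained environments.

YOLOv8, like YOLOv5, is developed by Ultralytics and is trained with PyTorch. It supports three main task types²:

Object Detection
Instance Segmentation
Image Classification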

Ultralytics' official pretrained models (yolov8n.pt, yolov8s.pt, yolov8m.pt, and so on) are trained on the COCO dataset and recognize 80 object classes by default, including people, animals, vehicles, furniture, and appliances. Some examples:

person, bicycle, car, motorcycle, airplane, bus, train, truck
bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe
backpack, umbrella, handbag, tie, suitcase
bottle, wine glass, cup, fork, knife, spoon, bowl
tv, laptop, mouse, remote, keyboard, cell phone

The full list can be printed with:

from ultralytics import YOLO
model = YOLO('yolov8n.pt')  # the weights are downloaded automatically on first use
print(model.names)

{0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}

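1.4. YOLOv8 model variants

Name	Size	Accuracy	Inference speed	Typical use
yolov8n	Smallest	Lowest	Very fast	Phones, embedded devices
yolov8s	Small	Medium	Fast	General-purpose fast detection
yolov8m	Medium	High	Medium	Balance between accuracy and speed
yolov8l	Large	Higher	Slow	Tasks that need higher accuracy
yolov8x	Largest	Highest	Slowest	High accuracy, server-side inference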

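2. YOLO in practice (pretrained models)

2.1. Detecting objects in a still photo

2.1.1. Installing the required package

The ultralytics package is the official implementation of YOLOv8. We use the latest version of the package here.

Install it as follows:

pip install ultralytics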

2.1.2. Running locally

from ultralytics import YOLO

# Load the YOLOv8 nano model
model = YOLO("yolov8n.pt")

# Run inference on an image (replace with your own image path)
results = model("/Users/letranger/Downloads/cardog.jpg")
results[0].show()

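2.1.3. Running on Colab

On Colab, install the package in a notebook cell first:

!pip install ultralytics

Then load the model as in the local example and point it at an uploaded file:

results = model("/content/road.png")

Detecting objects in an image from the web:

results = model("https://ultralytics.com/images/bus.jpg")
results[0].show()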

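2.2. Real-time detection (local)

OpenCV (Open Source Computer Vision Library) is a cross-platform computer vision package. Here we use OpenCV to open the Mac mini's webcam and run YOLOv8 on each frame for real-time detection.

The following is run in PyCharm on the Mac minis in classroom 403. Install the packages first:

pip install numpy==1.26.4
pip install ultralytics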
import cv2
from ultralytics import YOLO

# Load the pretrained model
model = YOLO("yolov8n.pt")

# Open the built-in webcam (0 is the default camera device)
cap = cv2.VideoCapture(0)

# Check that the camera opened successfully
if not cap.isOpened():
    print("❌ Cannot open the camera")
    raise SystemExit(1)

while True:
    ret, frame = cap.read()
    if not ret:
        print("❌ Cannot read a frame")
        break

    # Run YOLO detection on the current frame
    results = model(frame, verbose=False)
    annotated_frame = results[0].plot()

    # Show the detection result
    cv2.imshow("YOLOv8 Live Detection", annotated_frame)

    # Press q to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()

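2.3. Face detection with OpenCV

OpenCV can also work with faces, and it helps to separate two levels of functionality: face detection and face recognition.

2.3.1. Face detection

Detection finds where the faces are in an image (and draws boxes around them). It is commonly used for live camera previews and for face autofocus on phones. Common methods:

Haar Cascade Classifier (classic and fast, but less accurate):

import cv2
# gray_frame is assumed to be a grayscale image, e.g. cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
faces = face_cascade.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=5)

DNN model (more accurate): OpenCV provides deploy.prototxt + res10_300x300_ssd_iter_140000.caffemodel, which can be loaded with the cv2.dnn module for face detection.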

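2.3.2. Face recognition

Recognition goes one step further and decides whose face it is, which requires training or comparing features first.

The face_recognition Python package (built on dlib) works in two steps:
1. Build a feature vector (encoding) for each face
2. Compare the encodings

import face_recognition
# "photo.jpg" is a placeholder; use any image that contains a face
image = face_recognition.load_image_file("photo.jpg")
face_locations = face_recognition.face_locations(image)
face_encodings = face_recognition.face_encodings(image, face_locations)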

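2.4. Feature comparison (YOLO vs OpenCV)

Feature	YOLO	OpenCV
Object detection	✅ Powerful, supports many classes	⚠️ Few built-in detectors (you must train your own)
Face detection	✅ (needs training or an add-on package)	✅ (Haar/DNN)
Face recognition	❌ (not supported natively)	⚠️ (needs dlib/face_recognition)
Runtime efficiency	High (GPU acceleration)	Medium (lighter weight, suits CPU)
Ease of use	Easy (Ultralytics is friendly)	More basic (manual setup required)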

import cv2

# Load the Haar face detection model (bundled with OpenCV)
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Open the camera (0 = default webcam)
cap = cv2.VideoCapture(0)

# Check that the camera opened successfully
if not cap.isOpened():
    print("❌ Cannot open the camera")
    raise SystemExit(1)

while True:
    # Read one frame
    ret, frame = cap.read()
    if not ret:
        print("❌ Cannot read a frame")
        break

    # Convert to grayscale (face detection usually runs on grayscale images)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect faces
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # Draw a rectangle around each face
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # Show the frame
    cv2.imshow("Face Detection (press q to quit)", frame)

    # Press q to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()

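3. Pretrained models (extended applications)

3.1. Face Recognition

3.1.1. Goal

Approach: combine YOLOv8 detection of people (or faces) with a face recognition model
Technique: use the face_recognition package with dlib models for matching

3.1.2. Program logic

Use YOLO to locate each person (or face)
Crop those regions and pass them to the face recognition model for matching
Draw the matching name on the frame

3.1.3. Real-world applications

Student attendance systems
Face-based door access control
Exam room monitoring and identification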

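3.2. Fine-grained attribute recognition (for example, mask wearing)

Goal: once a person is detected, determine attributes or behaviors such as wearing a mask, wearing a uniform, or raising a hand
Approach:
- Train your own YOLO model with classes such as:
  - person_masked
  - person_unmasked
- Or pair YOLO with a separate mask classifier that examines the face region

3.2.1. Possible strategies

Strategy	Description
Multi-class YOLO training	Treat masked and unmasked as separate classes and train them together
Two-stage analysis (YOLO + classifier)	YOLO detects the face → a CNN then classifies whether a mask is worn

3.2.2. Real-world applications

Monitoring campus mask policies
Equipment checks before entering a laboratory
Safety helmet detection

3.2.3. Further integration options

Extension goal	Model needed	Notes
✅ Identify who it is	face_recognition + YOLO	Requires a database of student face encodings
😷 Mask or no mask	Self-trained YOLO or CNN classifier	Can be combined with face recognition
🧥 Clothing color	Image feature analysis (color mask)	OpenCV HSV filtering
🖐️ Raised hand, actions	Skeleton pose (YOLO-pose)	YOLOv8 provides pose models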

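3.3. Face Recognition

This section uses the face_recognition package, which is built on dlib's face recognition models. It offers HOG and CNN methods for locating faces, and it computes a feature vector (encoding) for each face so that faces can be compared. The package is easy to use and runs on the CPU. However, by default it relies on full facial features (including the nose and mouth). A mask hides more than 50% of the face, which means:

face_encodings() may fail to produce a valid feature vector
Even if it does, the result may not match the original (unmasked) encoding

It is therefore only suitable for recognizing faces without masks.

pip install face_recognition opencv-python

3.3.1. Preparing the data

Each group member prepares 5 to 10 selfies taken from different angles and in different lighting, named "name_n.jpg", for example: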

James_1.jpg
James_2.jpg
…
James_10.jpg
…
Vanessa_1.jpg
Vanessa_2.jpg
…
Vanessa_9.jpg

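Avoid Chinese characters in file names for now, and put all the photos in one folder. The examples below assume the folder is /Users/letranger/Downloads/images.

If you use the Mac minis in classroom 403, put the photos in /Users/student/Desktop/images. If you use your own computer, change the path accordingly.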
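The "name_n.jpg" rule can be checked before any face encoding is computed. The sketch below is one possible way to extract the name part; the helper name person_name and the choice to split on the last underscore are assumptions made for illustration (splitting on the first underscore would turn "Mary_Ann_2.jpg" into "Mary").

```python
from pathlib import Path
from typing import Optional

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def person_name(filename: str) -> Optional[str]:
    """Return the name part of 'Name_n.jpg', or None for files that should be skipped."""
    path = Path(filename)
    # Skip hidden files such as .DS_Store and anything that is not an image
    if path.name.startswith(".") or path.suffix.lower() not in IMAGE_SUFFIXES:
        return None
    # Split on the last underscore so names that contain '_' stay intact
    name, sep, index = path.stem.rpartition("_")
    if not sep or not name or not index.isdigit():
        return None
    return name


print(person_name("James_10.jpg"))    # James
print(person_name("Mary_Ann_2.PNG"))  # Mary_Ann
print(person_name(".DS_Store"))       # None
```

Running this over the folder first makes it easy to spot misnamed photos before they silently end up under the wrong name.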

3.3.2. save_encodings.py

This program stores the feature vector and name for every photo in a pickle file (encodings.pkl). Recognition can then load this file directly instead of recomputing the feature vectors every time.

import os
import pickle

import face_recognition
from PIL import UnidentifiedImageError

# Folder that holds the photos; change this to your own path
IMAGE_DIR = "/Users/letranger/Downloads/images"

known_encodings = []
known_names = []

for file in sorted(os.listdir(IMAGE_DIR)):
    # Skip hidden files (such as .DS_Store) and anything that is not an image
    if file.startswith(".") or not file.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    name = file.split("_")[0]  # "James_1.jpg" -> "James"
    try:
        img = face_recognition.load_image_file(os.path.join(IMAGE_DIR, file))
        encodings = face_recognition.face_encodings(img)
        if encodings:
            # Use the first face found in the photo
            known_encodings.append(encodings[0])
            known_names.append(name)
        else:
            print(f"⚠️ No face found in: {file}")
    except UnidentifiedImageError:
        print(f"❌ Unreadable image file: {file}")

# Save the encodings and names; remember this path, it is used below
with open(os.path.join(IMAGE_DIR, "encodings.pkl"), "wb") as f:
    pickle.dump((known_encodings, known_names), f)

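3.3.3. realtime_recognition.py

The previous program stored every photo's feature vector and name in encodings.pkl, so recognition can load that file directly instead of recomputing the vectors each time. The program below uses the pretrained YOLOv8 model (yolov8n.pt) to detect objects in each frame, then uses the face_recognition package together with encodings.pkl to identify faces.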

import pickle

import cv2
import face_recognition
from ultralytics import YOLO

# Load encodings.pkl produced by the previous program
with open("/Users/letranger/Downloads/images/encodings.pkl", "rb") as f:
    known_encodings, known_names = pickle.load(f)

model = YOLO("yolov8n.pt")
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    results = model(frame)

    for box in results[0].boxes.xyxy:
        x1, y1, x2, y2 = map(int, box)
        face_img = frame[y1:y2, x1:x2]
        if face_img.size == 0:
            continue  # skip empty crops
        rgb_face = cv2.cvtColor(face_img, cv2.COLOR_BGR2RGB)

        encodings = face_recognition.face_encodings(rgb_face)
        if encodings:
            match = face_recognition.compare_faces(known_encodings, encodings[0])
            name = known_names[match.index(True)] if True in match else "Unknown"
            # Draw the name above the box (adapt this part to your needs)
            cv2.putText(frame, name, (x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0,255,0), 2)

    cv2.imshow("Face Recognition + YOLO", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
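The program above takes the first stored encoding that compare_faces accepts, which is not necessarily the closest one. A possible refinement, sketched here in plain Python, is to pick the nearest encoding by Euclidean distance and accept it only within a tolerance. The helper name best_match is hypothetical, 0.6 mirrors the default tolerance of compare_faces, and the three-number vectors are toy stand-ins for real 128-dimensional encodings.

```python
import math


def best_match(known_encodings, known_names, candidate, tolerance=0.6):
    """Return the name whose encoding is closest to candidate, or 'Unknown'."""
    best_name, best_distance = "Unknown", tolerance
    for encoding, name in zip(known_encodings, known_names):
        distance = math.dist(encoding, candidate)  # Euclidean distance
        if distance <= best_distance:
            best_name, best_distance = name, distance
    return best_name


# Toy data for illustration only
known = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
names = ["James", "Vanessa"]

print(best_match(known, names, [0.9, 0.0, 0.0]))  # Vanessa
print(best_match(known, names, [5.0, 5.0, 5.0]))  # Unknown
```

With real data, face_recognition.face_distance returns the same kind of distances for all stored encodings at once, so it could replace the loop.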

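3.4. Displaying Chinese text

The code above uses OpenCV's default font, which does not support Chinese, so displaying Chinese names requires a different font. The version below draws the text with PIL and a Chinese font, so names taken from Chinese file names can appear on screen. For example: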

王小明_1.jpg
王小明_2.jpg
…
王小明_10.jpg
…
陳大明_1.jpg
陳大明_2.jpg
…
陳大明_9.jpg

import pickle
import platform

import cv2
import face_recognition
import numpy as np
from PIL import Image, ImageDraw, ImageFont
from ultralytics import YOLO

# Pick a font with Chinese glyphs based on the operating system
system = platform.system()
if system == "Windows":
    font_path = "C:/Windows/Fonts/msjh.ttc"  # Microsoft JhengHei
elif system == "Darwin":
    font_path = "/System/Library/Fonts/STHeiti Medium.ttc"  # macOS STHeiti
else:
    font_path = "/usr/share/fonts/truetype/noto/NotoSansCJK-Regular.ttc"  # Linux (install Noto Sans CJK yourself)
font = ImageFont.truetype(font_path, 32)

# Open the built-in webcam (0 is the default camera device)
cap = cv2.VideoCapture(0)

# Change this to the path of your own encodings.pkl
with open("/Users/letranger/Downloads/images/encodings.pkl", "rb") as f:
    known_encodings, known_names = pickle.load(f)

model = YOLO("yolov8n.pt")

while True:
    ret, frame = cap.read()
    if not ret:
        break
    results = model(frame)

    for box in results[0].boxes.xyxy:
        x1, y1, x2, y2 = map(int, box)
        face_img = frame[y1:y2, x1:x2]
        if face_img.size == 0:
            continue  # skip empty crops
        rgb_face = cv2.cvtColor(face_img, cv2.COLOR_BGR2RGB)

        encodings = face_recognition.face_encodings(rgb_face)
        if encodings:
            match = face_recognition.compare_faces(known_encodings, encodings[0])
            name = known_names[match.index(True)] if True in match else "未知"

            # Convert the OpenCV image to PIL so Chinese text can be drawn
            frame_pil = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            draw = ImageDraw.Draw(frame_pil)
            draw.text((x1, y1-40), name, font=font, fill=(0, 255, 0))

            # Convert back to an OpenCV image
            frame = cv2.cvtColor(np.array(frame_pil), cv2.COLOR_RGB2BGR)

    cv2.imshow("Face Recognition + YOLO", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

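3.5. YOLO + Face Recognition

The program above labels only the faces it can match against the encodings that save_encodings.py built with face_recognition. If we want to recognize our own people and also label other objects such as cats, dogs, and cars, we can use the classes that the pretrained YOLOv8 model already detects.

The program logic is:

First run YOLO on the live webcam feed to find out which objects are in the frame.
If a person is detected (class 0: person), apply the face recognition step from realtime_recognition.py to that region.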

import pickle
import platform

import cv2
import face_recognition
import numpy as np
from PIL import Image, ImageDraw, ImageFont
from ultralytics import YOLO

# Load the YOLOv8 model
model = YOLO("yolov8n.pt")

# Load the face encodings (change this to your own path)
with open("/Users/letranger/Dropbox/YOLO8/faces/encodings.pkl", "rb") as f:
    known_encodings, known_names = pickle.load(f)

# Load a font with Chinese glyphs
system = platform.system()
if system == "Windows":
    font_path = "C:/Windows/Fonts/msjh.ttc"  # Microsoft JhengHei
elif system == "Darwin":
    font_path = "/System/Library/Fonts/STHeiti Medium.ttc"  # macOS STHeiti
else:
    font_path = "/usr/share/fonts/truetype/noto/NotoSansCJK-Regular.ttc"  # Linux Noto Sans CJK (install it yourself)

font = ImageFont.truetype(font_path, 28)

# Open the camera
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    results = model(frame)
    boxes = results[0].boxes
    cls_names = results[0].names

    # Draw with PIL, which supports Chinese text
    image_pil = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(image_pil)

    for i, box in enumerate(boxes.xyxy):
        x1, y1, x2, y2 = map(int, box.tolist())
        cls_id = int(boxes.cls[i])

        # If it is a person, run face recognition on the cropped region
        if cls_id == 0:
            face_img = frame[y1:y2, x1:x2]
            if face_img.size == 0:
                continue  # skip empty crops
            rgb_face = cv2.cvtColor(face_img, cv2.COLOR_BGR2RGB)
            encodings = face_recognition.face_encodings(rgb_face)

            name = "未知"
            if encodings:
                matches = face_recognition.compare_faces(known_encodings, encodings[0])
                if True in matches:
                    name = known_names[matches.index(True)]

            draw.rectangle([(x1, y1), (x2, y2)], outline="green", width=3)
            draw.text((x1, y1 - 30), name, font=font, fill=(0, 255, 0))

        else:
            # Any other class: draw the COCO label
            label = cls_names[cls_id]
            draw.rectangle([(x1, y1), (x2, y2)], outline="blue", width=3)
            draw.text((x1, y1 - 30), label, font=font, fill=(0, 0, 255))

    # Show the result
    frame = cv2.cvtColor(np.array(image_pil), cv2.COLOR_RGB2BGR)
    cv2.imshow("YOLO + Face Recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
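Cropping frame[y1:y2, x1:x2] assumes the box lies inside the frame and has a non-zero area. Detector output is normally within bounds, but boxes that have been padded or rounded can violate that, and an empty crop makes cv2.cvtColor fail. A minimal sketch of a guard follows; the helper name clamp_box is hypothetical.

```python
def clamp_box(x1, y1, x2, y2, frame_w, frame_h):
    """Clip a box to the frame; return None if nothing is left to crop."""
    x1, x2 = max(0, min(x1, frame_w)), max(0, min(x2, frame_w))
    y1, y2 = max(0, min(y1, frame_h)), max(0, min(y2, frame_h))
    if x2 <= x1 or y2 <= y1:
        return None
    return x1, y1, x2, y2


print(clamp_box(-5, 10, 700, 500, 640, 480))     # (0, 10, 640, 480)
print(clamp_box(100, 100, 100, 200, 640, 480))   # None
```

In the detection loop, a None result would simply mean "skip this box".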

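3.5.1. Further applications

After trying the program above, you can extend it to other scenarios, for example:

Recognize your own cat or dog and show its name on screen.
Recognize visitors at the door and show their names on screen.
Take photos automatically: when you or a friend is detected, capture a photo and save it to a chosen folder.
Combine it with an embedded device such as the ESP32 to control a door access system that opens automatically when you or a friend is detected.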

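4. YOLO + Object Detection

The pretrained YOLO model recognizes 80 object classes, including people, cats, dogs, cars, and airplanes. To teach YOLO to recognize other items (such as a student ID card or a ballpoint pen), we need to prepare plenty of training data (photos and annotations) and then train YOLOv8 on it. The photos can be taken with a phone and uploaded to the computer. The annotations can be made on the makesense.ai website, which offers simple tools for quickly adding labels and bounding boxes to images.

4.1. Workflow

The overall workflow is roughly:

Create folders: under your Python project folder, create an images folder with a train folder and a val folder inside it, for the training and validation images.
Prepare the dataset: take at least 15 photos of each group member and each object (varied angles and lighting; the more the better), and put about 10 in images/train and 5 in images/val.
A lazy shortcut is to take 10 photos of each item and put the same files in both train and val. The drawback is that validation then cannot reveal overfitting, so keep the two sets separate.
Label the photos: upload the images to makesense.ai, add labels, draw a bounding box around each object, and download the annotations.
Create data.yaml: this file tells YOLOv8 how your dataset is organized, including the paths to the training and validation data and the class names.
Train the model: fine-tune from the YOLOv8 pretrained weights on your own dataset.
Test the model: use the trained model to detect objects and draw the detections on the image.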

4.1.1. Environment

Assume your Python project YOLO lives on the desktop (/Users/letranger/Desktop/YOLO). The file structure looks roughly like this:

/Users/letranger/Desktop/YOLO
├── main.py
├── images
│   ├── train
│   └── val
└── labels
    ├── train
    └── val

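4.1.2. Preparing the images

Take the photos, with at least 10 training images for each object, and put them in the training folder (train) and the validation folder (val). Name each photo of an object or face as "ObjectName_number", for example (assuming the folder is /Users/student/Desktop/YOLO/images/train):
- dog_1.jpg
- dog_2.jpg
- …
- dog_10.jpg
- cat_1.jpg
- cat_2.jpg
- …
- cat_10.jpg
Also prepare a validation folder with another 5 images per object, named with the same rule.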

/Users/student/Desktop/YOLO
├── main.py
├── images
│   ├── train
│   │   ├── CoffeeCup_1.jpg
│   │   ├── CoffeeCup_2.jpg
│   │   ...
│   │   ├── CryBaby_1.jpg
│   │   ├── CryBaby_2.jpg
│   │   ...
│   └── val
│       ├── CoffeeCup_11.jpg
│       ├── CoffeeCup_12.jpg
│       ...
│       ├── CryBaby_11.jpg
│       ├── CryBaby_12.jpg
│       ...
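If all photos start out in a single folder, the train/val split can be scripted instead of done by hand. The sketch below is only one way to do it, under these assumptions: files are named ObjectName_number.jpg, 5 images per object go to val, and the function name split_dataset plus the raw source folder are made up for illustration. The demo runs on empty placeholder files in a temporary folder.

```python
import random
import shutil
import tempfile
from pathlib import Path


def split_dataset(source_dir, project_dir, val_count=5, seed=0):
    """Copy images from source_dir into images/train and images/val under project_dir."""
    source, project = Path(source_dir), Path(project_dir)
    train_dir = project / "images" / "train"
    val_dir = project / "images" / "val"
    train_dir.mkdir(parents=True, exist_ok=True)
    val_dir.mkdir(parents=True, exist_ok=True)

    # Group files by the object name before the last underscore
    groups = {}
    for path in sorted(source.glob("*.jpg")):
        groups.setdefault(path.stem.rpartition("_")[0], []).append(path)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    for paths in groups.values():
        rng.shuffle(paths)
        for i, path in enumerate(paths):
            target = val_dir if i < val_count else train_dir
            shutil.copy2(path, target / path.name)
    return train_dir, val_dir


# Demo on a throwaway folder with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    raw.mkdir()
    for name in ("dog", "cat"):
        for i in range(1, 16):
            (raw / f"{name}_{i}.jpg").touch()
    train_dir, val_dir = split_dataset(raw, Path(tmp) / "YOLO")
    train_files = sorted(p.name for p in train_dir.iterdir())
    val_files = sorted(p.name for p in val_dir.iterdir())

print(len(train_files), len(val_files))  # 20 10
```

Because each file is copied to exactly one of the two folders, the same photo never appears in both train and val.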

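4.1.3. Adding labels

Go to https://www.makesense.ai/, upload the images, add labels, and draw a bounding box around every object.

Figure 1: Caption
Download the annotations.
- Choose Export Annotations
  
  Figure 2: Caption
- Download in YOLO format and unzip the file
  
  Figure 3: Caption
Do this separately for the images in images/train and images/val: upload the images, annotate them, download the zip file, unzip it, and move the resulting .txt files into the matching train or val folder under labels.

Figure 4: Caption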

/Users/letranger/Desktop/YOLO
├── main.py
├── images
│   ├── train
│   │   ├── 皮尺_1.jpg
│   │   ├── 皮尺_2.jpg
│   │   ...
│   │   ├── 學生證_1.jpg
│   │   ├── 學生證_2.jpg
│   │   ...
│   └── val
│       ├── 皮尺_11.jpg
│       ├── 皮尺_12.jpg
│       ...
│       ├── 學生證_11.jpg
│       ├── 學生證_12.jpg
│       ...
├── labels
│   ├── train (annotations for the training images)
│   │   ├── 皮尺_1.txt
│   │   ├── 皮尺_2.txt
│   │   ...
│   │   ├── 學生證_1.txt
│   │   ├── 學生證_2.txt
│   │   ...
│   └── val (annotations for the validation images)
│       ├── 皮尺_11.txt
│       ├── 皮尺_12.txt
│       ...
│       ├── 學生證_11.txt
│       ├── 學生證_12.txt
│       ...
└── yolov8n.pt
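Each .txt file exported in YOLO format holds one line per object: the class id followed by the box center x, center y, width, and height, all divided by the image size so that they fall between 0 and 1. The sketch below shows that conversion from a pixel box; the function name to_yolo_line and the sample numbers are illustrative assumptions.

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) to one YOLO-format label line."""
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A box covering the middle of a 640x480 image, labeled as class 1
line = to_yolo_line(1, 160, 120, 480, 360, 640, 480)
print(line)  # 1 0.500000 0.500000 0.500000 0.500000
```

Opening one of the exported .txt files and checking it against this layout is a quick way to confirm that the annotations were downloaded in the right format.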


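4.1.4. Editing data.yaml

train, val: paths to the training and validation image folders
nc: the number of object classes
names: all object labels, exactly as you defined them in MakeSense

train: /Users/student/Desktop/YOLO/images/train
val: /Users/student/Desktop/YOLO/images/val
nc: 2
names: ['皮尺', '學生證']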
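A common slip is an nc value that does not match the number of entries in names. One way to avoid it is to generate the file from a single list, as in this minimal sketch. The helper name build_data_yaml is hypothetical, and it assumes the images/train and images/val layout shown earlier and label names without single quotes.

```python
from pathlib import Path


def build_data_yaml(project_dir, names):
    """Return data.yaml text for a dataset laid out as images/train and images/val."""
    project = Path(project_dir)
    train_path = (project / "images" / "train").as_posix()
    val_path = (project / "images" / "val").as_posix()
    quoted = ", ".join(f"'{name}'" for name in names)
    return (
        f"train: {train_path}\n"
        f"val: {val_path}\n"
        f"nc: {len(names)}\n"  # always consistent with the names list
        f"names: [{quoted}]\n"
    )


yaml_text = build_data_yaml("/Users/student/Desktop/YOLO", ["皮尺", "學生證"])
print(yaml_text)
```

The text can then be written with Path("data.yaml").write_text(yaml_text, encoding="utf-8") so that the Chinese label names are stored correctly.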

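4.2. Training the model

yolo detect train model=yolov8n.pt data=data.yaml epochs=50 imgsz=640

When training finishes, output like the following appears (this sample log comes from a 5-epoch run on a five-class dataset):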

5 epochs completed in 0.015 hours.
Optimizer stripped from /Users/letranger/Dropbox/notes/roam/runs/detect/train4/weights/last.pt, 6.2MB
Optimizer stripped from /Users/letranger/Dropbox/notes/roam/runs/detect/train4/weights/best.pt, 6.2MB

Validating /Users/letranger/Dropbox/notes/roam/runs/detect/train4/weights/best.pt...
Ultralytics 8.3.111 🚀 Python-3.12.9 torch-2.6.0 CPU (Apple M3)
Model summary (fused): 72 layers, 3,006,623 parameters, 0 gradients, 8.1 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:03<00:00,  3.63s/it]
                   all         25         25    0.00705          1      0.274      0.213
             CoffeeCup          5          5    0.00224          1     0.0721     0.0553
               CryBaby          5          5    0.00323          1      0.366      0.286
       DigitalFortress          5          5    0.00198          1      0.266       0.22
              KeyChain          5          5    0.00509          1       0.18     0.0748
                SigBox          5          5     0.0227          1      0.485      0.427
Speed: 1.0ms preprocess, 112.6ms inference, 0.0ms loss, 1.5ms postprocess per image
Results saved to /Users/letranger/Desktop/yolo/runs/detect/train4
💡 Learn more at https://docs.ultralytics.com/modes/train

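Note the second-to-last line ("Results saved to ..."): it tells you which folder the trained model was saved in. The folder's contents look like this: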

❯ ls -l /Users/letranger/Dropbox/notes/roam/runs/detect/train4
total 9064
-rw-r--r--@ 1 letranger  staff    1525  4 20 16:58 args.yaml
-rw-r--r--@ 1 letranger  staff  125866  4 20 17:00 confusion_matrix_normalized.png
-rw-r--r--@ 1 letranger  staff  116648  4 20 17:00 confusion_matrix.png
-rw-r--r--@ 1 letranger  staff   88074  4 20 17:00 F1_curve.png
-rw-r--r--@ 1 letranger  staff  199910  4 20 16:59 labels_correlogram.jpg
-rw-r--r--@ 1 letranger  staff  131976  4 20 16:59 labels.jpg
-rw-r--r--@ 1 letranger  staff  102262  4 20 17:00 P_curve.png
-rw-r--r--@ 1 letranger  staff  118821  4 20 17:00 PR_curve.png
-rw-r--r--@ 1 letranger  staff  100473  4 20 17:00 R_curve.png
-rw-r--r--@ 1 letranger  staff     780  4 20 16:59 results.csv
-rw-r--r--@ 1 letranger  staff  281806  4 20 17:00 results.png
-rw-r--r--@ 1 letranger  staff  664189  4 20 16:59 train_batch0.jpg
-rw-r--r--@ 1 letranger  staff  659940  4 20 16:59 train_batch1.jpg
-rw-r--r--@ 1 letranger  staff  670238  4 20 16:59 train_batch2.jpg
-rw-r--r--@ 1 letranger  staff  694635  4 20 17:00 val_batch0_labels.jpg
-rw-r--r--@ 1 letranger  staff  650796  4 20 17:00 val_batch0_pred.jpg
drwxr-xr-x@ 4 letranger  staff     128  4 20 16:59 weights

The weights folder holds the trained model (best.pt and last.pt).
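Because every training run creates a new folder (train, train2, ..., train11), it is easy to load an outdated best.pt by mistake. The sketch below finds the most recently modified best.pt under a runs folder. The function name latest_best_weights and the default runs/detect location are assumptions based on the log above, and the demo uses empty placeholder files.

```python
import os
import tempfile
from pathlib import Path


def latest_best_weights(runs_dir="runs/detect"):
    """Return the best.pt of the most recently modified training run, or None."""
    candidates = sorted(
        Path(runs_dir).glob("train*/weights/best.pt"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


# Demo with placeholder files; train2 is given the newer timestamp
with tempfile.TemporaryDirectory() as tmp:
    for i, run in enumerate(["train", "train2"]):
        weights = Path(tmp) / run / "weights"
        weights.mkdir(parents=True)
        (weights / "best.pt").touch()
        os.utime(weights / "best.pt", (1000 + i, 1000 + i))
    newest_run = latest_best_weights(tmp).parent.parent.name
    empty_result = latest_best_weights(Path(tmp) / "missing")

print(newest_run)  # train2
```

The returned path can be passed to YOLO(...) in place of a hard-coded run folder.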

4.3. Testing the model

import platform

import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFont
from ultralytics import YOLO

# Load the trained model. Change the path to your own; see the training output above for the actual folder
model = YOLO("/Users/letranger/Desktop/YOLO/runs/detect/train11/weights/best.pt")

# Pick a font with Chinese glyphs based on the operating system
system = platform.system()
if system == "Windows":
    font_path = "C:/Windows/Fonts/msjh.ttc"
elif system == "Darwin":
    font_path = "/System/Library/Fonts/STHeiti Medium.ttc"
else:
    font_path = "/usr/share/fonts/truetype/noto/NotoSansCJK-Regular.ttc"  # Linux Noto Sans CJK (install it first)

font = ImageFont.truetype(font_path, 28)

# Open the camera
cap = cv2.VideoCapture(0)

# Optional: request a 640x480 capture resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # YOLO inference (conf=0.1 lowers the confidence threshold)
    results = model(frame, conf=0.1)

    # Use PIL so that Chinese labels can be drawn
    image_pil = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(image_pil)

    for box in results[0].boxes:
        cls_id = int(box.cls[0])
        conf = float(box.conf[0])
        label = f"{model.names[cls_id]} ({conf:.2f})"

        x1, y1, x2, y2 = map(int, box.xyxy[0])
        draw.rectangle([(x1, y1), (x2, y2)], outline=(255, 0, 0), width=3)
        draw.text((x1, y1 - 30), label, font=font, fill=(255, 0, 0))

    # Convert back to OpenCV format for display
    annotated = cv2.cvtColor(np.array(image_pil), cv2.COLOR_RGB2BGR)
    cv2.imshow("YOLOv8 Real-Time Detection", annotated)

    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):
        break
    elif key == ord('s'):
        # Press s to save the current (unannotated) frame
        cv2.imwrite("snapshot.jpg", frame)
        print("📸 Saved snapshot.jpg")

cap.release()
cv2.destroyAllWindows()

4.3.1. Result

2025-05-01_13-51-13 (1).gif

Figure 5: YOLO object detection

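4.4. Combining with YOLO's built-in classes

As with face recognition, the program above recognizes only the objects we prepared ourselves. To recognize our custom objects and also the objects YOLO already knows, such as cats, dogs, and cars, we need the pretrained YOLOv8 model as well.

In the YOLO + face_recognition program earlier, we first ran YOLOv8 and, when it found a person, applied our own face encodings for recognition, which let us label faces and other objects at the same time.

We can also run two models at once (the pretrained YOLO model and our custom detection model) and draw both sets of results on the same frame.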

import platform

import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFont
from ultralytics import YOLO

# Load the COCO model and the custom-trained model
model_coco = YOLO("yolov8n.pt")  # the original 80 classes
model_custom = YOLO("/Users/letranger/Desktop/YOLO/runs/detect/train11/weights/best.pt")  # your trained model

# Load a font with Chinese glyphs
system = platform.system()
if system == "Windows":
    font_path = "C:/Windows/Fonts/msjh.ttc"
elif system == "Darwin":
    font_path = "/System/Library/Fonts/STHeiti Medium.ttc"
else:
    font_path = "/usr/share/fonts/truetype/noto/NotoSansCJK-Regular.ttc"

font = ImageFont.truetype(font_path, 28)

# Open the camera
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Run inference with both models
    results_coco = model_coco(frame, conf=0.3)
    results_custom = model_custom(frame, conf=0.1)

    # Create a PIL canvas
    image_pil = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(image_pil)

    # Draw the COCO model results (blue boxes)
    for box in results_coco[0].boxes:
        cls_id = int(box.cls[0])
        conf = float(box.conf[0])
        label = f"{model_coco.names[cls_id]} ({conf:.2f})"
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        draw.rectangle([(x1, y1), (x2, y2)], outline=(0, 0, 255), width=2)
        draw.text((x1, y1 - 30), label, font=font, fill=(0, 0, 255))

    # Draw the custom model results (red boxes)
    for box in results_custom[0].boxes:
        cls_id = int(box.cls[0])
        conf = float(box.conf[0])
        label = f"{model_custom.names[cls_id]} ({conf:.2f})"
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        draw.rectangle([(x1, y1), (x2, y2)], outline=(255, 0, 0), width=3)
        draw.text((x1, y1 - 30), label, font=font, fill=(255, 0, 0))

    # Show the image
    annotated = cv2.cvtColor(np.array(image_pil), cv2.COLOR_RGB2BGR)
    cv2.imshow("YOLOv8 COCO + custom model", annotated)

    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):
        break
    elif key == ord('s'):
        # Press s to save the current (unannotated) frame
        cv2.imwrite("snapshot.jpg", frame)
        print("📸 Saved snapshot.jpg")

cap.release()
cv2.destroyAllWindows()
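When two models look at the same frame, both may draw a box around the same object (for example, the COCO model may call a student ID card a "book"). One possible clean-up, sketched here in plain Python, is to compute the intersection over union (IoU) of the boxes and hide a COCO box whenever it overlaps strongly with a custom box. The function names and the 0.5 threshold are illustrative assumptions, not part of the program above.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def drop_overlapping(coco_boxes, custom_boxes, threshold=0.5):
    """Keep only the COCO boxes that do not overlap strongly with any custom box."""
    return [c for c in coco_boxes
            if all(iou(c, k) < threshold for k in custom_boxes)]


coco = [(0, 0, 10, 10), (100, 100, 120, 120)]
custom = [(1, 1, 10, 10)]
kept = drop_overlapping(coco, custom)
print(kept)  # [(100, 100, 120, 120)]
```

Whether the custom model should win such conflicts depends on how well it was trained, so the threshold and the priority are choices to experiment with.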

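4.4.1. Result

2025-05-01_14-25-13 (1).gif

Figure 6: YOLO object detection + custom object recognition

4.5. Advanced (triggering an action after a successful detection)

Playing a sound from Python (afplay is a macOS command):

import subprocess
subprocess.run(["afplay", "/System/Library/Sounds/Glass.aiff"])

Adding the sound playback to the detection program: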

import subprocess
import threading

import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Look up the class id of the target class by name
target_class = "person"
target_class_id = list(model.names.values()).index(target_class)

# Open the built-in camera
cap = cv2.VideoCapture(0)

def beep():
    # afplay is available on macOS only
    subprocess.run(["afplay", "/System/Library/Sounds/Glass.aiff"])

while True:
    ret, frame = cap.read()
    if not ret:
        break

    results = model(frame)
    annotated = results[0].plot()

    class_ids = results[0].boxes.cls.cpu().numpy().astype(int) if results[0].boxes.cls is not None else []

    # Beep when the target class is detected; the thread keeps the video loop from blocking
    if target_class_id in class_ids:
        threading.Thread(target=beep).start()

    cv2.imshow("YOLOv8 Real-Time Detection", annotated)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
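The loop above starts a new thread on every frame in which the target is visible, so several sounds can overlap. One possible remedy is a cooldown that lets the action fire at most once every few seconds. The class below is a minimal sketch; the name Cooldown and the 2-second interval are assumptions, and the clock is injectable only so that the logic can be checked without waiting.

```python
import time


class Cooldown:
    """Allow an action at most once every `seconds` seconds."""

    def __init__(self, seconds, clock=time.monotonic):
        self.seconds = seconds
        self.clock = clock        # injectable so the logic can be tested
        self.last_fired = None

    def ready(self):
        """Return True (and restart the timer) if enough time has passed."""
        now = self.clock()
        if self.last_fired is None or now - self.last_fired >= self.seconds:
            self.last_fired = now
            return True
        return False


# Demo with a fake clock that returns 0.0 s, 0.5 s, then 2.5 s
fake_times = iter([0.0, 0.5, 2.5])
cooldown = Cooldown(2.0, clock=lambda: next(fake_times))
fired = [cooldown.ready() for _ in range(3)]
print(fired)  # [True, False, True]
```

In the detection loop, the condition would become something like `if target_class_id in class_ids and cooldown.ready():` before starting the beep thread.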

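5. Class assignment

Basic requirements
1. Recognize every group member in real time and display each member's name
2. Recognize at least three kinds of objects in real time (including a student ID card)
Advanced requirements
1. When a mouse is detected, play a cat meowing sound (the first 5 seconds). Does the webcam picture freeze while the sound plays? Look up the keyword threading on Google or ask ChatGPT
2. When a student ID card is detected, have the webcam take a photo and save it to the desktop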

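Footnotes:

¹ YOLO 的歷史進程！YOLO 大補帖！ (The history of YOLO: a complete YOLO roundup)

² YOLOv8 介紹與手把手訓練自訂義模型 (Introduction to YOLOv8 and a hands-on guide to training a custom model)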