(HEM, OHEM) Hard negative (example) mining and foca...

Updated: 2023-11-03 20:31:57

Published on November 3, 2023 (author: 面试复试)

Class imbalance in classification tasks and the need for hard negative mining

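When training a classifier, the data should be class-balanced: every label should have a sufficient and roughly equal number of samples. In real applications, however, this requirement is often hard to meet.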

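In object detection, a single input image may produce thousands or even tens of thousands of candidate boxes (region proposals), but only a small fraction of them contain a real object. This creates a class imbalance problem.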

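When classes are imbalanced, the uninformative easy negative samples pull the model's overall learning direction off course, so learning becomes ineffective: the model can only recognize object-free background and cannot tell the actual objects apart. The reason is that mini-batch SGD with cross-entropy loss averages the losses of many samples before computing the gradient and updating the parameters. This averaging is problematic. Easy samples make up the vast majority of a batch and hard samples only a small part, so without re-weighting or importance balancing the contribution of the hard samples is almost entirely averaged away by the easy ones. In practice, these hard samples are often the bottleneck for further improving the classifier. (Hard samples are extreme cases that rarely occur but do exist in reality, such as vehicles that have been in an accident in a vehicle-tracking task.)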
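The dilution effect can be checked with a few lines of arithmetic. The sketch below is only an illustration: the batch composition (999 easy negatives with p = 0.05 and one hard negative with p = 0.49) and the helper name `negative_bce` are assumptions made for this example, not values from the text.

```python
import math

def negative_bce(p):
    """Cross-entropy loss of a negative sample predicted positive with probability p."""
    return -math.log(1.0 - p)

# Hypothetical batch: 999 easy negatives (p = 0.05) and one hard negative (p = 0.49).
easy_losses = [negative_bce(0.05)] * 999
hard_loss = negative_bce(0.49)

total_loss = sum(easy_losses) + hard_loss
batch_mean = total_loss / 1000
hard_share = hard_loss / total_loss

print(f"mean loss of the batch: {batch_mean:.4f}")
print(f"loss of the hard sample alone: {hard_loss:.4f}")
print(f"share of the summed loss due to the hard sample: {hard_share:.2%}")
```

Although the hard sample has a per-sample loss more than ten times larger than an easy one, it accounts for only a small fraction of the averaged loss, and therefore of the gradient.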

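Hard negative examples

Which negative samples are hard negatives? Hard negatives are the proposals that the network tends to predict as positive, i.e. false positives:

Classification: a negative sample whose predicted probability of being positive is high (if samples with p of 0.5 or more are classified as positive, a negative with p = 0.49 is a hard negative).

Detection: if an RoI contains, say, half of an object, it is still a negative sample but is easily classified as positive; such an RoI is a hard negative.

Metric learning: a negative sample that lies close to the anchor (positive sample) is a hard negative.

Training on hard negatives greatly improves the network's classification performance, because they act like a notebook of previously missed exam questions.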

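How do we decide that a sample is a hard negative? It is straightforward: first train the network on the initial sample set (the positive and negative samples randomly selected from the first frame), then use the trained network to predict on the negative samples.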
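A minimal sketch of the selection step, assuming the trained network has already produced a positive-class score for every negative sample. The function name `mine_hard_negatives` and the 0.5 threshold are hypothetical choices for illustration; the negatives scored as positive (false positives) are returned, hardest first.

```python
def mine_hard_negatives(neg_scores, threshold=0.5):
    """Return indices of negatives the model scores as positive, hardest first."""
    hard = [i for i, score in enumerate(neg_scores) if score >= threshold]
    return sorted(hard, key=lambda i: neg_scores[i], reverse=True)

# Positive-class scores predicted for five known negatives (made-up numbers).
scores = [0.02, 0.91, 0.40, 0.67, 0.10]
hard_indices = mine_hard_negatives(scores)
print(hard_indices)  # [1, 3]
```

The returned samples would then be added to the training set for the next round, which is what makes them behave like the "missed questions" described above.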

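Classification loss:

Selecting hard negatives with a predefined rule: DenseBox

Core idea: take the samples whose prediction differs most from the label (i.e. with the largest classification loss) as hard negatives.

Once hard negatives have been selected by this rule, training places extra emphasis on them.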

In the forward propagation phase, we sort the loss of output pixels in descending order, and assign the top 1% to be hard-negative. In all experiments, we keep all positive labeled pixels (samples) and the ratio of positive and negative to be 1:1. Among all negative samples, half of them are sampled from hard-negative samples, and the remaining half are selected randomly from non-hard negative.
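The sampling rule in the quotation can be sketched as follows. This is one possible reading, not the DenseBox implementation: the function name `sample_densebox_style` is hypothetical, and it assumes that "hard negative" means a negative sample whose loss falls in the top 1% of all outputs.

```python
import random

def sample_densebox_style(losses, labels, hard_fraction=0.01, seed=0):
    """Keep all positives and an equal number of negatives, half of them hard."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]

    # Sort every output by loss in descending order; the top 1% count as hard.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    n_hard = max(1, int(len(losses) * hard_fraction))
    hard_set = set(order[:n_hard])

    hard_neg = [i for i in negatives if i in hard_set]
    other_neg = [i for i in negatives if i not in hard_set]

    # Positive : negative = 1 : 1, with half of the negatives drawn from the hard pool.
    n_neg = min(len(positives), len(negatives))
    n_from_hard = min(n_neg // 2, len(hard_neg))
    n_from_other = min(n_neg - n_from_hard, len(other_neg))
    chosen_neg = rng.sample(hard_neg, n_from_hard) + rng.sample(other_neg, n_from_other)
    return positives, chosen_neg

# Toy data: 10 positives, 990 negatives, of which 10 have a large loss.
labels = [1] * 10 + [0] * 990
losses = [0.5] * 10 + [2.0] * 10 + [0.01] * 980
pos_idx, neg_idx = sample_densebox_style(losses, labels)
print(len(pos_idx), len(neg_idx))
```

Only the selected positives and negatives would contribute to the backward pass; all remaining negatives are ignored for that iteration.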

ROI loss

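A read-only RoI network runs a forward pass over the feature map and all RoIs; the Hard RoI module then uses the losses of these RoIs to select B samples. The selected samples (hard examples) are fed into the RoI network, which performs a further forward and backward pass on them.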
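The selection performed by the Hard RoI module amounts to a top-B choice by loss. The sketch below assumes the per-RoI losses from the read-only forward pass are already available as a list; the function name `select_hard_rois` is hypothetical, and the non-maximum suppression step that OHEM applies before selection is left out.

```python
def select_hard_rois(roi_losses, batch_size):
    """Return the indices of the batch_size RoIs with the highest loss."""
    order = sorted(range(len(roi_losses)), key=lambda i: roi_losses[i], reverse=True)
    return order[:batch_size]

# Losses from the read-only forward pass over five RoIs (made-up numbers).
roi_losses = [0.1, 2.3, 0.05, 1.7, 0.4]
selected = select_hard_rois(roi_losses, batch_size=2)
print(selected)  # [1, 3]
```

Only the RoIs in `selected` would take part in the second forward pass and in the backward pass.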

focal loss

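Earlier algorithms also address class imbalance, for example OHEM (online hard example mining). Its main idea can be summarized in one sentence from the original paper: "In OHEM each example is scored by its loss, non-maximum suppression (nms) is then applied, and a minibatch is constructed with the highest-loss examples". Although OHEM increases the weight of misclassified (positive and negative) samples, it ignores the easily classified (positive) samples.

To address class imbalance, the authors therefore propose a new loss function: focal loss.

This loss is a modification of the standard cross-entropy loss. By reducing the weight of easily classified samples, it makes the model focus more on hard-to-classify samples during training.

My understanding of focal loss is that it uses a re-weighting factor to modulate (re-weight) the importance of each sample, which yields a cost-sensitive classifier.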

Binary focal loss

The concrete form of focal loss is illustrated here with the binary cross-entropy loss:

Without focal:

L(p,y)=−(y⋅log(p)+(1−y)⋅log(1−p))

With focal (hard negative mining: increases the weight of hard negative samples):

L(p,y)=−(y⋅(1−p)⋅log(p)+(1−y)⋅p⋅log(1−p))

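In the formulas above, p is the classifier's output, a real value in [0, 1] interpreted as the predicted probability, and y is the label, either 0 or 1. To turn the single real number p into a binary decision, p is taken as the probability that the sample is positive (label = 1) and 1 − p as the probability that it is negative (label = 0). The core of focal loss is to use p directly as the modulating (re-weighting) factor. For a hard negative, p is only slightly below 0.5; for an easy negative, p is far below 0.5. Hard negative mining therefore shows up as multiplying the negatives with a smaller 1 − p (the hard negatives) by a larger factor p.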
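The effect of the factor p can be checked numerically. The sketch below implements the two formulas above as written; the function names `bce` and `focal_bce` and the example probabilities 0.49 and 0.01 are assumptions chosen for illustration.

```python
import math

def bce(p, y):
    """Plain binary cross entropy: -(y*log(p) + (1-y)*log(1-p))."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def focal_bce(p, y):
    """Cross entropy with the factors (1-p) and p as modulating weights."""
    return -(y * (1 - p) * math.log(p) + (1 - y) * p * math.log(1 - p))

# A hard negative (p = 0.49) and an easy negative (p = 0.01), both with y = 0.
for p in (0.49, 0.01):
    print(p, round(bce(p, 0), 4), round(focal_bce(p, 0), 6))

plain_ratio = bce(0.49, 0) / bce(0.01, 0)
focal_ratio = focal_bce(0.49, 0) / focal_bce(0.01, 0)
print(f"hard/easy loss ratio: plain {plain_ratio:.0f}, focal {focal_ratio:.0f}")
```

With the modulating factor, the gap between the hard and the easy negative grows by a factor of 0.49 / 0.01 = 49, so the easy negatives contribute far less to the summed loss.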

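A more general form:

The cross-entropy loss for binary classification is:

$$CE(p, y) = \begin{cases} -\log(p) & \text{if } y = 1 \\ -\log(1 - p) & \text{otherwise} \end{cases}$$

Let $p_t$ denote:

$$p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases}$$

so that $CE(p, y) = -\log(p_t)$. The binary focal loss is then:

$$FL(p_t) = -(1 - p_t)^{\gamma} \log(p_t)$$

The focal formula given earlier corresponds to $\gamma = 1$. On top of this, a second weight-adjusting hyperparameter $\alpha$ can be introduced:

$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$

where $\alpha_t = \alpha$ if $y = 1$ and $\alpha_t = 1 - \alpha$ otherwise.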

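PS: for each prediction, only one entry of the one-hot ground-truth vector is 1 and all others are 0, so the focal loss only needs to be computed for the entry that is 1.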

    import tensorflow as tf
    from tensorflow.keras import backend as K  # assumes a Keras 2-style backend API


    def binary_focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25):
        # Epsilon keeps log() away from 0 so that backpropagation
        # does not produce NaN
        epsilon = K.epsilon()
        # Clip the prediction value into [epsilon, 1 - epsilon]
        y_pred = K.clip(y_pred, epsilon, 1.0 - epsilon)
        # Calculate p_t
        p_t = tf.where(K.equal(y_true, 1), y_pred, 1 - y_pred)
        # Calculate alpha_t
        alpha_factor = K.ones_like(y_true) * alpha
        alpha_t = tf.where(K.equal(y_true, 1), alpha_factor, 1 - alpha_factor)
        # Calculate cross entropy
        cross_entropy = -K.log(p_t)
        weight = alpha_t * K.pow((1 - p_t), gamma)
        # Calculate focal loss
        loss = weight * cross_entropy
        # Sum over axis 1, giving one value per sample; y_true and y_pred
        # must therefore be 2-D float tensors, e.g. [batch_size, 1]
        loss = K.sum(loss, axis=1)
        return loss
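For a quick sanity check without TensorFlow, the same computation can be mirrored for a single sample in plain Python. This is a sketch under the assumption that the Keras function above follows the p_t / alpha_t definitions; the name `binary_focal_loss_scalar` is hypothetical.

```python
import math

def binary_focal_loss_scalar(y_true, p, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss of one sample with label y_true (0 or 1) and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)                    # clip, as in the Keras version
    p_t = p if y_true == 1 else 1.0 - p                # probability of the true class
    alpha_t = alpha if y_true == 1 else 1.0 - alpha    # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy_negative = binary_focal_loss_scalar(0, 0.01)
hard_negative = binary_focal_loss_scalar(0, 0.49)
# With gamma = 0 and alpha = 1 a positive sample gets the plain cross entropy.
plain_ce = binary_focal_loss_scalar(1, 0.3, gamma=0.0, alpha=1.0)

print(easy_negative, hard_negative, plain_ce)
```

Setting gamma to 0 removes the modulating factor, which is a convenient way to confirm that the focal term is the only difference from alpha-weighted cross entropy.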

Multi-class focal loss

For multi-class classification:

Without focal:

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$

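In machine learning, the ground truth is treated as one distribution (p) and the prediction as another distribution (q). Assuming there are cnum classes (cnum = 3 for a three-class problem, cnum = 4 for a four-class problem), we have:

$$H(p, q) = -\sum_{c=1}^{cnum} p(c) \log q(c)$$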

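Suppose that in a three-class problem the correct answer for some sample is (1, 0, 0), and a model's prediction after softmax regression is (0.5, 0.4, 0.1). The cross entropy between this prediction and the correct answer is:

$$H((1,0,0),(0.5,0.4,0.1)) = -(1 \times \log 0.5 + 0 \times \log 0.4 + 0 \times \log 0.1) \approx 0.3$$

The value 0.3 is obtained with base-10 logarithms; with the natural logarithm, which is what tf.log computes, the result is approximately 0.69.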
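The worked example can be reproduced in a few lines. The result of about 0.3 corresponds to base-10 logarithms, so the sketch below evaluates the same sum with both bases; the function name `cross_entropy` and its `log` parameter are choices made for this illustration.

```python
import math

def cross_entropy(p, q, log=math.log):
    """H(p, q) = -sum_c p(c) * log(q(c)); terms with p(c) = 0 are skipped."""
    return -sum(pc * log(qc) for pc, qc in zip(p, q) if pc > 0)

truth = (1, 0, 0)
pred = (0.5, 0.4, 0.1)

ce_log10 = cross_entropy(truth, pred, log=math.log10)
ce_ln = cross_entropy(truth, pred)
print(round(ce_log10, 3), round(ce_ln, 3))  # prints 0.301 0.693
```

Because the ground truth is one-hot, only the term of the true class survives, so the cross entropy reduces to the negative logarithm of the probability assigned to that class.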

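With focal (hard negative mining: increases the weight of hard negative samples):

In the multi-class task, p_t in the formula above is replaced by a vector (the multiplication is element-wise, so the resulting p_t is a vector). In terms of the focal_loss_softmax code that follows, with y the one-hot label vector and q the softmax output, the per-sample loss is:

$$FL = -\sum_{c=1}^{cnum} y_c (1 - q_c)^{\gamma} \log q_c$$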
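A plain-Python sketch of the multi-class case for a single sample, assuming the loss keeps only the term of the true class and weights it by (1 − q)^γ, where q is the softmax probability of that class. The function names `softmax` and `focal_loss_multiclass` and the example logits are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss_multiclass(logits, label, gamma=2.0):
    """Focal loss of one sample; label is the index of the true class."""
    q = softmax(logits)[label]
    return -((1.0 - q) ** gamma) * math.log(q)

logits = [2.0, 0.0, 0.0]
ce = focal_loss_multiclass(logits, label=0, gamma=0.0)   # gamma = 0 gives cross entropy
fl = focal_loss_multiclass(logits, label=0, gamma=2.0)
fl_hard = focal_loss_multiclass(logits, label=1, gamma=2.0)
print(ce, fl, fl_hard)
```

The well-classified sample (true class 0) is down-weighted strongly relative to its cross entropy, while the misclassified sample (true class 1) keeps most of its loss.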

    # -*- coding: utf-8 -*-
    """
    TensorFlow implementation of Kaiming He's Focal Loss, a loss function mainly
    used to handle class imbalance in classification problems.
    focal_loss_sigmoid: binary classification loss
    focal_loss_softmax: multi-class classification loss
    Reference Paper : Focal Loss for Dense Object Detection
    """
    # Written against the TensorFlow 2.x eager API.
    import tensorflow as tf


    def focal_loss_sigmoid(labels, logits, alpha=0.25, gamma=2):
        """
        Compute focal loss for binary classification.
        Args:
          labels: An int32 tensor of shape [batch_size].
          logits: A float32 tensor of shape [batch_size].
          alpha: A scalar for focal loss alpha hyper-parameter. If positive samples number
            > negative samples number, alpha < 0.5 and vice versa.
          gamma: A scalar for focal loss gamma hyper-parameter.
        Returns:
          A tensor of the same shape as `labels`.
        """
        y_pred = tf.nn.sigmoid(logits)
        labels = tf.cast(labels, tf.float32)
        # alpha weights the positive class and (1 - alpha) the negative class,
        # matching the docstring and binary_focal_loss above
        L = (-labels * alpha * ((1 - y_pred) ** gamma) * tf.math.log(y_pred)
             - (1 - labels) * (1 - alpha) * (y_pred ** gamma) * tf.math.log(1 - y_pred))
        return L


    def focal_loss_softmax(labels, logits, gamma=2):
        """
        Compute focal loss for multi-class classification.
        Args:
          labels: An int32 tensor of shape [batch_size].
          logits: A float32 tensor of shape [batch_size, num_class].
          gamma: A scalar for focal loss gamma hyper-parameter.
        Returns:
          A tensor of the same shape as `labels`.
        """
        y_pred = tf.nn.softmax(logits, axis=-1)  # [batch_size, num_class]
        # Add a very small value to avoid log(0)
        epsilon = 1.e-7
        y_pred += epsilon
        labels = tf.one_hot(labels, depth=y_pred.shape[1])
        # Per sample this is a vector in which only one entry is non-zero
        L = -labels * ((1 - y_pred) ** gamma) * tf.math.log(y_pred)
        # Reduce the vector to a scalar; since only one entry is non-zero,
        # reduce_max() could be used as well
        L = tf.reduce_sum(L, axis=1)
        return L


    if __name__ == '__main__':
        logits = tf.random.uniform(shape=[5], minval=-1, maxval=1, dtype=tf.float32)
        labels = tf.constant([0, 1, 0, 0, 1])
        loss1 = focal_loss_sigmoid(labels=labels, logits=logits)

        logits2 = tf.random.uniform(shape=[5, 4], minval=-1, maxval=1, dtype=tf.float32)
        labels2 = tf.constant([1, 0, 2, 3, 1])
        loss2 = focal_loss_softmax(labels=labels2, logits=logits2)

        print(loss1.numpy())
        print(loss2.numpy())

GHM

Relationship between the target and label_weight parameters

Part of the GHM-C loss function used for classification in the official code is shown below:

    def forward(self, pred, target, label_weight, *args, **kwargs):
        """Calculate the GHM-C loss.

        Args:
            pred (float tensor of size [batch_num, class_num]):
                The direct prediction of classification fc layer.
            target (float tensor of size [batch_num, class_num]):
                Binary class target for each sample.
            label_weight (float tensor of size [batch_num, class_num]):
                the value is 1 if the sample is valid and 0 if ignored.
        Returns:
            The gradient harmonized loss.
        """
        # the target should be binary class label
        if pred.dim() != target.dim():
            target, label_weight = _expand_binary_labels(
                target, label_weight, pred.size(-1))
        target, label_weight = target.float(), label_weight.float()
        edges = self.edges
        mmt = self.momentum
        weights = torch.zeros_like(pred)

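Note that the target parameter is expected in one-hot form; if it is not, it is expanded to one-hot form by _expand_binary_labels.

The label_weight parameter indicates whether the GHM operation should be applied to a given label; by default it is all ones. In addition, label_weight must have the same dimensions as target: if target is one-hot (of shape [batch_num, class_num]), label_weight must also have size [batch_num, class_num]; if target has size [batch_num], label_weight must also have size [batch_num].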

For non-one-hot labels, does the encoding start at 0 or at 1?

In the official code, _expand_binary_labels is defined as follows:

    def _expand_binary_labels(labels, label_weights, label_channels):
        bin_labels = labels.new_full((labels.size(0), label_channels), 0)
        inds = torch.nonzero(labels >= 1).squeeze()
        if inds.numel() > 0:
            bin_labels[inds, labels[inds] - 1] = 1
        bin_label_weights = label_weights.view(-1, 1).expand(
            label_weights.size(0), label_channels)
        return bin_labels, bin_label_weights

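Written this way, the official code assumes that the labels (classes) are numbered starting from 1, not from 0.

In many cases, however, the input labels (classes) are numbered starting from 0. The code therefore needs to be modified as follows: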

    def _expand_binary_labels(self, labels, label_weights, label_channels):
        # Defined with `self`, so inside the class it has to be called as
        # self._expand_binary_labels(...)
        # expand labels
        bin_labels = labels.new_full((labels.size(0), label_channels), 0)
        # inds = torch.nonzero(labels >= 1).squeeze()  # assumes labels are numbered from 1
        inds = torch.nonzero(labels >= 0).squeeze()  # assumes labels are numbered from 0
        if inds.numel() > 0:
            # bin_labels[inds, labels[inds] - 1] = 1  # assumes labels are numbered from 1
            bin_labels[inds, labels[inds]] = 1  # assumes labels are numbered from 0
        # expand label_weights (label_weight should have size [batch_num],
        # otherwise the function "expand" cannot work)
        bin_label_weights = label_weights.view(-1, 1).expand(
            label_weights.size(0), label_channels)
        return bin_labels, bin_label_weights
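To make the difference between the two numbering conventions concrete, the following sketch mirrors both versions with nested lists instead of tensors. The function names `expand_one_based` and `expand_zero_based` are hypothetical and exist only for this comparison.

```python
def expand_one_based(labels, num_classes):
    """Labels 1..num_classes map to columns 0..num_classes-1; label 0 stays all-zero."""
    rows = [[0] * num_classes for _ in labels]
    for row, label in zip(rows, labels):
        if label >= 1:
            row[label - 1] = 1
    return rows

def expand_zero_based(labels, num_classes):
    """Labels 0..num_classes-1 map directly to columns 0..num_classes-1."""
    rows = [[0] * num_classes for _ in labels]
    for row, label in zip(rows, labels):
        if label >= 0:
            row[label] = 1
    return rows

labels = [0, 2, 1]
one_based = expand_one_based(labels, 3)
zero_based = expand_zero_based(labels, 3)
print(one_based)   # [[0, 0, 0], [0, 1, 0], [1, 0, 0]]
print(zero_based)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

With the one-based convention a sample labelled 0 ends up with an all-zero target row and every other class is shifted by one column, which is why zero-based labels need the modified version.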

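A field related to hard example mining: classification under long-tailed distributions

Interpretation:

Put more simply: previously, the training loss was computed on a mini-batch. Now, in order to reassign weights to the samples within each batch, a mini-batch from the validation set is used to compute a validation loss, and the weights are computed from that validation loss. The training loss updates the model parameters, while the validation loss updates the weights, i.e. the hyperparameters. This is meta-learning.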

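If the distribution of a training sample is similar to that of the validation samples, their gradient directions are also close; such a sample is a good one and its weight should be increased. Otherwise its weight should be decreased. Assigning sample weights in this way makes the model unbiased.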
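The weighting rule can be sketched with per-sample gradients given as plain lists. This is an illustration under stated assumptions, not the method's reference implementation: the function name `reweight_by_gradient_agreement` is hypothetical, gradient agreement is measured with a dot product, negative agreement is clipped to zero, and the weights are normalized to sum to 1.

```python
def reweight_by_gradient_agreement(train_grads, val_grad):
    """Weight each training sample by how well its gradient agrees with the validation gradient."""
    raw = [
        max(0.0, sum(g * v for g, v in zip(grad, val_grad)))  # clipped dot product
        for grad in train_grads
    ]
    total = sum(raw)
    if total == 0.0:
        return [0.0] * len(raw)  # no sample agrees with the validation gradient
    return [w / total for w in raw]

# Three training-sample gradients and one validation gradient (made-up numbers).
train_grads = [[1.0, 0.0], [0.5, 0.5], [-1.0, 0.0]]
val_grad = [1.0, 0.0]
weights = reweight_by_gradient_agreement(train_grads, val_grad)
print(weights)
```

The first sample points in the same direction as the validation gradient and receives the largest weight, while the third points in the opposite direction and is given weight zero.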

Article link: https://www.wtabcd.cn/zhishi/a/1699014717227639.html
