Learning Features by Watching Objects Move

Table of Contents

  • Abstract
  • Introduction
  • Related Work
  • Evaluating Feature Representations
  • Learning Features by Learning to Group
    • Segmenting Images with a ConvNet
    • Experiments
  • Learning by Watching Objects Move
    • Unsupervised Motion Segmentation
  • Conclusion

Abstract¶

Paper page. The paper learns visual representations from low-level, motion-based grouping cues.

Specifically: unsupervised, motion-based segmentation of images in video, using a ConvNet.

Introduction¶

Low-level cues oversegment static objects:

There is a Gestalt principle: pixels that move together likely belong together. The ability to parse static scenes improves over time, which suggests that motion-based grouping appears early while static grouping is acquired later, and that static grouping is probably bootstrapped by motion cues.

Gestalt principle of common fate [34, 47]: pixels that move together tend to belong together. The ability to parse static scenes improves [32] over time, suggesting that while motion-based grouping appears early, static grouping is acquired later, possibly bootstrapped by motion cues.

So the paper builds an unsupervised motion segmentation method.

Motion segmentation relies on optical flow.
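For concreteness, here is a minimal sketch (not from the paper) of computing dense optical flow between two consecutive frames with OpenCV's Farneback method; per-pixel flow like this is the low-level signal that motion segmentation works from.

```python
# Sketch: dense optical flow between two consecutive BGR frames (OpenCV).
import cv2
import numpy as np

def dense_flow(frame_prev, frame_next):
    """Return per-pixel flow magnitude and angle for two BGR frames."""
    g0 = cv2.cvtColor(frame_prev, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame_next, cv2.COLOR_BGR2GRAY)
    # flow[..., 0] = horizontal displacement, flow[..., 1] = vertical displacement
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    return mag, ang
```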

Related Work¶

Personally, I am most interested in the "Learning from motion and action" part.

Wang and Gupta [2] train a ConvNet to distinguish between pairs of tracked patches in a single video, and pairs of patches from different videos.

Li et al. [1] use motion boundary detection to bootstrap a ConvNet-based contour detector, but find that this does not lead to good feature representations.

Evaluating Feature Representations¶

feature representations

Skipped.

Learning Features by Learning to Group¶

Learn features by learning to group.

The core intuition behind this paper is that training a ConvNet to group pixels in static images into objects, without any class labels, will cause it to learn a strong, high-level feature representation.

Segmenting Images with a ConvNet¶

The ConvNet segments the object: pixels belonging to the object are set to 1, the rest to 0.

A single object is cropped out of the full image.

To prevent the ConvNet from cheating off a precise bounding box, the box is jittered in position and scale.

The network takes a $w \times w$ image as input and outputs an $s \times s$ mask.
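A hypothetical sketch of that training-pair construction, using NumPy/OpenCV; the function name, jitter ranges, and resizing details are my own choices, not the paper's exact procedure.

```python
# Sketch: crop a jittered box around one object, resize the RGB crop to
# w x w and the binary object mask to s x s (jitter ranges are guesses).
import numpy as np
import cv2

def make_training_pair(image, obj_mask, w=227, s=56,
                       pos_jitter=0.2, scale_jitter=0.3, rng=np.random):
    ys, xs = np.nonzero(obj_mask)
    x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    side = max(x1 - x0, y1 - y0)
    # Jitter the box in position and scale so the object is not always
    # centred and tightly bounded (prevents the ConvNet from cheating).
    side = side * (1.0 + rng.uniform(-scale_jitter, scale_jitter))
    cx += side * rng.uniform(-pos_jitter, pos_jitter)
    cy += side * rng.uniform(-pos_jitter, pos_jitter)
    half = side / 2.0
    x0, x1 = int(max(cx - half, 0)), int(min(cx + half, image.shape[1]))
    y0, y1 = int(max(cy - half, 0)), int(min(cy + half, image.shape[0]))
    crop = cv2.resize(image[y0:y1, x0:x1], (w, w))
    target = cv2.resize(obj_mask[y0:y1, x0:x1].astype(np.float32), (s, s),
                        interpolation=cv2.INTER_NEAREST)
    return crop, (target > 0.5).astype(np.float32)
```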

Experiments¶

AlexNet, $s=56, w=227$
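A rough PyTorch sketch of what such a setup could look like with an AlexNet trunk, a $227 \times 227$ input, and a $56 \times 56$ mask output; the head layout and the loss are my assumptions for illustration, not the paper's exact architecture.

```python
# Sketch: AlexNet trunk + fully connected head predicting an s x s mask.
import torch
import torch.nn as nn
from torchvision.models import alexnet

class MaskPredictor(nn.Module):
    def __init__(self, s=56):
        super().__init__()
        self.s = s
        self.features = alexnet(weights=None).features    # conv1..conv5, random init
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, s * s),                        # one logit per mask cell
        )

    def forward(self, x):                                  # x: (N, 3, 227, 227)
        logits = self.head(self.features(x))
        return logits.view(-1, 1, self.s, self.s)          # (N, 1, 56, 56)

# Train with a per-pixel binary loss against the s x s object mask, e.g.
# loss = nn.BCEWithLogitsLoss()(model(images), masks)
```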

Learning by Watching Objects Move¶

Unsupervised Motion Segmentation¶

The key idea behind motion segmentation is that if there is a single object moving with respect to the background through the entire video, then pixels on the object will move differently from pixels on the background.

In a single frame, only part of a complex object may move, so the grouping has to aggregate information across multiple frames.

The NLC approach from Faktor and Irani [12] utilizes an edge detector that was trained on labeled edge images. The paper replaces this trained edge detector in NLC with unsupervised superpixels, yielding uNLC.

  1. uNLC computes a per-frame saliency map based on motion, by looking for
    1. pixels that move in a mostly static frame, or
    2. if the frame contains significant motion, pixels that move in a direction different from the dominant one.
  2. The per-pixel saliency is averaged over superpixels (a rough sketch of steps 1-2 follows this list).
  3. A nearest-neighbor graph is computed over the superpixels in the video, using location and appearance as features.
  4. A nearest-neighbor voting scheme propagates the saliency across frames.
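As promised above, here is a rough illustration of steps 1-2 only, with my own simplifications: flow magnitude relative to the median motion stands in for uNLC's per-frame saliency, and SLIC provides the unsupervised superpixels; the graph and voting steps 3-4 are omitted.

```python
# Sketch: motion saliency averaged over unsupervised SLIC superpixels.
import numpy as np
from skimage.segmentation import slic

def superpixel_saliency(frame_rgb, flow_mag, n_segments=300):
    """Average a per-pixel motion-saliency map over SLIC superpixels."""
    labels = slic(frame_rgb, n_segments=n_segments, compactness=10)
    # Movement relative to the dominant (median) motion in the frame.
    saliency = np.clip(flow_mag - np.median(flow_mag), 0, None)
    out = np.zeros_like(saliency)
    for lab in np.unique(labels):
        m = labels == lab
        out[m] = saliency[m].mean()        # step 2: per-superpixel average
    return out
```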

Conclusion¶

Honestly, I can't fully follow this~~~ I lack the background knowledge; I wanted to see the algorithm spelled out, but it isn't. There is code, but the code is quite complex~~~


  1. Y. Li, M. Paluri, J. M. Rehg, and P. Dollár. Unsupervised learning of edges. CVPR, 2016. ↩

  2. X. Wang and A. Gupta. Unsupervised learning of visual representations using videos. ICCV, 2015. ↩

