# Overview

CVPR 2018. Currently ranked first on the leaderboard. Code is available.

# Introduction

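• We propose AffinityNet, a novel DNN that predicts high-level semantic affinity at the pixel level but is trained with image-level class labels only.

• Unlike most previous weakly supervised methods, our approach does not rely on off-the-shelf methods, and it exploits representation learning through end-to-end training of AffinityNet.

• On PASCAL VOC 2012 [8], our method achieves state-of-the-art performance among models trained with the same level of supervision, and it is competitive with models that rely on stronger supervision or external data. Surprisingly, it even outperforms FCN [22], a well-known early fully supervised model.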

# Our Framework

## Computing CAMs

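CAMs play an important role in our framework. As in many other weakly supervised methods, they serve as segmentation seeds: a CAM typically highlights the locally salient parts of an object, and these activations are then propagated to cover the whole object area. In addition, in our framework CAMs are used as the source of supervision for training AffinityNet.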
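The seed computation can be sketched as follows. This is a minimal NumPy illustration of the standard CAM formulation from [40] (a class-weighted sum of the final convolutional feature maps), not the authors' released code. The function name `compute_cams`, the array layouts, and the per-class peak normalization are assumptions made for the example.

```python
import numpy as np


def compute_cams(features, class_weights):
    """Compute one class activation map per class.

    features:      (C, H, W) feature maps from the last conv layer (assumed layout).
    class_weights: (K, C) weights of the classifier that follows global
                   average pooling (assumed layout).
    Returns an array of shape (K, H, W) with values in [0, 1].
    """
    # Weighted sum of feature maps for every class: M_k(x, y) = w_k . f(x, y)
    cams = np.einsum("kc,chw->khw", class_weights, features)
    # Keep only positive evidence for each class.
    cams = np.maximum(cams, 0.0)
    # Normalize each map by its own peak; guard against all-zero maps.
    peak = cams.reshape(cams.shape[0], -1).max(axis=1)
    peak = np.where(peak > 0, peak, 1.0)
    return cams / peak[:, None, None]
```

The resulting maps are high only on the most discriminative parts of an object, which is why the framework treats them as seeds rather than as final segmentation masks.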

## Learning AffinityNet

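AffinityNet aims to predict class-agnostic semantic affinity between a pair of adjacent coordinates on a training image. The predicted affinities are used as transition probabilities in a random walk, so that the walk propagates the CAM activation scores to nearby areas of the same semantic entity, which significantly improves the quality of the CAMs.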
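The propagation step described above can be illustrated with a small NumPy sketch. It assumes the pairwise affinities are already available as a dense, non-negative matrix in which every pixel has affinity 1 with itself. The function name `random_walk_propagate` and the parameters `beta` (a sharpening exponent) and `num_steps` are hypothetical, and their default values are illustrative only.

```python
import numpy as np


def random_walk_propagate(cam, affinity, beta=2, num_steps=3):
    """Diffuse CAM scores along an affinity graph.

    cam:      (H, W) activation map for one class.
    affinity: (H*W, H*W) non-negative pairwise affinities; each row is
              assumed to contain at least one positive entry.
    """
    # Sharpen the affinities so that weak links contribute less (assumed step).
    sharpened = affinity ** beta
    # Row-normalize to obtain transition probabilities of the random walk.
    transition = sharpened / sharpened.sum(axis=1, keepdims=True)
    scores = cam.reshape(-1)
    # Each step replaces a pixel's score with the affinity-weighted
    # average of the scores of the pixels it is connected to.
    for _ in range(num_steps):
        scores = transition @ scores
    return scores.reshape(cam.shape)


# Toy example: a 1x4 image made of two segments, pixels {0, 1} and {2, 3}.
affinity = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
cam = np.array([[1.0, 0.0, 0.0, 0.0]])
refined = random_walk_propagate(cam, affinity)
```

In the toy example the activation at pixel 0 spreads to pixel 1, which belongs to the same segment, and never leaks into pixels 2 and 3, because the affinity across the segment boundary is zero.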


# References

[1] A. Bearman, O. Russakovsky, V. Ferrari, and L. Fei-Fei. What’s the Point: Semantic Segmentation with Point Supervision. In Proceedings of the European Conference on Computer Vision (ECCV), pages 549–565, 2016. 2, 7
[2] G. Bertasius, L. Torresani, S. X. Yu, and J. Shi. Convolutional random walk networks for semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017. 1, 3
[3] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Semantic image segmentation with deep convolutional nets and fully connected CRFs. In Proceedings of the International Conference on Learning Representations (ICLR), 2015. 1, 7
[4] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), PP(99):1–1, 2017. 1, 6, 8
[5] Y. Cheng, R. Cai, Z. Li, X. Zhao, and K. Huang. Locality-sensitive deconvolution networks with gated fusion for RGB-D indoor semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017. 3
[6] J. Dai, K. He, and J. Sun. BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 1635–1643, 2015. 1, 2, 3, 7
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: a large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009. 1, 7
[8] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision (IJCV), 88(2):303–338, 2010. 2, 6
[9] B. Hariharan, P. Arbeláez, L. Bourdev, S. Maji, and J. Malik. Semantic contours from inverse detectors. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 991–998, 2011. 7
[10] S. Hong, J. Oh, B. Han, and H. Lee. Learning transferrable knowledge for semantic segmentation with deep convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3204–3212, 2016. 7
[11] S. Hong, D. Yeo, S. Kwak, H. Lee, and B. Han. Weakly supervised semantic segmentation using web-crawled videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7322–7330, 2017. 1, 3, 7, 8
[12] A. Khoreva, R. Benenson, J. Hosang, M. Hein, and B. Schiele. Simple does it: Weakly supervised instance and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 876–885, 2017. 1, 2, 3, 7
[13] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), 2015. 7
[14] A. Kolesnikov and C. H. Lampert. Seed, expand and constrain: Three principles for weakly-supervised image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 695–711, 2016. 1, 3, 7
[15] P. Krähenbühl and V. Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. In Proceedings of the Neural Information Processing Systems (NIPS), pages 109–117, 2011. 3
[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proceedings of the Neural Information Processing Systems (NIPS), 2012. 7
[17] S. Kwak, S. Hong, and B. Han. Weakly supervised semantic segmentation using superpixel pooling network. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 4111–4117, 2017. 1, 3, 6, 7
[18] D. Lin, J. Dai, J. Jia, K. He, and J. Sun. Scribblesup: Scribble-supervised convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3159–3167, 2016. 1, 2, 7
[19] G. Lin, C. Shen, A. van den Hengel, and I. Reid. Efficient piecewise training of deep structured models for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3194–3203, 2016. 1
[20] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: common objects in context. In Proceedings of the European Conference on Computer Vision (ECCV), pages 740–755, 2014. 7, 8
[21] T. Liu, J. Sun, N. N. Zheng, X. Tang, and H. Y. Shum. Learning to detect a salient object. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8, June 2007. 7, 8
[22] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3431–3440, 2015. 1, 2, 7, 8
[23] L. Lovász. Random walks on graphs: A survey, 1993. 2
[24] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int’l Conf. Computer Vision, volume 2, pages 416–423, July 2001. 7
[25] H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 1520–1528, 2015. 1
[26] S. J. Oh, R. Benenson, A. Khoreva, Z. Akata, M. Fritz, and B. Schiele. Exploiting saliency for object segmentation from image level labels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4410–4419, 2017. 1, 3, 7
[27] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014. 3
[28] G. Papandreou, L.-C. Chen, K. Murphy, and A. L. Yuille. Weakly- and semi-supervised learning of a DCNN for semantic image segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 1742–1750, 2015. 1, 2, 3, 7
[29] D. Pathak, P. Krähenbühl, and T. Darrell. Constrained convolutional neural networks for weakly supervised segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 1742–1750, 2015. 1, 3, 7
[30] P. O. Pinheiro and R. Collobert. From image-level to pixel-level labeling with convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1713–1721, 2015. 1, 3, 7
[31] A. Prest, C. Leistner, J. Civera, C. Schmid, and V. Ferrari. Learning object class detectors from weakly annotated video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3282–3289, 2012. 7, 8
[32] G.-J. Qi. Hierarchically gated deep networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016. 1
[33] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR), 2015. 8
[34] P. Tang, X. Wang, X. Bai, and W. Liu. Multiple instance detection network with online instance classifier refinement. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3059–3067, July 2017. 3
[35] P. Tokmakov, K. Alahari, and C. Schmid. Weakly-supervised semantic segmentation using motion cues. In Proceedings of the European Conference on Computer Vision (ECCV), pages 388–404, 2016. 1, 3, 7
[36] P. Vernaza and M. Chandraker. Learning random-walk label propagation for weakly-supervised semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017. 1, 2, 3, 7
[37] Y. Wei, J. Feng, X. Liang, M.-M. Cheng, Y. Zhao, and S. Yan. Object region mining with adversarial erasing: A simple classification to semantic segmentation approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 1, 3, 7, 8
[38] Z. Wu, C. Shen, and A. van den Hengel. Wider or deeper: Revisiting the ResNet model for visual recognition. arXiv preprint arXiv:1611.10080, 2016. 6, 7, 8, 11
[39] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. Torr. Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 1529–1537, 2015. 1
[40] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2921–2929, 2016. 1, 2, 3