# 3D Convolution: Spherical CNNs for Panoramic Images (Code)

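Convolutional neural networks (CNNs) handle two-dimensional planar images very well. However, the demand for processing spherical images keeps growing: omnidirectional vision for drones, robots, and autonomous cars, molecular regression problems, and global weather and climate modelling.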

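Feeding a planar projection of a spherical signal into a CNN is too naive an approach and is bound to fail. Much of the success of CNNs comes from weight sharing across local receptive fields, while the multi-layer structure can find the same object in different regions (rects) of the image and respond to it. In a spherical image, however, an object is deformed differently depending on where it appears in the picture, so for a CNN to share weights directly, the local receptive field would have to describe this transformation. As the figure below shows, the space-varying distortion introduced by a planar projection prevents the CNN from sharing weights.

The paper summarizes its contribution as follows: We propose a definition for the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows us to compute it efficiently using a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical CNNs applied to 3D model recognition and atomization energy regression.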
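To make the distortion concrete, here is a minimal sketch that assumes an equirectangular (latitude/longitude) projection, which the text above does not specify. The helper name `equirect_pixel_solid_angles` is hypothetical. It computes how much of the sphere one pixel covers in each image row.

```python
import numpy as np

def equirect_pixel_solid_angles(n_lat: int, n_lon: int) -> np.ndarray:
    """Solid angle (steradians) covered by one pixel in each row of an
    equirectangular image with n_lat rows and n_lon columns (assumed layout)."""
    # Row edges in colatitude theta, from 0 (north pole) to pi (south pole).
    theta_edges = np.linspace(0.0, np.pi, n_lat + 1)
    # Exact area of a latitude band on the unit sphere:
    # 2*pi*(cos(theta_top) - cos(theta_bottom)).
    band_area = 2.0 * np.pi * (np.cos(theta_edges[:-1]) - np.cos(theta_edges[1:]))
    # Each band is split evenly among the n_lon pixels of that row.
    return band_area / n_lon

n_lat, n_lon = 64, 128
areas = equirect_pixel_solid_angles(n_lat, n_lon)
print("pixel area in the row next to the pole   :", areas[0])
print("pixel area in the row next to the equator:", areas[n_lat // 2])
print("equator / pole ratio                     :", areas[n_lat // 2] / areas[0])
```

Every pixel has the same size in the flat image, yet a pixel near the equator covers far more of the sphere than one near a pole. A fixed planar filter therefore sees the same object at different scales and shapes depending on latitude.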

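How can a three-dimensional image be reconstructed from two-dimensional images while handling the deformation that arises at different positions? The classical FFT method and the Lie group model serve as the bridge.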

For an explanation of SO(3) as a rigid-body transformation, see: 半闲居士视觉SLAM十四讲笔记(3)三维空间刚体运动 - par..._CSDN博客.

Wow, this outline is even more concise and clear: 高翔《视觉SLAM十四讲》从理论到实践 (Gao Xiang, "14 Lectures on Visual SLAM: From Theory to Practice").

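The key is to distinguish the subtle difference between the sphere and the plane: the rotations that move a spherical image form a three-dimensional manifold, the Lie group SO(3), which is discretized, and the SO(3) relationships are represented in the higher layers of the CNN.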

As shown in Figure 1, there is no good way to use translational convolution or cross-correlation to analyze spherical signals. The most obvious approach, then, is to change the definition of cross-correlation by replacing filter translations by rotations. Doing so, we run into a subtle but important difference between the plane and the sphere: whereas the space of moves for the plane (2D translations) is itself isomorphic to the plane, the space of moves for the sphere (3D rotations) is a different, three-dimensional manifold called SO(3). It follows that the result of a spherical correlation (the output feature map) is to be considered a signal on SO(3), not a signal on the sphere, S². For this reason, we deploy SO(3) group correlation in the higher layers of a spherical CNN (Cohen and Welling, 2016).
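The gap between the sphere and SO(3) can be checked numerically. The sketch below assumes a ZYZ Euler-angle parameterization of rotations; the helper names `rot_z`, `rot_y`, and `rotation_zyz` are introduced here for illustration only. It shows that two different rotations can send the north pole to the same point of the sphere, so a point on S² (two angles) does not determine a rotation (three angles).

```python
import numpy as np

def rot_z(a: float) -> np.ndarray:
    """Rotation by angle a about the z-axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(b: float) -> np.ndarray:
    """Rotation by angle b about the y-axis."""
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rotation_zyz(alpha: float, beta: float, gamma: float) -> np.ndarray:
    """Element of SO(3) built from ZYZ Euler angles (assumed convention)."""
    return rot_z(alpha) @ rot_y(beta) @ rot_z(gamma)

north = np.array([0.0, 0.0, 1.0])
R1 = rotation_zyz(0.3, 1.1, 0.0)
R2 = rotation_zyz(0.3, 1.1, 2.0)   # same alpha and beta, different gamma

# Both matrices are proper rotations: orthogonal with determinant +1.
print(np.allclose(R1.T @ R1, np.eye(3)), np.isclose(np.linalg.det(R1), 1.0))

# The first z-rotation leaves the north pole fixed, so both rotations
# move the pole to the same point on the sphere ...
same_point = np.allclose(R1 @ north, R2 @ north)
# ... yet they are different rotations, because gamma differs.
same_rotation = np.allclose(R1, R2)
print(same_point, same_rotation)
```

A feature map indexed by the rotation that was applied to the filter therefore needs all three angles as coordinates, which is why the output of a spherical correlation is treated as a signal on SO(3).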

The implementation of a spherical CNN (S2-CNN) involves two major challenges. Whereas a square grid of pixels has discrete translation symmetries, no perfectly symmetrical grids for the sphere exist. This means that there is no simple way to define the rotation of a spherical filter by one pixel. Instead, in order to rotate a filter we would need to perform some kind of interpolation. The other challenge is computational efficiency; SO(3) is a three-dimensional manifold, so a naive implementation of SO(3) correlation is O(n^6).
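The first challenge can be illustrated with a small experiment. The sketch below assumes an n x n equiangular grid with cell-centred colatitudes, which is not necessarily the sampling used by the paper, and the helper names are hypothetical. It rotates every grid point and measures how far the result lands from the nearest grid node, in units of one grid step.

```python
import numpy as np

def rot_z(a: float) -> np.ndarray:
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(b: float) -> np.ndarray:
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def off_grid_error(R: np.ndarray, n: int) -> float:
    """Largest distance, in grid steps, between a rotated grid point and
    its nearest node on an n x n equiangular grid (assumed layout)."""
    d_theta, d_phi = np.pi / n, 2.0 * np.pi / n
    theta = (np.arange(n) + 0.5) * d_theta   # colatitudes, poles excluded
    phi = np.arange(n) * d_phi               # longitudes
    T, P = np.meshgrid(theta, phi, indexing="ij")
    # Grid points as unit vectors, then rotated by R.
    xyz = np.stack([np.sin(T) * np.cos(P), np.sin(T) * np.sin(P), np.cos(T)], axis=-1)
    xyz = xyz @ R.T
    # Back to angles, then to fractional grid indices.
    t_rot = np.arccos(np.clip(xyz[..., 2], -1.0, 1.0))
    p_rot = np.mod(np.arctan2(xyz[..., 1], xyz[..., 0]), 2.0 * np.pi)
    ti = t_rot / d_theta - 0.5
    pj = p_rot / d_phi
    err_t = np.abs(ti - np.round(ti))
    err_p = np.abs(pj - np.round(pj))   # longitude is periodic, so index n equals index 0
    return float(max(err_t.max(), err_p.max()))

n = 16
err_z = off_grid_error(rot_z(2.0 * np.pi / n), n)  # one step in longitude
err_y = off_grid_error(rot_y(np.pi / n), n)        # one step in colatitude
print("rotation about z, off-grid error:", err_z)
print("rotation about y, off-grid error:", err_y)
```

On this grid, a rotation about the z-axis by one longitude step maps nodes onto nodes, but a rotation about the y-axis by one colatitude step does not, so the rotated filter values must be interpolated. As for the second challenge, one way to read the O(n^6) figure is that an output on SO(3) has on the order of n^3 samples, and each of them requires a sum over on the order of n^3 samples of the input.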