Some Pitfalls When Setting Up FaceNet


Introduction

FaceNet is a face recognition method proposed by Google. Its accuracy on the LFW dataset has reached 99.65%.

FaceNet paper

FaceNet implementation

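Main Text

Last weekend, Milo spent two full days stuck on getting FaceNet to run: development environment problems, errors in the source code, runtime exceptions, and so on. Hopefully these rough notes will help other "artificial stupidity" enthusiasts and beginners like Milo.

Development environment (GPU: GeForce RTX 2060, 6 GB)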

	Option 1	Option 2	Option 3
tensorflow-gpu	1.13.1	1.7.0	1.7.0
cudatoolkit	10.0.130	9.0	8.0
cudnn	7.3.1	7.1.2	7.1.3
python	3.7.3	3.6.8	3.6.8
Result	compare.py: succeeded; validate_on_lfw.py: failed	compare.py: failed; validate_on_lfw.py: succeeded	GPU could not be used; both scripts succeeded on the CPU

Code that needs to be changed:

Wherever the GPU is used, add:

# When creating the session
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=args.gpu_memory_fraction)
sess = tf.Session(
    config=tf.ConfigProto(gpu_options=gpu_options, log_device_placement=False))

# In parse_arguments(argv)
parser.add_argument('--gpu_memory_fraction', type=float,
                    help='Upper bound on the amount of GPU memory that will be used by the process.', default=0.5)
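The two snippets above fit together roughly as follows. This is a minimal sketch, assuming TensorFlow 1.x (where tf.GPUOptions, tf.ConfigProto and tf.Session exist). make_session is a hypothetical helper name, not part of FaceNet, and TensorFlow is imported lazily so that the argument parsing can be checked on a machine without a GPU.

```python
import argparse


def parse_arguments(argv):
    """Parse the command line; only the GPU memory option is shown here."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--gpu_memory_fraction', type=float, default=0.5,
        help='Upper bound on the amount of GPU memory that will be used by the process.')
    return parser.parse_args(argv)


def make_session(gpu_memory_fraction):
    """Hypothetical helper: create a TF 1.x session capped at a GPU memory fraction."""
    import tensorflow as tf  # assumption: TensorFlow 1.x is installed
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=gpu_memory_fraction)
    config = tf.ConfigProto(gpu_options=gpu_options, log_device_placement=False)
    return tf.Session(config=config)


# Typical use inside a script:
#   args = parse_arguments(sys.argv[1:])
#   sess = make_session(args.gpu_memory_fraction)
```

Passing a smaller value, for example --gpu_memory_fraction 0.3, leaves more memory for other processes on a 6 GB card.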

In facenet.py:

# In create_input_pipeline(input_queue, image_size, nrof_preprocess_threads, batch_size_placeholder), indent the entire function body under this scope.
with tf.name_scope('tempscope'):

In detect_face.py:

# Around line 87, add allow_pickle=True.
data_dict = np.load(data_path, allow_pickle=True, encoding='latin1').item() # pylint: disable=no-member

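Pitfall 1:

Attempting to use uninitialized value InceptionResnetV1/Repeat_1/block17_4/Branch_1/Conv2d_0b_1x7/weights

Solution: set the model argument to the specific model .pb file. If you pass a directory instead, the program uses the ckpt and meta files, and all sorts of "Attempting to use uninitialized value" errors follow.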
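To make the file-versus-directory distinction concrete, here is a small sketch of a pre-flight check. describe_model_argument is a hypothetical helper written for this post, not a FaceNet function. It only inspects the path and reports which loading route the argument would lead to, based on the behaviour described above.

```python
import os


def describe_model_argument(model):
    """Hypothetical helper: report how a model argument would be interpreted."""
    model_path = os.path.expanduser(model)
    if os.path.isfile(model_path) and model_path.endswith('.pb'):
        return 'frozen graph (.pb): weights are embedded in the file'
    if os.path.isdir(model_path):
        return 'directory: the program falls back to the .meta and ckpt files'
    return 'unknown: path is neither a .pb file nor a directory'
```

Calling it on the value you are about to pass as the model argument shows up front whether you are on the .pb route or the ckpt/meta route.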

Pitfall 2:

Loaded runtime CuDNN library: 7600 (compatibility version 7600) but source was compiled with 7102 (compatibility version 7100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.

Solution: downgrade cuDNN to 7.1.2. Milo tried various versions in an Anaconda environment. Versions 7.1.3 and 7.0.5 do not raise this error, but 7.1.3 automatically installs CUDA 8.0, and 7.0.5 triggers other exceptions, so on Milo's machine 7.1.2 was the only workable choice.
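The numbers in the error message are packed version codes. As a small sketch, assuming the cuDNN 7.x encoding of major*1000 + minor*100 + patch (the CUDNN_VERSION macro in cudnn.h), the hypothetical helper below decodes them: 7600 is the cuDNN 7.6.0 loaded at runtime, and 7102 is the 7.1.2 that the TensorFlow binary was built against.

```python
def decode_cudnn_version(code):
    """Split a cuDNN 7.x version code such as 7102 into (major, minor, patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


# The two codes from the error message above.
runtime = decode_cudnn_version(7600)
compiled = decode_cudnn_version(7102)
print('runtime cuDNN: %d.%d.%d' % runtime)
print('compiled against: %d.%d.%d' % compiled)
```

Decoding the message this way tells you which cuDNN version to install in the environment, here 7.1.2.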

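Pitfall 3:

Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA

Solution: download the official TensorFlow source, then recompile and install it. (Milo did not act on this one. It is only a warning, so Milo just looked up the fix and moved on to the bigger pitfalls.)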
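If recompiling is not worth the effort, a commonly used alternative is simply to hide the message. This is not what Milo did, and it does not enable the extra CPU instructions. A minimal sketch, assuming the message is emitted at INFO level (as it appears to be in TensorFlow 1.x builds): set the TF_CPP_MIN_LOG_LEVEL environment variable before TensorFlow is imported.

```python
import os

# Must be set before `import tensorflow`, because the native logging code
# reads this variable. '1' filters INFO messages; '2' would additionally
# filter WARNING messages, which can hide useful diagnostics.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'

# import tensorflow as tf  # import only after the variable is set
```

The same variable can also be set in the shell before launching the script, which avoids touching the source code.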

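Pitfall 4:

What freeze_graph.py is for

Summary: after being stuck on >>Pitfall 1<< for a long time, Milo kept feeling that the real task was to find a way to load the parameters. This is quite different from ordinary Python development, where you simply define your own parameters on the command line. These are the parameters of a neural network, so how do you load them into the model? Just when there seemed to be no way forward, a way out appeared. Milo looked up what freeze_graph.py is for and found that it is exactly the script that merges a trained model's parameters into the graph and produces the .pb file. With that, >>Pitfall 1<< resolved itself.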
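For readers who only have a checkpoint directory, the sketch below builds the command that would freeze it into a .pb file. It assumes the script takes two positional arguments, the checkpoint directory and the output .pb path, and that it lives at src/freeze_graph.py in the FaceNet implementation. Both are assumptions to confirm with the script's --help, and the paths shown are hypothetical.

```python
import sys


def build_freeze_command(model_dir, output_file, script='src/freeze_graph.py'):
    """Build the argv list for freezing a checkpoint directory into a .pb file.

    Assumption: the script accepts <model_dir> <output_file> as positionals.
    """
    return [sys.executable, script, model_dir, output_file]


# Hypothetical paths; run the result with subprocess.run(cmd, check=True).
cmd = build_freeze_command('models/my_checkpoint_dir', 'models/my_model.pb')
print(' '.join(cmd))
```

The resulting .pb file is then what gets passed as the model argument, as described under Pitfall 1.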

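Pitfall 5:

could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Observation 1: this exception appears whenever the tensorflow-gpu version is above 1.8.

Observation 2: when you see an error of the form CUDNN_XXX_XXX, setting the session's gpu_options in code solves some of these problems.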

Pitfall 6:

InternalError (see above for traceback): Blas SGEMM launch failed

Summary: either the GPU is too new, or it is a cuDNN version problem.

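This blog post is original content and the copyright belongs to the author. It may not be reposted without the author's written permission. If you repost it, cite the original link and the author. Violations will be pursued legally.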
