
2018/08/22

How to build NNPACK on a Raspberry Pi

(A trial build of NNPACK on a Raspberry Pi; this only covers the build itself)

Tags: [Raspberry Pi], [Electronics projects], [Deep learning]





● How to build NNPACK on a Raspberry Pi

Maratyszcza/NNPACK
 Acceleration package for neural networks on multi-core CPUs
 NNPACK is an acceleration package for neural network computations. NNPACK aims to provide high-performance implementations of convnet layers for multi-core CPUs.

 We build NNPACK, then run its ninja test suite as a stress test, keeping the Raspberry Pi at full load for hours and basking in the self-satisfaction.


● Raspbian OS version used on the Raspberry Pi for this run

 RASPBIAN STRETCH WITH DESKTOP
 Version:June 2018
 Release date: 2018-06-27
 Kernel version: 4.14
pi@raspberrypi:~/pytorch $ uname -a
Linux raspberrypi 4.14.50-v7+ #1122 SMP Tue Jun 19 12:26:26 BST 2018 armv7l GNU/Linux


● Building NNPACK on the Raspberry Pi from the Git source

# As usual, refresh the package lists first with sudo apt-get update
sudo apt-get update

# Development builds

# Install ninja build system
sudo apt-get -y install ninja-build

# Install PeachPy assembler and confu configuration system
sudo pip install --upgrade git+https://github.com/Maratyszcza/PeachPy
sudo pip install --upgrade git+https://github.com/Maratyszcza/confu

# Then clone NNPACK, install dependencies, configure, and build
cd
git clone https://github.com/Maratyszcza/NNPACK.git
cd NNPACK
confu setup
python configure.py

# Build with 3 parallel jobs to shorten the build time
ninja -j3

# ninja smoketest (finishes in about 30 seconds)
ninja smoketest
# [  PASSED  ] 4 tests.

# ninja test (takes an extremely long time: about 10 hours)
ninja test
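
The -j3 above uses three of the four cores on a Raspberry Pi 3B. The post does not say why three was chosen; one plausible reading is that it leaves a core free for the rest of the system. A minimal sketch under that assumption, deriving the job count from the core count instead of hard-coding it (pick_jobs is a hypothetical helper name, and nproc is assumed to be available from coreutils):

```shell
#!/bin/sh
# Hypothetical helper: given a core count, return a ninja job count
# that leaves one core free (assumption: this is why -j3 was used).
pick_jobs() {
    cores="$1"
    if [ "$cores" -gt 1 ]; then
        echo $((cores - 1))
    else
        echo 1
    fi
}

# Fall back to 1 if nproc is not installed.
CORES="$(nproc 2>/dev/null || echo 1)"
JOBS="$(pick_jobs "$CORES")"

# Print the command rather than running it, so the sketch is harmless.
echo "ninja -j${JOBS}"
```

On a 4-core board this prints "ninja -j3", matching the command used above.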


● NNPACK ninja smoketest

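 ninja smoketest completes in 30 seconds on a Raspberry Pi 3B, and in 24 seconds on a 3B+.
 The test results below are from a Raspberry Pi 3B. The power supply is a Chinese-made unit rated for 2.4 A output.
 The USB power cable is a 20 cm Syncwire UNBREAKcable, an ultra-durable 2.4 A USB cable.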
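
Because the smoketest log that follows is long, it can help to save it to a file and total the gtest summary lines. A minimal sketch, assuming the output was captured first (the helper name count_passed and the file name smoketest.log are hypothetical, not part of NNPACK):

```shell
#!/bin/sh
# Hypothetical helper: read a gtest log on stdin and sum the counts
# from summary lines of the form "[  PASSED  ] 16 tests."
count_passed() {
    awk 'index($0, "[  PASSED  ]") == 1 { total += $4 }
         END { print total + 0 }'
}

# Demonstration with two summary lines copied from the log below.
printf '%s\n' \
    '[  PASSED  ] 16 tests.' \
    '[ RUN      ] FT8x8.single_tile' \
    '[  PASSED  ] 52 tests.' | count_passed
# prints 68
```

With a real run, the log could be captured with `ninja smoketest 2>&1 | tee smoketest.log` and then summarized with `count_passed < smoketest.log`.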
pi@raspberrypi:~/NNPACK $ time ninja smoketest
[0/12] RUN fourier-test
[==========] Running 16 tests from 16 test cases.
[----------] Global test environment set-up.
[----------] 1 test from FFT8_WITHIN_ROWS
[ RUN      ] FFT8_WITHIN_ROWS.match_reference
[       OK ] FFT8_WITHIN_ROWS.match_reference (5 ms)
[----------] 1 test from FFT8_WITHIN_ROWS (5 ms total)

[----------] 1 test from FFT16_WITHIN_ROWS
[ RUN      ] FFT16_WITHIN_ROWS.match_reference
[       OK ] FFT16_WITHIN_ROWS.match_reference (10 ms)
[----------] 1 test from FFT16_WITHIN_ROWS (11 ms total)

[----------] 1 test from IFFT8_WITHIN_ROWS
[ RUN      ] IFFT8_WITHIN_ROWS.match_reference
[       OK ] IFFT8_WITHIN_ROWS.match_reference (4 ms)
[----------] 1 test from IFFT8_WITHIN_ROWS (4 ms total)

[----------] 1 test from IFFT16_WITHIN_ROWS
[ RUN      ] IFFT16_WITHIN_ROWS.match_reference
[       OK ] IFFT16_WITHIN_ROWS.match_reference (10 ms)
[----------] 1 test from IFFT16_WITHIN_ROWS (10 ms total)

[----------] 1 test from FFT8_DUAL_REAL_WITHIN_ROWS
[ RUN      ] FFT8_DUAL_REAL_WITHIN_ROWS.match_reference
[       OK ] FFT8_DUAL_REAL_WITHIN_ROWS.match_reference (5 ms)
[----------] 1 test from FFT8_DUAL_REAL_WITHIN_ROWS (5 ms total)

[----------] 1 test from FFT16_DUAL_REAL_WITHIN_ROWS
[ RUN      ] FFT16_DUAL_REAL_WITHIN_ROWS.match_reference
[       OK ] FFT16_DUAL_REAL_WITHIN_ROWS.match_reference (10 ms)
[----------] 1 test from FFT16_DUAL_REAL_WITHIN_ROWS (10 ms total)

[----------] 1 test from IFFT8_DUAL_REAL_WITHIN_ROWS
[ RUN      ] IFFT8_DUAL_REAL_WITHIN_ROWS.match_reference
[       OK ] IFFT8_DUAL_REAL_WITHIN_ROWS.match_reference (5 ms)
[----------] 1 test from IFFT8_DUAL_REAL_WITHIN_ROWS (5 ms total)

[----------] 1 test from IFFT16_DUAL_REAL_WITHIN_ROWS
[ RUN      ] IFFT16_DUAL_REAL_WITHIN_ROWS.match_reference
[       OK ] IFFT16_DUAL_REAL_WITHIN_ROWS.match_reference (10 ms)
[----------] 1 test from IFFT16_DUAL_REAL_WITHIN_ROWS (10 ms total)

[----------] 1 test from FFT4_ACROSS_ROWS
[ RUN      ] FFT4_ACROSS_ROWS.match_reference
[       OK ] FFT4_ACROSS_ROWS.match_reference (8 ms)
[----------] 1 test from FFT4_ACROSS_ROWS (8 ms total)

[----------] 1 test from FFT8_ACROSS_ROWS
[ RUN      ] FFT8_ACROSS_ROWS.match_reference
[       OK ] FFT8_ACROSS_ROWS.match_reference (16 ms)
[----------] 1 test from FFT8_ACROSS_ROWS (17 ms total)

[----------] 1 test from IFFT4_ACROSS_ROWS
[ RUN      ] IFFT4_ACROSS_ROWS.match_reference
[       OK ] IFFT4_ACROSS_ROWS.match_reference (4 ms)
[----------] 1 test from IFFT4_ACROSS_ROWS (4 ms total)

[----------] 1 test from IFFT8_ACROSS_ROWS
[ RUN      ] IFFT8_ACROSS_ROWS.match_reference
[       OK ] IFFT8_ACROSS_ROWS.match_reference (10 ms)
[----------] 1 test from IFFT8_ACROSS_ROWS (10 ms total)

[----------] 1 test from FFT8_REAL_ACROSS_ROWS
[ RUN      ] FFT8_REAL_ACROSS_ROWS.match_reference
[       OK ] FFT8_REAL_ACROSS_ROWS.match_reference (5 ms)
[----------] 1 test from FFT8_REAL_ACROSS_ROWS (5 ms total)

[----------] 1 test from FFT16_REAL_ACROSS_ROWS
[ RUN      ] FFT16_REAL_ACROSS_ROWS.match_reference
[       OK ] FFT16_REAL_ACROSS_ROWS.match_reference (10 ms)
[----------] 1 test from FFT16_REAL_ACROSS_ROWS (10 ms total)

[----------] 1 test from IFFT8_REAL_ACROSS_ROWS
[ RUN      ] IFFT8_REAL_ACROSS_ROWS.match_reference
[       OK ] IFFT8_REAL_ACROSS_ROWS.match_reference (5 ms)
[----------] 1 test from IFFT8_REAL_ACROSS_ROWS (5 ms total)

[----------] 1 test from IFFT16_REAL_ACROSS_ROWS
[ RUN      ] IFFT16_REAL_ACROSS_ROWS.match_reference
[       OK ] IFFT16_REAL_ACROSS_ROWS.match_reference (10 ms)
[----------] 1 test from IFFT16_REAL_ACROSS_ROWS (10 ms total)

[----------] Global test environment tear-down
[==========] 16 tests from 16 test cases ran. (132 ms total)
[  PASSED  ] 16 tests.
[1/12] RUN convolution-output-smoketest
[==========] Running 52 tests from 3 test cases.
[----------] Global test environment set-up.
[----------] 18 tests from FT8x8
[ RUN      ] FT8x8.single_tile
[       OK ] FT8x8.single_tile (10 ms)
[ RUN      ] FT8x8.single_tile_with_relu
[       OK ] FT8x8.single_tile_with_relu (2 ms)
[ RUN      ] FT8x8.input_subtile
[       OK ] FT8x8.input_subtile (1 ms)
[ RUN      ] FT8x8.input_subtile_with_relu
[       OK ] FT8x8.input_subtile_with_relu (1 ms)
[ RUN      ] FT8x8.multi_tile
[       OK ] FT8x8.multi_tile (6 ms)
[ RUN      ] FT8x8.multi_tile_with_relu
[       OK ] FT8x8.multi_tile_with_relu (6 ms)
[ RUN      ] FT8x8.implicit_padding
[       OK ] FT8x8.implicit_padding (188 ms)
[ RUN      ] FT8x8.implicit_padding_with_relu
[       OK ] FT8x8.implicit_padding_with_relu (192 ms)
[ RUN      ] FT8x8.small_batch
[       OK ] FT8x8.small_batch (23 ms)
[ RUN      ] FT8x8.small_batch_with_relu
[       OK ] FT8x8.small_batch_with_relu (24 ms)
[ RUN      ] FT8x8.few_input_channels
[       OK ] FT8x8.few_input_channels (21 ms)
[ RUN      ] FT8x8.few_input_channels_with_relu
[       OK ] FT8x8.few_input_channels_with_relu (22 ms)
[ RUN      ] FT8x8.few_output_channels
[       OK ] FT8x8.few_output_channels (20 ms)
[ RUN      ] FT8x8.few_output_channels_with_relu
[       OK ] FT8x8.few_output_channels_with_relu (20 ms)
[ RUN      ] FT8x8.non_square_kernel
[       OK ] FT8x8.non_square_kernel (2 ms)
[ RUN      ] FT8x8.non_square_kernel_with_relu
[       OK ] FT8x8.non_square_kernel_with_relu (2 ms)
[ RUN      ] FT8x8.non_square_image
[       OK ] FT8x8.non_square_image (4 ms)
[ RUN      ] FT8x8.non_square_image_with_relu
[       OK ] FT8x8.non_square_image_with_relu (5 ms)
[----------] 18 tests from FT8x8 (550 ms total)

[----------] 18 tests from FT16x16
[ RUN      ] FT16x16.single_tile
[       OK ] FT16x16.single_tile (14 ms)
[ RUN      ] FT16x16.single_tile_with_relu
[       OK ] FT16x16.single_tile_with_relu (12 ms)
[ RUN      ] FT16x16.input_subtile
[       OK ] FT16x16.input_subtile (5 ms)
[ RUN      ] FT16x16.input_subtile_with_relu
[       OK ] FT16x16.input_subtile_with_relu (5 ms)
[ RUN      ] FT16x16.multi_tile
[       OK ] FT16x16.multi_tile (39 ms)
[ RUN      ] FT16x16.multi_tile_with_relu
[       OK ] FT16x16.multi_tile_with_relu (39 ms)
[ RUN      ] FT16x16.implicit_padding
[       OK ] FT16x16.implicit_padding (847 ms)
[ RUN      ] FT16x16.implicit_padding_with_relu
[       OK ] FT16x16.implicit_padding_with_relu (856 ms)
[ RUN      ] FT16x16.small_batch
[       OK ] FT16x16.small_batch (140 ms)
[ RUN      ] FT16x16.small_batch_with_relu
[       OK ] FT16x16.small_batch_with_relu (143 ms)
[ RUN      ] FT16x16.few_input_channels
[       OK ] FT16x16.few_input_channels (118 ms)
[ RUN      ] FT16x16.few_input_channels_with_relu
[       OK ] FT16x16.few_input_channels_with_relu (119 ms)
[ RUN      ] FT16x16.few_output_channels
[       OK ] FT16x16.few_output_channels (121 ms)
[ RUN      ] FT16x16.few_output_channels_with_relu
[       OK ] FT16x16.few_output_channels_with_relu (123 ms)
[ RUN      ] FT16x16.non_square_kernel
[       OK ] FT16x16.non_square_kernel (11 ms)
[ RUN      ] FT16x16.non_square_kernel_with_relu
[       OK ] FT16x16.non_square_kernel_with_relu (11 ms)
[ RUN      ] FT16x16.non_square_image
[       OK ] FT16x16.non_square_image (24 ms)
[ RUN      ] FT16x16.non_square_image_with_relu
[       OK ] FT16x16.non_square_image_with_relu (23 ms)
[----------] 18 tests from FT16x16 (2650 ms total)

[----------] 16 tests from WT8x8
[ RUN      ] WT8x8.single_tile
[       OK ] WT8x8.single_tile (2 ms)
[ RUN      ] WT8x8.single_tile_with_relu
[       OK ] WT8x8.single_tile_with_relu (2 ms)
[ RUN      ] WT8x8.input_subtile
[       OK ] WT8x8.input_subtile (1 ms)
[ RUN      ] WT8x8.input_subtile_with_relu
[       OK ] WT8x8.input_subtile_with_relu (2 ms)
[ RUN      ] WT8x8.multi_tile
[       OK ] WT8x8.multi_tile (5 ms)
[ RUN      ] WT8x8.multi_tile_with_relu
[       OK ] WT8x8.multi_tile_with_relu (6 ms)
[ RUN      ] WT8x8.implicit_padding
[       OK ] WT8x8.implicit_padding (47 ms)
[ RUN      ] WT8x8.implicit_padding_with_relu
[       OK ] WT8x8.implicit_padding_with_relu (48 ms)
[ RUN      ] WT8x8.small_batch
[       OK ] WT8x8.small_batch (21 ms)
[ RUN      ] WT8x8.small_batch_with_relu
[       OK ] WT8x8.small_batch_with_relu (21 ms)
[ RUN      ] WT8x8.few_input_channels
[       OK ] WT8x8.few_input_channels (20 ms)
[ RUN      ] WT8x8.few_input_channels_with_relu
[       OK ] WT8x8.few_input_channels_with_relu (19 ms)
[ RUN      ] WT8x8.few_output_channels
[       OK ] WT8x8.few_output_channels (18 ms)
[ RUN      ] WT8x8.few_output_channels_with_relu
[       OK ] WT8x8.few_output_channels_with_relu (19 ms)
[ RUN      ] WT8x8.non_square_image
[       OK ] WT8x8.non_square_image (4 ms)
[ RUN      ] WT8x8.non_square_image_with_relu
[       OK ] WT8x8.non_square_image_with_relu (4 ms)
[----------] 16 tests from WT8x8 (239 ms total)

[----------] Global test environment tear-down
[==========] 52 tests from 3 test cases ran. (3439 ms total)
[  PASSED  ] 52 tests.
[2/12] RUN winograd-test
[==========] Running 6 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 3 tests from F6k3
[ RUN      ] F6k3.input
[       OK ] F6k3.input (7 ms)
[ RUN      ] F6k3.kernel
[       OK ] F6k3.kernel (4 ms)
[ RUN      ] F6k3.output
[       OK ] F6k3.output (6 ms)
[----------] 3 tests from F6k3 (18 ms total)

[----------] 3 tests from F6x6_3x3
[ RUN      ] F6x6_3x3.input
[       OK ] F6x6_3x3.input (52 ms)
[ RUN      ] F6x6_3x3.kernel
[       OK ] F6x6_3x3.kernel (12 ms)
[ RUN      ] F6x6_3x3.output
[       OK ] F6x6_3x3.output (29 ms)
[----------] 3 tests from F6x6_3x3 (93 ms total)

[----------] Global test environment tear-down
[==========] 6 tests from 2 test cases ran. (113 ms total)
[  PASSED  ] 6 tests.
[3/12] RUN sgemm-test
Running main() from gtest_main.cc
[==========] Running 9 tests from 3 test cases.
[----------] Global test environment set-up.
[----------] 3 tests from FAST6x8_NEON
[ RUN      ] FAST6x8_NEON.kc1
[       OK ] FAST6x8_NEON.kc1 (16 ms)
[ RUN      ] FAST6x8_NEON.kc2
[       OK ] FAST6x8_NEON.kc2 (15 ms)
[ RUN      ] FAST6x8_NEON.kc10
[       OK ] FAST6x8_NEON.kc10 (35 ms)
[----------] 3 tests from FAST6x8_NEON (66 ms total)

[----------] 3 tests from FAST6x8_AARCH32_NEON
[ RUN      ] FAST6x8_AARCH32_NEON.kc1
[       OK ] FAST6x8_AARCH32_NEON.kc1 (12 ms)
[ RUN      ] FAST6x8_AARCH32_NEON.kc2
[       OK ] FAST6x8_AARCH32_NEON.kc2 (14 ms)
[ RUN      ] FAST6x8_AARCH32_NEON.kc10
[       OK ] FAST6x8_AARCH32_NEON.kc10 (35 ms)
[----------] 3 tests from FAST6x8_AARCH32_NEON (62 ms total)

[----------] 3 tests from FULL6x8_NEON
[ RUN      ] FULL6x8_NEON.kc1
[       OK ] FULL6x8_NEON.kc1 (140 ms)
[ RUN      ] FULL6x8_NEON.kc2
[       OK ] FULL6x8_NEON.kc2 (191 ms)
[ RUN      ] FULL6x8_NEON.kc10
[       OK ] FULL6x8_NEON.kc10 (589 ms)
[----------] 3 tests from FULL6x8_NEON (920 ms total)

[----------] Global test environment tear-down
[==========] 9 tests from 3 test cases ran. (1048 ms total)
[  PASSED  ] 9 tests.
[4/12] RUN sxgemm-test
Running main() from gtest_main.cc
[==========] Running 4 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 3 tests from FAST_S4GEMM_3x3
[ RUN      ] FAST_S4GEMM_3x3.neon
[       OK ] FAST_S4GEMM_3x3.neon (224 ms)
[ RUN      ] FAST_S4GEMM_3x3.aarch32_neon
[       OK ] FAST_S4GEMM_3x3.aarch32_neon (217 ms)
[ RUN      ] FAST_S4GEMM_3x3.aarch32_neon2
[       OK ] FAST_S4GEMM_3x3.aarch32_neon2 (218 ms)
[----------] 3 tests from FAST_S4GEMM_3x3 (660 ms total)

[----------] 1 test from FULL_S4GEMM_3x3
[ RUN      ] FULL_S4GEMM_3x3.neon
[       OK ] FULL_S4GEMM_3x3.neon (1132 ms)
[----------] 1 test from FULL_S4GEMM_3x3 (1132 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 2 test cases ran. (1792 ms total)
[  PASSED  ] 4 tests.
[5/12] RUN hxgemm-test
Running main() from gtest_main.cc
[==========] Running 6 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from FAST_H4GEMM_3x3
[ RUN      ] FAST_H4GEMM_3x3.neonhp
[       OK ] FAST_H4GEMM_3x3.neonhp (435 ms)
[ RUN      ] FAST_H4GEMM_3x3.aarch32_neonhp
[       OK ] FAST_H4GEMM_3x3.aarch32_neonhp (397 ms)
[ RUN      ] FAST_H4GEMM_3x3.aarch32_neon2
[       OK ] FAST_H4GEMM_3x3.aarch32_neon2 (386 ms)
[ RUN      ] FAST_H4GEMM_3x3.aarch32_neonhparith
[       OK ] FAST_H4GEMM_3x3.aarch32_neonhparith (0 ms)
[----------] 4 tests from FAST_H4GEMM_3x3 (1219 ms total)

[----------] 2 tests from FULL_H4GEMM_3x3
[ RUN      ] FULL_H4GEMM_3x3.neon
[       OK ] FULL_H4GEMM_3x3.neon (1938 ms)
[ RUN      ] FULL_H4GEMM_3x3.aarch32_neon2
[       OK ] FULL_H4GEMM_3x3.aarch32_neon2 (1923 ms)
[----------] 2 tests from FULL_H4GEMM_3x3 (3861 ms total)

[----------] Global test environment tear-down
[==========] 6 tests from 2 test cases ran. (5081 ms total)
[  PASSED  ] 6 tests.
[6/12] RUN convolution-input-gradient-smoketest
[==========] Running 26 tests from 3 test cases.
[----------] Global test environment set-up.
[----------] 9 tests from FT8x8
[ RUN      ] FT8x8.single_tile
[       OK ] FT8x8.single_tile (9 ms)
[ RUN      ] FT8x8.input_subtile
[       OK ] FT8x8.input_subtile (2 ms)
[ RUN      ] FT8x8.multi_tile
[       OK ] FT8x8.multi_tile (5 ms)
[ RUN      ] FT8x8.implicit_padding
[       OK ] FT8x8.implicit_padding (164 ms)
[ RUN      ] FT8x8.small_batch
[       OK ] FT8x8.small_batch (16 ms)
[ RUN      ] FT8x8.few_input_channels
[       OK ] FT8x8.few_input_channels (16 ms)
[ RUN      ] FT8x8.few_output_channels
[       OK ] FT8x8.few_output_channels (15 ms)
[ RUN      ] FT8x8.non_square_kernel
[       OK ] FT8x8.non_square_kernel (3 ms)
[ RUN      ] FT8x8.non_square_image
[       OK ] FT8x8.non_square_image (4 ms)
[----------] 9 tests from FT8x8 (234 ms total)

[----------] 9 tests from FT16x16
[ RUN      ] FT16x16.single_tile
[       OK ] FT16x16.single_tile (10 ms)
[ RUN      ] FT16x16.input_subtile
[       OK ] FT16x16.input_subtile (4 ms)
[ RUN      ] FT16x16.multi_tile
[       OK ] FT16x16.multi_tile (36 ms)
[ RUN      ] FT16x16.implicit_padding
[       OK ] FT16x16.implicit_padding (876 ms)
[ RUN      ] FT16x16.small_batch
[       OK ] FT16x16.small_batch (118 ms)
[ RUN      ] FT16x16.few_input_channels
[       OK ] FT16x16.few_input_channels (106 ms)
[ RUN      ] FT16x16.few_output_channels
[       OK ] FT16x16.few_output_channels (98 ms)
[ RUN      ] FT16x16.non_square_kernel
[       OK ] FT16x16.non_square_kernel (20 ms)
[ RUN      ] FT16x16.non_square_image
[       OK ] FT16x16.non_square_image (24 ms)
[----------] 9 tests from FT16x16 (1295 ms total)

[----------] 8 tests from WT8x8
[ RUN      ] WT8x8.single_tile
[       OK ] WT8x8.single_tile (3 ms)
[ RUN      ] WT8x8.input_subtile
[       OK ] WT8x8.input_subtile (1 ms)
[ RUN      ] WT8x8.multi_tile
[       OK ] WT8x8.multi_tile (9 ms)
[ RUN      ] WT8x8.implicit_padding
[       OK ] WT8x8.implicit_padding (46 ms)
[ RUN      ] WT8x8.small_batch
[       OK ] WT8x8.small_batch (33 ms)
[ RUN      ] WT8x8.few_input_channels
[       OK ] WT8x8.few_input_channels (28 ms)
[ RUN      ] WT8x8.few_output_channels
[       OK ] WT8x8.few_output_channels (29 ms)
[ RUN      ] WT8x8.non_square_image
[       OK ] WT8x8.non_square_image (4 ms)
[----------] 8 tests from WT8x8 (153 ms total)

[----------] Global test environment tear-down
[==========] 26 tests from 3 test cases ran. (1682 ms total)
[  PASSED  ] 26 tests.
[7/12] RUN convolution-kernel-gradient-smoketest
[==========] Running 18 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 9 tests from FT8x8
[ RUN      ] FT8x8.single_tile
[       OK ] FT8x8.single_tile (10 ms)
[ RUN      ] FT8x8.input_subtile
[       OK ] FT8x8.input_subtile (1 ms)
[ RUN      ] FT8x8.multi_tile
[       OK ] FT8x8.multi_tile (5 ms)
[ RUN      ] FT8x8.implicit_padding
[       OK ] FT8x8.implicit_padding (177 ms)
[ RUN      ] FT8x8.small_batch
[       OK ] FT8x8.small_batch (21 ms)
[ RUN      ] FT8x8.few_input_channels
[       OK ] FT8x8.few_input_channels (19 ms)
[ RUN      ] FT8x8.few_output_channels
[       OK ] FT8x8.few_output_channels (17 ms)
[ RUN      ] FT8x8.non_square_kernel
[       OK ] FT8x8.non_square_kernel (2 ms)
[ RUN      ] FT8x8.non_square_image
[       OK ] FT8x8.non_square_image (3 ms)
[----------] 9 tests from FT8x8 (256 ms total)

[----------] 9 tests from FT16x16
[ RUN      ] FT16x16.single_tile
[       OK ] FT16x16.single_tile (10 ms)
[ RUN      ] FT16x16.input_subtile
[       OK ] FT16x16.input_subtile (4 ms)
[ RUN      ] FT16x16.multi_tile
[       OK ] FT16x16.multi_tile (33 ms)
[ RUN      ] FT16x16.implicit_padding
[       OK ] FT16x16.implicit_padding (767 ms)
[ RUN      ] FT16x16.small_batch
[       OK ] FT16x16.small_batch (118 ms)
[ RUN      ] FT16x16.few_input_channels
[       OK ] FT16x16.few_input_channels (105 ms)
[ RUN      ] FT16x16.few_output_channels
[       OK ] FT16x16.few_output_channels (101 ms)
[ RUN      ] FT16x16.non_square_kernel
[       OK ] FT16x16.non_square_kernel (9 ms)
[ RUN      ] FT16x16.non_square_image
[       OK ] FT16x16.non_square_image (21 ms)
[----------] 9 tests from FT16x16 (1169 ms total)

[----------] Global test environment tear-down
[==========] 18 tests from 2 test cases ran. (1425 ms total)
[  PASSED  ] 18 tests.

  YOU HAVE 8 DISABLED TESTS

[8/12] RUN convolution-inference-smoketest
[==========] Running 160 tests from 9 test cases.
[----------] Global test environment set-up.
[----------] 16 tests from FT8x8
[ RUN      ] FT8x8.single_tile
[       OK ] FT8x8.single_tile (19 ms)
[ RUN      ] FT8x8.single_tile_with_relu
[       OK ] FT8x8.single_tile_with_relu (2 ms)
[ RUN      ] FT8x8.input_subtile
[       OK ] FT8x8.input_subtile (1 ms)
[ RUN      ] FT8x8.input_subtile_with_relu
[       OK ] FT8x8.input_subtile_with_relu (1 ms)
[ RUN      ] FT8x8.multi_tile
[       OK ] FT8x8.multi_tile (6 ms)
[ RUN      ] FT8x8.multi_tile_with_relu
[       OK ] FT8x8.multi_tile_with_relu (6 ms)
[ RUN      ] FT8x8.implicit_padding
[       OK ] FT8x8.implicit_padding (171 ms)
[ RUN      ] FT8x8.implicit_padding_with_relu
[       OK ] FT8x8.implicit_padding_with_relu (174 ms)
[ RUN      ] FT8x8.few_input_channels
[       OK ] FT8x8.few_input_channels (20 ms)
[ RUN      ] FT8x8.few_input_channels_with_relu
[       OK ] FT8x8.few_input_channels_with_relu (21 ms)
[ RUN      ] FT8x8.few_output_channels
[       OK ] FT8x8.few_output_channels (20 ms)
[ RUN      ] FT8x8.few_output_channels_with_relu
[       OK ] FT8x8.few_output_channels_with_relu (20 ms)
[ RUN      ] FT8x8.non_square_kernel
[       OK ] FT8x8.non_square_kernel (2 ms)
[ RUN      ] FT8x8.non_square_kernel_with_relu
[       OK ] FT8x8.non_square_kernel_with_relu (2 ms)
[ RUN      ] FT8x8.non_square_image
[       OK ] FT8x8.non_square_image (4 ms)
[ RUN      ] FT8x8.non_square_image_with_relu
[       OK ] FT8x8.non_square_image_with_relu (4 ms)
[----------] 16 tests from FT8x8 (478 ms total)

[----------] 16 tests from FT16x16
[ RUN      ] FT16x16.single_tile
[       OK ] FT16x16.single_tile (12 ms)
[ RUN      ] FT16x16.single_tile_with_relu
[       OK ] FT16x16.single_tile_with_relu (12 ms)
[ RUN      ] FT16x16.input_subtile
[       OK ] FT16x16.input_subtile (5 ms)
[ RUN      ] FT16x16.input_subtile_with_relu
[       OK ] FT16x16.input_subtile_with_relu (4 ms)
[ RUN      ] FT16x16.multi_tile
[       OK ] FT16x16.multi_tile (36 ms)
[ RUN      ] FT16x16.multi_tile_with_relu
[       OK ] FT16x16.multi_tile_with_relu (37 ms)
[ RUN      ] FT16x16.implicit_padding
[       OK ] FT16x16.implicit_padding (781 ms)
[ RUN      ] FT16x16.implicit_padding_with_relu
[       OK ] FT16x16.implicit_padding_with_relu (791 ms)
[ RUN      ] FT16x16.few_input_channels
[       OK ] FT16x16.few_input_channels (114 ms)
[ RUN      ] FT16x16.few_input_channels_with_relu
[       OK ] FT16x16.few_input_channels_with_relu (115 ms)
[ RUN      ] FT16x16.few_output_channels
[       OK ] FT16x16.few_output_channels (117 ms)
[ RUN      ] FT16x16.few_output_channels_with_relu
[       OK ] FT16x16.few_output_channels_with_relu (121 ms)
[ RUN      ] FT16x16.non_square_kernel
[       OK ] FT16x16.non_square_kernel (11 ms)
[ RUN      ] FT16x16.non_square_kernel_with_relu
[       OK ] FT16x16.non_square_kernel_with_relu (11 ms)
[ RUN      ] FT16x16.non_square_image
[       OK ] FT16x16.non_square_image (21 ms)
[ RUN      ] FT16x16.non_square_image_with_relu
[       OK ] FT16x16.non_square_image_with_relu (21 ms)
[----------] 16 tests from FT16x16 (2218 ms total)

[----------] 28 tests from WT8x8
[ RUN      ] WT8x8.single_tile
[       OK ] WT8x8.single_tile (2 ms)
[ RUN      ] WT8x8.single_tile_with_relu
[       OK ] WT8x8.single_tile_with_relu (2 ms)
[ RUN      ] WT8x8.single_tile_with_subsample2x2
[       OK ] WT8x8.single_tile_with_subsample2x2 (2 ms)
[ RUN      ] WT8x8.single_tile_with_subsample2x2_relu
[       OK ] WT8x8.single_tile_with_subsample2x2_relu (1 ms)
[ RUN      ] WT8x8.input_subtile
[       OK ] WT8x8.input_subtile (1 ms)
[ RUN      ] WT8x8.input_subtile_with_relu
[       OK ] WT8x8.input_subtile_with_relu (1 ms)
[ RUN      ] WT8x8.input_subtile_with_subsample2x2
[       OK ] WT8x8.input_subtile_with_subsample2x2 (1 ms)
[ RUN      ] WT8x8.input_subtile_with_subsample2x2_relu
[       OK ] WT8x8.input_subtile_with_subsample2x2_relu (1 ms)
[ RUN      ] WT8x8.multi_tile
[       OK ] WT8x8.multi_tile (5 ms)
[ RUN      ] WT8x8.multi_tile_with_relu
[       OK ] WT8x8.multi_tile_with_relu (5 ms)
[ RUN      ] WT8x8.multi_tile_with_subsample2x2
[       OK ] WT8x8.multi_tile_with_subsample2x2 (4 ms)
[ RUN      ] WT8x8.multi_tile_with_subsample2x2_relu
[       OK ] WT8x8.multi_tile_with_subsample2x2_relu (3 ms)
[ RUN      ] WT8x8.implicit_padding
[       OK ] WT8x8.implicit_padding (37 ms)
[ RUN      ] WT8x8.implicit_padding_with_relu
[       OK ] WT8x8.implicit_padding_with_relu (38 ms)
[ RUN      ] WT8x8.implicit_padding_with_subsample2x2
[       OK ] WT8x8.implicit_padding_with_subsample2x2 (26 ms)
[ RUN      ] WT8x8.implicit_padding_with_subsample2x2_relu
[       OK ] WT8x8.implicit_padding_with_subsample2x2_relu (27 ms)
[ RUN      ] WT8x8.few_input_channels
[       OK ] WT8x8.few_input_channels (18 ms)
[ RUN      ] WT8x8.few_input_channels_with_relu
[       OK ] WT8x8.few_input_channels_with_relu (18 ms)
[ RUN      ] WT8x8.few_input_channels_with_subsample2x2
[       OK ] WT8x8.few_input_channels_with_subsample2x2 (12 ms)
[ RUN      ] WT8x8.few_input_channels_with_subsample2x2_relu
[       OK ] WT8x8.few_input_channels_with_subsample2x2_relu (12 ms)
[ RUN      ] WT8x8.few_output_channels
[       OK ] WT8x8.few_output_channels (18 ms)
[ RUN      ] WT8x8.few_output_channels_with_relu
[       OK ] WT8x8.few_output_channels_with_relu (18 ms)
[ RUN      ] WT8x8.few_output_channels_with_subsample2x2
[       OK ] WT8x8.few_output_channels_with_subsample2x2 (10 ms)
[ RUN      ] WT8x8.few_output_channels_with_subsample2x2_relu
[       OK ] WT8x8.few_output_channels_with_subsample2x2_relu (10 ms)
[ RUN      ] WT8x8.non_square_image
[       OK ] WT8x8.non_square_image (3 ms)
[ RUN      ] WT8x8.non_square_image_with_relu
[       OK ] WT8x8.non_square_image_with_relu (3 ms)
[ RUN      ] WT8x8.non_square_image_with_subsample2x2
[       OK ] WT8x8.non_square_image_with_subsample2x2 (2 ms)
[ RUN      ] WT8x8.non_square_image_with_subsample2x2_relu
[       OK ] WT8x8.non_square_image_with_subsample2x2_relu (2 ms)
[----------] 28 tests from WT8x8 (290 ms total)

[----------] 14 tests from WT8x8_FP16
[ RUN      ] WT8x8_FP16.single_tile
[       OK ] WT8x8_FP16.single_tile (2 ms)
[ RUN      ] WT8x8_FP16.single_tile_with_relu
[       OK ] WT8x8_FP16.single_tile_with_relu (2 ms)
[ RUN      ] WT8x8_FP16.input_subtile
[       OK ] WT8x8_FP16.input_subtile (1 ms)
[ RUN      ] WT8x8_FP16.input_subtile_with_relu
[       OK ] WT8x8_FP16.input_subtile_with_relu (1 ms)
[ RUN      ] WT8x8_FP16.multi_tile
[       OK ] WT8x8_FP16.multi_tile (5 ms)
[ RUN      ] WT8x8_FP16.multi_tile_with_relu
[       OK ] WT8x8_FP16.multi_tile_with_relu (5 ms)
[ RUN      ] WT8x8_FP16.implicit_padding
[       OK ] WT8x8_FP16.implicit_padding (35 ms)
[ RUN      ] WT8x8_FP16.implicit_padding_with_relu
[       OK ] WT8x8_FP16.implicit_padding_with_relu (36 ms)
[ RUN      ] WT8x8_FP16.few_input_channels
[       OK ] WT8x8_FP16.few_input_channels (18 ms)
[ RUN      ] WT8x8_FP16.few_input_channels_with_relu
[       OK ] WT8x8_FP16.few_input_channels_with_relu (18 ms)
[ RUN      ] WT8x8_FP16.few_output_channels
[       OK ] WT8x8_FP16.few_output_channels (17 ms)
[ RUN      ] WT8x8_FP16.few_output_channels_with_relu
[       OK ] WT8x8_FP16.few_output_channels_with_relu (17 ms)
[ RUN      ] WT8x8_FP16.non_square_image
[       OK ] WT8x8_FP16.non_square_image (3 ms)
[ RUN      ] WT8x8_FP16.non_square_image_with_relu
[       OK ] WT8x8_FP16.non_square_image_with_relu (3 ms)
[----------] 14 tests from WT8x8_FP16 (168 ms total)

[----------] 16 tests from FT8x8_PRECOMPUTE
[ RUN      ] FT8x8_PRECOMPUTE.single_tile
[       OK ] FT8x8_PRECOMPUTE.single_tile (2 ms)
[ RUN      ] FT8x8_PRECOMPUTE.single_tile_with_relu
[       OK ] FT8x8_PRECOMPUTE.single_tile_with_relu (3 ms)
[ RUN      ] FT8x8_PRECOMPUTE.input_subtile
[       OK ] FT8x8_PRECOMPUTE.input_subtile (1 ms)
[ RUN      ] FT8x8_PRECOMPUTE.input_subtile_with_relu
[       OK ] FT8x8_PRECOMPUTE.input_subtile_with_relu (1 ms)
[ RUN      ] FT8x8_PRECOMPUTE.multi_tile
[       OK ] FT8x8_PRECOMPUTE.multi_tile (6 ms)
[ RUN      ] FT8x8_PRECOMPUTE.multi_tile_with_relu
[       OK ] FT8x8_PRECOMPUTE.multi_tile_with_relu (6 ms)
[ RUN      ] FT8x8_PRECOMPUTE.implicit_padding
[       OK ] FT8x8_PRECOMPUTE.implicit_padding (178 ms)
[ RUN      ] FT8x8_PRECOMPUTE.implicit_padding_with_relu
[       OK ] FT8x8_PRECOMPUTE.implicit_padding_with_relu (182 ms)
[ RUN      ] FT8x8_PRECOMPUTE.few_input_channels
[       OK ] FT8x8_PRECOMPUTE.few_input_channels (21 ms)
[ RUN      ] FT8x8_PRECOMPUTE.few_input_channels_with_relu
[       OK ] FT8x8_PRECOMPUTE.few_input_channels_with_relu (22 ms)
[ RUN      ] FT8x8_PRECOMPUTE.few_output_channels
[       OK ] FT8x8_PRECOMPUTE.few_output_channels (21 ms)
[ RUN      ] FT8x8_PRECOMPUTE.few_output_channels_with_relu
[       OK ] FT8x8_PRECOMPUTE.few_output_channels_with_relu (22 ms)
[ RUN      ] FT8x8_PRECOMPUTE.non_square_kernel
[       OK ] FT8x8_PRECOMPUTE.non_square_kernel (2 ms)
[ RUN      ] FT8x8_PRECOMPUTE.non_square_kernel_with_relu
[       OK ] FT8x8_PRECOMPUTE.non_square_kernel_with_relu (2 ms)
[ RUN      ] FT8x8_PRECOMPUTE.non_square_image
[       OK ] FT8x8_PRECOMPUTE.non_square_image (4 ms)
[ RUN      ] FT8x8_PRECOMPUTE.non_square_image_with_relu
[       OK ] FT8x8_PRECOMPUTE.non_square_image_with_relu (3 ms)
[----------] 16 tests from FT8x8_PRECOMPUTE (483 ms total)

[----------] 16 tests from FT16x16_PRECOMPUTE
[ RUN      ] FT16x16_PRECOMPUTE.single_tile
[       OK ] FT16x16_PRECOMPUTE.single_tile (12 ms)
[ RUN      ] FT16x16_PRECOMPUTE.single_tile_with_relu
[       OK ] FT16x16_PRECOMPUTE.single_tile_with_relu (12 ms)
[ RUN      ] FT16x16_PRECOMPUTE.input_subtile
[       OK ] FT16x16_PRECOMPUTE.input_subtile (5 ms)
[ RUN      ] FT16x16_PRECOMPUTE.input_subtile_with_relu
[       OK ] FT16x16_PRECOMPUTE.input_subtile_with_relu (5 ms)
[ RUN      ] FT16x16_PRECOMPUTE.multi_tile
[       OK ] FT16x16_PRECOMPUTE.multi_tile (37 ms)
[ RUN      ] FT16x16_PRECOMPUTE.multi_tile_with_relu
[       OK ] FT16x16_PRECOMPUTE.multi_tile_with_relu (38 ms)
[ RUN      ] FT16x16_PRECOMPUTE.implicit_padding
[       OK ] FT16x16_PRECOMPUTE.implicit_padding (796 ms)
[ RUN      ] FT16x16_PRECOMPUTE.implicit_padding_with_relu
[       OK ] FT16x16_PRECOMPUTE.implicit_padding_with_relu (806 ms)
[ RUN      ] FT16x16_PRECOMPUTE.few_input_channels
[       OK ] FT16x16_PRECOMPUTE.few_input_channels (118 ms)
[ RUN      ] FT16x16_PRECOMPUTE.few_input_channels_with_relu
[       OK ] FT16x16_PRECOMPUTE.few_input_channels_with_relu (119 ms)
[ RUN      ] FT16x16_PRECOMPUTE.few_output_channels
[       OK ] FT16x16_PRECOMPUTE.few_output_channels (120 ms)
[ RUN      ] FT16x16_PRECOMPUTE.few_output_channels_with_relu
[       OK ] FT16x16_PRECOMPUTE.few_output_channels_with_relu (124 ms)
[ RUN      ] FT16x16_PRECOMPUTE.non_square_kernel
[       OK ] FT16x16_PRECOMPUTE.non_square_kernel (11 ms)
[ RUN      ] FT16x16_PRECOMPUTE.non_square_kernel_with_relu
[       OK ] FT16x16_PRECOMPUTE.non_square_kernel_with_relu (12 ms)
[ RUN      ] FT16x16_PRECOMPUTE.non_square_image
[       OK ] FT16x16_PRECOMPUTE.non_square_image (21 ms)
[ RUN      ] FT16x16_PRECOMPUTE.non_square_image_with_relu
[       OK ] FT16x16_PRECOMPUTE.non_square_image_with_relu (22 ms)
[----------] 16 tests from FT16x16_PRECOMPUTE (2259 ms total)

[----------] 28 tests from WT8x8_PRECOMPUTE
[ RUN      ] WT8x8_PRECOMPUTE.single_tile
[       OK ] WT8x8_PRECOMPUTE.single_tile (2 ms)
[ RUN      ] WT8x8_PRECOMPUTE.single_tile_with_relu
[       OK ] WT8x8_PRECOMPUTE.single_tile_with_relu (3 ms)
[ RUN      ] WT8x8_PRECOMPUTE.single_tile_with_subsample2x2
[       OK ] WT8x8_PRECOMPUTE.single_tile_with_subsample2x2 (1 ms)
[ RUN      ] WT8x8_PRECOMPUTE.single_tile_with_subsample2x2_relu
[       OK ] WT8x8_PRECOMPUTE.single_tile_with_subsample2x2_relu (2 ms)
[ RUN      ] WT8x8_PRECOMPUTE.input_subtile
[       OK ] WT8x8_PRECOMPUTE.input_subtile (1 ms)
[ RUN      ] WT8x8_PRECOMPUTE.input_subtile_with_relu
[       OK ] WT8x8_PRECOMPUTE.input_subtile_with_relu (1 ms)
[ RUN      ] WT8x8_PRECOMPUTE.input_subtile_with_subsample2x2
[       OK ] WT8x8_PRECOMPUTE.input_subtile_with_subsample2x2 (2 ms)
[ RUN      ] WT8x8_PRECOMPUTE.input_subtile_with_subsample2x2_relu
[       OK ] WT8x8_PRECOMPUTE.input_subtile_with_subsample2x2_relu (1 ms)
[ RUN      ] WT8x8_PRECOMPUTE.multi_tile
[       OK ] WT8x8_PRECOMPUTE.multi_tile (5 ms)
[ RUN      ] WT8x8_PRECOMPUTE.multi_tile_with_relu
[       OK ] WT8x8_PRECOMPUTE.multi_tile_with_relu (5 ms)
[ RUN      ] WT8x8_PRECOMPUTE.multi_tile_with_subsample2x2
[       OK ] WT8x8_PRECOMPUTE.multi_tile_with_subsample2x2 (4 ms)
[ RUN      ] WT8x8_PRECOMPUTE.multi_tile_with_subsample2x2_relu
[       OK ] WT8x8_PRECOMPUTE.multi_tile_with_subsample2x2_relu (3 ms)
[ RUN      ] WT8x8_PRECOMPUTE.implicit_padding
[       OK ] WT8x8_PRECOMPUTE.implicit_padding (40 ms)
[ RUN      ] WT8x8_PRECOMPUTE.implicit_padding_with_relu
[       OK ] WT8x8_PRECOMPUTE.implicit_padding_with_relu (41 ms)
[ RUN      ] WT8x8_PRECOMPUTE.implicit_padding_with_subsample2x2
[       OK ] WT8x8_PRECOMPUTE.implicit_padding_with_subsample2x2 (30 ms)
[ RUN      ] WT8x8_PRECOMPUTE.implicit_padding_with_subsample2x2_relu
[       OK ] WT8x8_PRECOMPUTE.implicit_padding_with_subsample2x2_relu (29 ms)
[ RUN      ] WT8x8_PRECOMPUTE.few_input_channels
[       OK ] WT8x8_PRECOMPUTE.few_input_channels (19 ms)
[ RUN      ] WT8x8_PRECOMPUTE.few_input_channels_with_relu
[       OK ] WT8x8_PRECOMPUTE.few_input_channels_with_relu (20 ms)
[ RUN      ] WT8x8_PRECOMPUTE.few_input_channels_with_subsample2x2
[       OK ] WT8x8_PRECOMPUTE.few_input_channels_with_subsample2x2 (14 ms)
[ RUN      ] WT8x8_PRECOMPUTE.few_input_channels_with_subsample2x2_relu
[       OK ] WT8x8_PRECOMPUTE.few_input_channels_with_subsample2x2_relu (13 ms)
[ RUN      ] WT8x8_PRECOMPUTE.few_output_channels
[       OK ] WT8x8_PRECOMPUTE.few_output_channels (19 ms)
[ RUN      ] WT8x8_PRECOMPUTE.few_output_channels_with_relu
[       OK ] WT8x8_PRECOMPUTE.few_output_channels_with_relu (19 ms)
[ RUN      ] WT8x8_PRECOMPUTE.few_output_channels_with_subsample2x2
[       OK ] WT8x8_PRECOMPUTE.few_output_channels_with_subsample2x2 (11 ms)
[ RUN      ] WT8x8_PRECOMPUTE.few_output_channels_with_subsample2x2_relu
[       OK ] WT8x8_PRECOMPUTE.few_output_channels_with_subsample2x2_relu (11 ms)
[ RUN      ] WT8x8_PRECOMPUTE.non_square_image
[       OK ] WT8x8_PRECOMPUTE.non_square_image (4 ms)
[ RUN      ] WT8x8_PRECOMPUTE.non_square_image_with_relu
[       OK ] WT8x8_PRECOMPUTE.non_square_image_with_relu (3 ms)
[ RUN      ] WT8x8_PRECOMPUTE.non_square_image_with_subsample2x2
[       OK ] WT8x8_PRECOMPUTE.non_square_image_with_subsample2x2 (3 ms)
[ RUN      ] WT8x8_PRECOMPUTE.non_square_image_with_subsample2x2_relu
[       OK ] WT8x8_PRECOMPUTE.non_square_image_with_subsample2x2_relu (2 ms)
[----------] 28 tests from WT8x8_PRECOMPUTE (309 ms total)

[----------] 14 tests from WT8x8_FP16_PRECOMPUTE
[ RUN      ] WT8x8_FP16_PRECOMPUTE.single_tile
[       OK ] WT8x8_FP16_PRECOMPUTE.single_tile (2 ms)
[ RUN      ] WT8x8_FP16_PRECOMPUTE.single_tile_with_relu
[       OK ] WT8x8_FP16_PRECOMPUTE.single_tile_with_relu (3 ms)
[ RUN      ] WT8x8_FP16_PRECOMPUTE.input_subtile
[       OK ] WT8x8_FP16_PRECOMPUTE.input_subtile (1 ms)
[ RUN      ] WT8x8_FP16_PRECOMPUTE.input_subtile_with_relu
[       OK ] WT8x8_FP16_PRECOMPUTE.input_subtile_with_relu (1 ms)
[ RUN      ] WT8x8_FP16_PRECOMPUTE.multi_tile
[       OK ] WT8x8_FP16_PRECOMPUTE.multi_tile (5 ms)
[ RUN      ] WT8x8_FP16_PRECOMPUTE.multi_tile_with_relu
[       OK ] WT8x8_FP16_PRECOMPUTE.multi_tile_with_relu (5 ms)
[ RUN      ] WT8x8_FP16_PRECOMPUTE.implicit_padding
[       OK ] WT8x8_FP16_PRECOMPUTE.implicit_padding (39 ms)
[ RUN      ] WT8x8_FP16_PRECOMPUTE.implicit_padding_with_relu
[       OK ] WT8x8_FP16_PRECOMPUTE.implicit_padding_with_relu (39 ms)
[ RUN      ] WT8x8_FP16_PRECOMPUTE.few_input_channels
[       OK ] WT8x8_FP16_PRECOMPUTE.few_input_channels (19 ms)
[ RUN      ] WT8x8_FP16_PRECOMPUTE.few_input_channels_with_relu
[       OK ] WT8x8_FP16_PRECOMPUTE.few_input_channels_with_relu (19 ms)
[ RUN      ] WT8x8_FP16_PRECOMPUTE.few_output_channels
[       OK ] WT8x8_FP16_PRECOMPUTE.few_output_channels (18 ms)
[ RUN      ] WT8x8_FP16_PRECOMPUTE.few_output_channels_with_relu
[       OK ] WT8x8_FP16_PRECOMPUTE.few_output_channels_with_relu (19 ms)
[ RUN      ] WT8x8_FP16_PRECOMPUTE.non_square_image
[       OK ] WT8x8_FP16_PRECOMPUTE.non_square_image (3 ms)
[ RUN      ] WT8x8_FP16_PRECOMPUTE.non_square_image_with_relu
[       OK ] WT8x8_FP16_PRECOMPUTE.non_square_image_with_relu (3 ms)
[----------] 14 tests from WT8x8_FP16_PRECOMPUTE (176 ms total)

[----------] 12 tests from DIRECT_1x1
[ RUN      ] DIRECT_1x1.channel_tile
[       OK ] DIRECT_1x1.channel_tile (7 ms)
[ RUN      ] DIRECT_1x1.channel_tile_with_relu
[       OK ] DIRECT_1x1.channel_tile_with_relu (7 ms)
[ RUN      ] DIRECT_1x1.channel_subtile
[       OK ] DIRECT_1x1.channel_subtile (48 ms)
[ RUN      ] DIRECT_1x1.channel_subtile_with_relu
[       OK ] DIRECT_1x1.channel_subtile_with_relu (50 ms)
[ RUN      ] DIRECT_1x1.input_multi_tile
[       OK ] DIRECT_1x1.input_multi_tile (16 ms)
[ RUN      ] DIRECT_1x1.input_multi_tile_with_relu
[       OK ] DIRECT_1x1.input_multi_tile_with_relu (17 ms)
[ RUN      ] DIRECT_1x1.output_multi_tile
[       OK ] DIRECT_1x1.output_multi_tile (28 ms)
[ RUN      ] DIRECT_1x1.output_multi_tile_with_relu
[       OK ] DIRECT_1x1.output_multi_tile_with_relu (30 ms)
[ RUN      ] DIRECT_1x1.input_output_multi_tile
[       OK ] DIRECT_1x1.input_output_multi_tile (65 ms)
[ RUN      ] DIRECT_1x1.input_output_multi_tile_with_relu
[       OK ] DIRECT_1x1.input_output_multi_tile_with_relu (65 ms)
[ RUN      ] DIRECT_1x1.odd_image_size
[       OK ] DIRECT_1x1.odd_image_size (2044 ms)
[ RUN      ] DIRECT_1x1.odd_image_size_with_relu
[       OK ] DIRECT_1x1.odd_image_size_with_relu (2122 ms)
[----------] 12 tests from DIRECT_1x1 (4500 ms total)

[----------] Global test environment tear-down
[==========] 160 tests from 9 test cases ran. (10881 ms total)
[  PASSED  ] 160 tests.
[9/12] RUN fully-connected-output-smoketest
[==========] Running 11 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 11 tests from MRxNR_4x24
[ RUN      ] MRxNR_4x24.single_input_channel
[       OK ] MRxNR_4x24.single_input_channel (13 ms)
[ RUN      ] MRxNR_4x24.few_input_channels
[       OK ] MRxNR_4x24.few_input_channels (7 ms)
[ RUN      ] MRxNR_4x24.many_input_channels
[       OK ] MRxNR_4x24.many_input_channels (280 ms)
[ RUN      ] MRxNR_4x24.batch_subblock
[       OK ] MRxNR_4x24.batch_subblock (11 ms)
[ RUN      ] MRxNR_4x24.small_batch_size
[       OK ] MRxNR_4x24.small_batch_size (10 ms)
[ RUN      ] MRxNR_4x24.batch_remainder_subblock
[       OK ] MRxNR_4x24.batch_remainder_subblock (45 ms)
[ RUN      ] MRxNR_4x24.large_batch_size
[       OK ] MRxNR_4x24.large_batch_size (193 ms)
[ RUN      ] MRxNR_4x24.output_channels_subblock
[       OK ] MRxNR_4x24.output_channels_subblock (67 ms)
[ RUN      ] MRxNR_4x24.few_output_channels
[       OK ] MRxNR_4x24.few_output_channels (10 ms)
[ RUN      ] MRxNR_4x24.output_channels_remainder_subblock
[       OK ] MRxNR_4x24.output_channels_remainder_subblock (265 ms)
[ RUN      ] MRxNR_4x24.many_output_channels
[       OK ] MRxNR_4x24.many_output_channels (228 ms)
[----------] 11 tests from MRxNR_4x24 (1131 ms total)

[----------] Global test environment tear-down
[==========] 11 tests from 1 test case ran. (1131 ms total)
[  PASSED  ] 11 tests.
[10/12] RUN max-pooling-output-smoketest
0
[==========] Running 32 tests from 4 test cases.
[----------] Global test environment set-up.
[----------] 8 tests from MAX_POOLING_2x2
[ RUN      ] MAX_POOLING_2x2.single_pool
[       OK ] MAX_POOLING_2x2.single_pool (1 ms)
[ RUN      ] MAX_POOLING_2x2.few_horizontal_pools
[       OK ] MAX_POOLING_2x2.few_horizontal_pools (11 ms)
[ RUN      ] MAX_POOLING_2x2.few_vertical_pools
[       OK ] MAX_POOLING_2x2.few_vertical_pools (13 ms)
[ RUN      ] MAX_POOLING_2x2.large_image
[       OK ] MAX_POOLING_2x2.large_image (129 ms)
[ RUN      ] MAX_POOLING_2x2.indivisible_size
[       OK ] MAX_POOLING_2x2.indivisible_size (0 ms)
[ RUN      ] MAX_POOLING_2x2.implicit_padding
[       OK ] MAX_POOLING_2x2.implicit_padding (69 ms)
[ RUN      ] MAX_POOLING_2x2.small_batch
[       OK ] MAX_POOLING_2x2.small_batch (17 ms)
[ RUN      ] MAX_POOLING_2x2.few_channels
[       OK ] MAX_POOLING_2x2.few_channels (16 ms)
[----------] 8 tests from MAX_POOLING_2x2 (256 ms total)

[----------] 8 tests from MAX_POOLING_3x3_STRIDE_2x2
[ RUN      ] MAX_POOLING_3x3_STRIDE_2x2.single_pool
[       OK ] MAX_POOLING_3x3_STRIDE_2x2.single_pool (0 ms)
[ RUN      ] MAX_POOLING_3x3_STRIDE_2x2.few_horizontal_pools
[       OK ] MAX_POOLING_3x3_STRIDE_2x2.few_horizontal_pools (13 ms)
[ RUN      ] MAX_POOLING_3x3_STRIDE_2x2.few_vertical_pools
[       OK ] MAX_POOLING_3x3_STRIDE_2x2.few_vertical_pools (15 ms)
[ RUN      ] MAX_POOLING_3x3_STRIDE_2x2.large_image
[       OK ] MAX_POOLING_3x3_STRIDE_2x2.large_image (163 ms)
[ RUN      ] MAX_POOLING_3x3_STRIDE_2x2.indivisible_size
[       OK ] MAX_POOLING_3x3_STRIDE_2x2.indivisible_size (1 ms)
[ RUN      ] MAX_POOLING_3x3_STRIDE_2x2.implicit_padding
[       OK ] MAX_POOLING_3x3_STRIDE_2x2.implicit_padding (372 ms)
[ RUN      ] MAX_POOLING_3x3_STRIDE_2x2.small_batch
[       OK ] MAX_POOLING_3x3_STRIDE_2x2.small_batch (21 ms)
[ RUN      ] MAX_POOLING_3x3_STRIDE_2x2.few_channels
[       OK ] MAX_POOLING_3x3_STRIDE_2x2.few_channels (21 ms)
[----------] 8 tests from MAX_POOLING_3x3_STRIDE_2x2 (607 ms total)

[----------] 8 tests from MAX_POOLING_1x2
[ RUN      ] MAX_POOLING_1x2.single_pool
[       OK ] MAX_POOLING_1x2.single_pool (0 ms)
[ RUN      ] MAX_POOLING_1x2.few_horizontal_pools
[       OK ] MAX_POOLING_1x2.few_horizontal_pools (7 ms)
[ RUN      ] MAX_POOLING_1x2.few_vertical_pools
[       OK ] MAX_POOLING_1x2.few_vertical_pools (31 ms)
[ RUN      ] MAX_POOLING_1x2.large_image
[       OK ] MAX_POOLING_1x2.large_image (147 ms)
[ RUN      ] MAX_POOLING_1x2.indivisible_size
[       OK ] MAX_POOLING_1x2.indivisible_size (0 ms)
[ RUN      ] MAX_POOLING_1x2.implicit_padding
[       OK ] MAX_POOLING_1x2.implicit_padding (21 ms)
[ RUN      ] MAX_POOLING_1x2.small_batch
[       OK ] MAX_POOLING_1x2.small_batch (19 ms)
[ RUN      ] MAX_POOLING_1x2.few_channels
[       OK ] MAX_POOLING_1x2.few_channels (19 ms)
[----------] 8 tests from MAX_POOLING_1x2 (245 ms total)

[----------] 8 tests from MAX_POOLING_2x1
[ RUN      ] MAX_POOLING_2x1.single_pool
[       OK ] MAX_POOLING_2x1.single_pool (0 ms)
[ RUN      ] MAX_POOLING_2x1.few_horizontal_pools
[       OK ] MAX_POOLING_2x1.few_horizontal_pools (30 ms)
[ RUN      ] MAX_POOLING_2x1.few_vertical_pools
[       OK ] MAX_POOLING_2x1.few_vertical_pools (8 ms)
[ RUN      ] MAX_POOLING_2x1.large_image
[       OK ] MAX_POOLING_2x1.large_image (154 ms)
[ RUN      ] MAX_POOLING_2x1.indivisible_size
[       OK ] MAX_POOLING_2x1.indivisible_size (0 ms)
[ RUN      ] MAX_POOLING_2x1.implicit_padding
[       OK ] MAX_POOLING_2x1.implicit_padding (22 ms)
[ RUN      ] MAX_POOLING_2x1.small_batch
[       OK ] MAX_POOLING_2x1.small_batch (19 ms)
[ RUN      ] MAX_POOLING_2x1.few_channels
[       OK ] MAX_POOLING_2x1.few_channels (20 ms)
[----------] 8 tests from MAX_POOLING_2x1 (253 ms total)

[----------] Global test environment tear-down
[==========] 32 tests from 4 test cases ran. (1361 ms total)
[  PASSED  ] 32 tests.
[11/12] RUN softmax-output-smoketest
[==========] Running 4 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 2 tests from OUT_OF_PLACE
[ RUN      ] OUT_OF_PLACE.few_channels
[       OK ] OUT_OF_PLACE.few_channels (50 ms)
[ RUN      ] OUT_OF_PLACE.small_batch
[       OK ] OUT_OF_PLACE.small_batch (94 ms)
[----------] 2 tests from OUT_OF_PLACE (144 ms total)

[----------] 2 tests from IN_PLACE
[ RUN      ] IN_PLACE.few_channels
[       OK ] IN_PLACE.few_channels (41 ms)
[ RUN      ] IN_PLACE.small_batch
[       OK ] IN_PLACE.small_batch (93 ms)
[----------] 2 tests from IN_PLACE (134 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 2 test cases ran. (278 ms total)
[  PASSED  ] 4 tests.

real    0m29.434s
user    0m27.841s
sys     0m0.600s


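● NNPACK ninja test is unstable and fails partway through

 The run was aborted by an error at around the 10-hour mark.
 NNPACK ninja test unstable on Raspberry Pi 3B

 These results are from a Raspberry Pi 3B. The power supply is a Chinese-made unit rated for 2.4A output.
 The USB power cable is a 20cm Syncwire UNBREAKcable (ultra-durable, 2.4A USB cable).

● What causes NNPACK ninja test to end in FAILED with a "computation error"?

 The errors occur even with a reliable power supply rated for Raspberry Pi load.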
[ FAILED ] FT8x8.conv1 (979227 ms)
[ FAILED ] FT16x16.conv2 (463511 ms)
[ FAILED ] WT8x8.conv1 (989511 ms)
[ FAILED ] FT16x16.conv2 (491525 ms)
[ FAILED ] FT16x16.conv2 (383976 ms)

 The errors also occur on the NVIDIA Jetson Nano Developer Kit.
[ FAILED ] FT16x16.conv2 (55760 ms)
[ FAILED ] FT16x16.conv1 (14418 ms)
[ FAILED ] FT16x16.conv2 (55256 ms)

 This is probably a bug in NNPACK itself (a characteristic of running it on ARM CPUs).

Raspberry Pi 3B

pi@raspberrypi:~/NNPACK $ time ninja test
[0/40] RUN fourier-test
[==========] Running 16 tests from 16 test cases.
[----------] Global test environment set-up.
[----------] 1 test from FFT8_WITHIN_ROWS
[ RUN      ] FFT8_WITHIN_ROWS.match_reference
[       OK ] FFT8_WITHIN_ROWS.match_reference (5 ms)
[----------] 1 test from FFT8_WITHIN_ROWS (5 ms total)
...
[3/40] RUN convolution-output-vgg-a-test
[==========] Running 42 tests from 3 test cases.
[----------] Global test environment set-up.
[----------] 14 tests from FT8x8
[ RUN      ] FT8x8.conv1
/home/pi/NNPACK/test/testers/convolution.h:263: Failure
Expected: (median(maxErrors)) < (errorLimit()), actual: 1.1312e-05 vs 1e-05
[  FAILED  ] FT8x8.conv1 (979227 ms)
[ RUN      ] FT8x8.conv1_with_relu

[       OK ] FT8x8.conv1_with_relu (986404 ms)
[ RUN      ] FT8x8.conv2
[       OK ] FT8x8.conv2 (619313 ms)
[ RUN      ] FT8x8.conv2_with_relu
[       OK ] FT8x8.conv2_with_relu (612096 ms)
...
[==========] 42 tests from 3 test cases ran. (33997048 ms total)
[  PASSED  ] 41 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] FT8x8.conv1

 1 FAILED TEST
FAILED: convolution-output-vgg-a-test
/home/pi/NNPACK/bin/convolution-output-vgg-a-test --gtest_color=yes
ninja: build stopped: subcommand failed.

real    606m29.057s
user    1912m36.429s
sys     4m25.232s
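
 The failure message in the log above reports a measured median error against a limit (1.1312e-05 vs 1e-05). As a rough sketch, the margin by which a run exceeded its limit can be computed with awk. The helper name overshoot is my own (hypothetical), and the line format is assumed to match the gtest output shown in this article.

```shell
# overshoot: hypothetical helper. Reads gtest output on stdin (or from files given
# as arguments) and prints how far each "actual: <measured> vs <limit>" line
# exceeds its limit, as a percentage.
overshoot() {
  awk '/actual:/ {
    for (i = 1; i <= NF; i++)
      if ($i == "actual:") { measured = $(i + 1); limit = $(i + 3) }
    printf "%.1f%%\n", (measured / limit - 1) * 100
  }' "$@"
}

# Line copied from the first Raspberry Pi 3B run; prints 13.1%
echo "Expected: (median(maxErrors)) < (errorLimit()), actual: 1.1312e-05 vs 1e-05" | overshoot
```

 If the whole run was saved to a file, overshoot that-file prints one percentage per failed assertion.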

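● NNPACK ninja test
 On the second run, FT8x8.conv1 (which failed the first time) passed, but the run failed later in a different test. Hmm, it is unsettling when computation results are not reproducible.
 These results are from a Raspberry Pi 3B. The power supply is a Chinese-made unit rated for 2.4A output.
 The USB power cable is a 20cm Syncwire UNBREAKcable (ultra-durable, 2.4A USB cable).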
Raspberry Pi 3B

[3/40] RUN convolution-output-vgg-a-test
[==========] Running 42 tests from 3 test cases.
[----------] Global test environment set-up.
[----------] 14 tests from FT8x8
[ RUN      ] FT8x8.conv1
[       OK ] FT8x8.conv1 (987678 ms)
[ RUN      ] FT8x8.conv1_with_relu
[       OK ] FT8x8.conv1_with_relu (1014503 ms)
...
[7/40] RUN convolution-input-gradient-overfeat-fast-test
[==========] Running 11 tests from 3 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from FT8x8
[ RUN      ] FT8x8.conv2
[       OK ] FT8x8.conv2 (459160 ms)
[ RUN      ] FT8x8.conv3
[       OK ] FT8x8.conv3 (313131 ms)
[ RUN      ] FT8x8.conv4
[       OK ] FT8x8.conv4 (1828147 ms)
[ RUN      ] FT8x8.conv5
[       OK ] FT8x8.conv5 (3685636 ms)
[----------] 4 tests from FT8x8 (6286074 ms total)

[----------] 4 tests from FT16x16
[ RUN      ] FT16x16.conv2
/home/pi/NNPACK/test/testers/convolution.h:316: Failure
Expected: (median(maxErrors)) < (errorLimit()), actual: 1.10825e-05 vs 1e-05
[  FAILED  ] FT16x16.conv2 (463511 ms)
[ RUN      ] FT16x16.conv3
[       OK ] FT16x16.conv3 (296285 ms)
[ RUN      ] FT16x16.conv4
[       OK ] FT16x16.conv4 (1808263 ms)
[ RUN      ] FT16x16.conv5
[       OK ] FT16x16.conv5 (4614381 ms)
[----------] 4 tests from FT16x16 (7182507 ms total)

[----------] 3 tests from WT8x8
[ RUN      ] WT8x8.conv3
[       OK ] WT8x8.conv3 (266827 ms)
[ RUN      ] WT8x8.conv4
[       OK ] WT8x8.conv4 (1798542 ms)
[ RUN      ] WT8x8.conv5
[       OK ] WT8x8.conv5 (3625012 ms)
[----------] 3 tests from WT8x8 (5690381 ms total)

[----------] Global test environment tear-down
[==========] 11 tests from 3 test cases ran. (19158975 ms total)
[  PASSED  ] 10 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] FT16x16.conv2

 1 FAILED TEST
FAILED: convolution-input-gradient-overfeat-fast-test
/home/pi/NNPACK/bin/convolution-input-gradient-overfeat-fast-test --gtest_color=yes
ninja: build stopped: subcommand failed.

real    1661m11.602s
user    5906m41.008s
sys     7m22.361s
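
 Since the same test passes on one run and fails on another, re-running only the failing case is much cheaper than repeating the whole ninja test. The following is a minimal sketch: rerun_gtest is a hypothetical wrapper of my own, while --gtest_filter and --gtest_repeat are standard googletest options. The binary path is the one printed by ninja in the log above.

```shell
# rerun_gtest: hypothetical helper (not part of NNPACK) that repeats one gtest case.
rerun_gtest() {
  # $1 = path to the test binary, $2 = test case name, $3 = number of repetitions
  if [ -x "$1" ]; then
    "$1" --gtest_filter="$2" --gtest_repeat="$3"
  else
    echo "test binary not found: $1" >&2
    return 1
  fi
}

# Example (one repetition of this case took about 8 minutes in the log above):
# rerun_gtest "$HOME/NNPACK/bin/convolution-input-gradient-overfeat-fast-test" FT16x16.conv2 3
```

 Counting how many of the repetitions fail would give a rough idea of how often the error limit is exceeded.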

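● NNPACK ninja test
 On the third run the error occurred in a different place. Hmm, it is unsettling when computation results are not reproducible.
 These results are from a Raspberry Pi 3B. The power supply is a Chinese-made unit rated for 2.4A output.
 The USB power cable is a 20cm Syncwire UNBREAKcable (ultra-durable, 2.4A USB cable).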
Raspberry Pi 3B

[3/40] RUN convolution-output-vgg-a-test
[==========] Running 42 tests from 3 test cases.
[----------] Global test environment set-up.
[----------] 14 tests from FT8x8
[ RUN      ] FT8x8.conv1
[       OK ] FT8x8.conv1 (1003290 ms)
[ RUN      ] FT8x8.conv1_with_relu
[       OK ] FT8x8.conv1_with_relu (1009747 ms)
...
[ RUN      ] FT8x8.conv8_with_relu
[       OK ] FT8x8.conv8_with_relu (316830 ms)
[----------] 14 tests from FT8x8 (12550534 ms total)

[----------] 14 tests from FT16x16
[ RUN      ] FT16x16.conv1
[       OK ] FT16x16.conv1 (964562 ms)
...
[ RUN      ] FT16x16.conv8_with_relu
[       OK ] FT16x16.conv8_with_relu (318151 ms)
[----------] 14 tests from FT16x16 (13024473 ms total)

[----------] 14 tests from WT8x8
[ RUN      ] WT8x8.conv1
/home/pi/NNPACK/test/testers/convolution.h:263: Failure
Expected: (median(maxErrors)) < (errorLimit()), actual: 4.04286e-05 vs 3e-05
[  FAILED  ] WT8x8.conv1 (989511 ms)
[ RUN      ] WT8x8.conv1_with_relu
[       OK ] WT8x8.conv1_with_relu (1011220 ms)
[ RUN      ] WT8x8.conv2
[       OK ] WT8x8.conv2 (698127 ms)
[ RUN      ] WT8x8.conv2_with_relu
[       OK ] WT8x8.conv2_with_relu (657744 ms)
[ RUN      ] WT8x8.conv3
[       OK ] WT8x8.conv3 (404478 ms)
[ RUN      ] WT8x8.conv3_with_relu
[       OK ] WT8x8.conv3_with_relu (433328 ms)
[ RUN      ] WT8x8.conv4
[       OK ] WT8x8.conv4 (960574 ms)
[ RUN      ] WT8x8.conv4_with_relu
[       OK ] WT8x8.conv4_with_relu (946225 ms)
[ RUN      ] WT8x8.conv5
[       OK ] WT8x8.conv5 (548275 ms)
[ RUN      ] WT8x8.conv5_with_relu
[       OK ] WT8x8.conv5_with_relu (557061 ms)
[ RUN      ] WT8x8.conv6
[       OK ] WT8x8.conv6 (2075061 ms)
[ RUN      ] WT8x8.conv6_with_relu
[       OK ] WT8x8.conv6_with_relu (2105158 ms)
[ RUN      ] WT8x8.conv8
[       OK ] WT8x8.conv8 (293786 ms)
[ RUN      ] WT8x8.conv8_with_relu
[       OK ] WT8x8.conv8_with_relu (295364 ms)
[----------] 14 tests from WT8x8 (11976121 ms total)

[----------] Global test environment tear-down
[==========] 42 tests from 3 test cases ran. (37551129 ms total)
[  PASSED  ] 41 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] WT8x8.conv1

 1 FAILED TEST
FAILED: convolution-output-vgg-a-test
/home/pi/NNPACK/bin/convolution-output-vgg-a-test --gtest_color=yes
ninja: build stopped: subcommand failed.

real    665m7.769s
user    2142m56.325s
sys     4m29.563s
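
 These logs run to thousands of lines, so it helps to pull out only the failed cases. A small sketch with awk follows; failed_tests is a hypothetical helper name, and it assumes plain-text gtest output without ANSI color codes, like the excerpts in this article.

```shell
# failed_tests: hypothetical helper that lists each failed gtest case with its
# runtime in minutes. Only lines of the form "[  FAILED  ] <name> (<n> ms)" match;
# the summary lines at the end of a run (no timing) are skipped.
failed_tests() {
  awk '$1 == "[" && $2 == "FAILED" && $NF == "ms)" {
    ms = substr($5, 2)              # "(989511" -> "989511"
    printf "%s %.1f min\n", $4, ms / 60000
  }' "$@"
}

# Lines copied from the third run; prints "WT8x8.conv1 16.5 min"
printf '%s\n' \
  '[ RUN      ] WT8x8.conv1' \
  '[  FAILED  ] WT8x8.conv1 (989511 ms)' \
  '[  FAILED  ] 1 test, listed below:' \
  '[  FAILED  ] WT8x8.conv1' | failed_tests
```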

 As expected, it fails partway through again.
Raspberry Pi 3B

[7/40] RUN convolution-input-gradient-overfeat-fast-test
[==========] Running 11 tests from 3 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from FT8x8
[ RUN      ] FT8x8.conv2
[       OK ] FT8x8.conv2 (471809 ms)
[ RUN      ] FT8x8.conv3
[       OK ] FT8x8.conv3 (298618 ms)
[ RUN      ] FT8x8.conv4
[       OK ] FT8x8.conv4 (1834215 ms)
[ RUN      ] FT8x8.conv5
[       OK ] FT8x8.conv5 (3715144 ms)
[----------] 4 tests from FT8x8 (6319789 ms total)

[----------] 4 tests from FT16x16
[ RUN      ] FT16x16.conv2
/home/pi/NNPACK/test/testers/convolution.h:316: Failure
Expected: (median(maxErrors)) < (errorLimit()), actual: 1.05892e-05 vs 1e-05
[  FAILED  ] FT16x16.conv2 (491525 ms)
[ RUN      ] FT16x16.conv3
[       OK ] FT16x16.conv3 (315125 ms)
[ RUN      ] FT16x16.conv4
[       OK ] FT16x16.conv4 (1850051 ms)
[ RUN      ] FT16x16.conv5
[       OK ] FT16x16.conv5 (4727739 ms)
[----------] 4 tests from FT16x16 (7384502 ms total)

[----------] 3 tests from WT8x8
[ RUN      ] WT8x8.conv3
[       OK ] WT8x8.conv3 (279561 ms)
[ RUN      ] WT8x8.conv4
[       OK ] WT8x8.conv4 (1830631 ms)
[ RUN      ] WT8x8.conv5
[       OK ] WT8x8.conv5 (3778374 ms)
[----------] 3 tests from WT8x8 (5888567 ms total)

[----------] Global test environment tear-down
[==========] 11 tests from 3 test cases ran. (19592859 ms total)
[  PASSED  ] 10 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] FT16x16.conv2

 1 FAILED TEST
FAILED: convolution-input-gradient-overfeat-fast-test
/home/pi/NNPACK/bin/convolution-input-gradient-overfeat-fast-test --gtest_color=yes
ninja: build stopped: subcommand failed.

real    1732m6.611s
user    6181m25.241s
sys     7m22.115s


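● Running ninja test on a 3B+ with a Raspberry Pi power supply verified under full Pi3 load

 ninja test in progress on a Raspberry Pi 3B+

 I am running ninja test on a Raspberry Pi 3B+ with the Raspberry Pi power supply that is verified under full Pi3 load. More than 24 hours have passed and the tests are still running with no errors.
 As expected from a power supply set verified under full load!


Raspberry Pi power supply set (5V 3.0A), verified under full Pi3 load
ASIN: B01N8ZIJL8

 In the end the test failed even with the Raspberry Pi 3B+ and the power supply verified under full Pi3 load.
 If it fails even with this combination, could this be an instability bug in the NNPACK library itself?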
Raspberry Pi 3B+
NNPACK ninja test unstable on Raspberry Pi 3B+

pi@raspberrypi:~/NNPACK $ time ninja test
[0/40] RUN fourier-test
[==========] Running 16 tests from 16 test cases.
[----------] Global test environment set-up.
[----------] 1 test from FFT8_WITHIN_ROWS
[ RUN      ] FFT8_WITHIN_ROWS.match_reference
[       OK ] FFT8_WITHIN_ROWS.match_reference (4 ms)
[----------] 1 test from FFT8_WITHIN_ROWS (5 ms total)
...
[7/40] RUN convolution-input-gradient-overfeat-fast-test
[==========] Running 11 tests from 3 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from FT8x8
[ RUN      ] FT8x8.conv2
[       OK ] FT8x8.conv2 (381376 ms)
[ RUN      ] FT8x8.conv3
[       OK ] FT8x8.conv3 (278667 ms)
[ RUN      ] FT8x8.conv4
[       OK ] FT8x8.conv4 (1809048 ms)
[ RUN      ] FT8x8.conv5
[       OK ] FT8x8.conv5 (3777206 ms)
[----------] 4 tests from FT8x8 (6246297 ms total)

[----------] 4 tests from FT16x16
[ RUN      ] FT16x16.conv2
/home/pi/NNPACK/test/testers/convolution.h:316: Failure
Expected: (median(maxErrors)) < (errorLimit()), actual: 1.14361e-05 vs 1e-05
[  FAILED  ] FT16x16.conv2 (383976 ms)
[ RUN      ] FT16x16.conv3
[       OK ] FT16x16.conv3 (298799 ms)
[ RUN      ] FT16x16.conv4
[       OK ] FT16x16.conv4 (1804321 ms)
[ RUN      ] FT16x16.conv5
[       OK ] FT16x16.conv5 (8519764 ms)
[----------] 4 tests from FT16x16 (11006897 ms total)

[----------] 3 tests from WT8x8
[ RUN      ] WT8x8.conv3
[       OK ] WT8x8.conv3 (270080 ms)
[ RUN      ] WT8x8.conv4
[       OK ] WT8x8.conv4 (1869548 ms)
[ RUN      ] WT8x8.conv5
[       OK ] WT8x8.conv5 (3779774 ms)
[----------] 3 tests from WT8x8 (5919402 ms total)

[----------] Global test environment tear-down
[==========] 11 tests from 3 test cases ran. (23172603 ms total)
[  PASSED  ] 10 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] FT16x16.conv2

 1 FAILED TEST
FAILED: convolution-input-gradient-overfeat-fast-test
/home/pi/NNPACK/bin/convolution-input-gradient-overfeat-fast-test --gtest_color=yes
ninja: build stopped: subcommand failed.

real    2181m13.065s
user    5322m40.987s
sys     6m11.050s
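
 The "real" values printed by time are in minutes and seconds, which makes them hard to compare with statements like "about 10 hours" or "more than 24 hours". A tiny conversion sketch follows; to_hours is a hypothetical helper name, and the input is assumed to have the XmY.Zs form shown above.

```shell
# to_hours: hypothetical helper converting bash "time" output such as 606m29.057s to hours.
to_hours() {
  echo "$1" | awk -F'[ms]' '{ printf "%.1f\n", ($1 * 60 + $2) / 3600 }'
}

to_hours 606m29.057s     # first Raspberry Pi 3B run, prints 10.1
to_hours 2181m13.065s    # Raspberry Pi 3B+ run, prints 36.4
```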


● How to build NNPACK on Raspberry Pi

 This time, build NNPACK using the "recommended way to build" rather than "Development builds".

# The usual sudo apt-get update to refresh the package lists
sudo apt-get update

# Building: for most users, the recommended way to build NNPACK is through CMake

# Install ninja build system
sudo apt-get -y install ninja-build

# If you get "sudo: pip: command not found", install pip first
sudo apt-get -y install python-pip

sudo pip install --upgrade git+https://github.com/Maratyszcza/PeachPy
sudo pip install --upgrade git+https://github.com/Maratyszcza/confu

cd
git clone https://github.com/Maratyszcza/NNPACK.git
cd NNPACK
confu setup

mkdir build
cd build
cmake -G Ninja ..
ninja

time ninja test
pi@raspberrypi:~/NNPACK/build $ time ninja test
[0/1] Running tests...
Test project /home/pi/NNPACK/build
      Start  1: convolution-inference-smoketest
 1/34 Test  #1: convolution-inference-smoketest .........   Passed   69.54 sec
      Start  2: convolution-inference-alexnet
 2/34 Test  #2: convolution-inference-alexnet ...........   Passed  462.03 sec
      Start  3: convolution-inference-overfeat
 3/34 Test  #3: convolution-inference-overfeat ..........***Failed  2130.37 sec
      Start  4: convolution-inference-vgg
 4/34 Test  #4: convolution-inference-vgg ...............   Passed  4488.53 sec
      Start  5: convolution-output-smoketest
 5/34 Test  #5: convolution-output-smoketest ............   Passed   15.86 sec
      Start  6: convolution-output-alexnet
^Cninja: build stopped: interrupted by user.

real    169m8.554s
user    560m28.521s
sys     1m8.381s

pi@raspberrypi:~/NNPACK/build $ time ninja test
[0/1] Running tests...
Test project /home/pi/NNPACK/build
      Start  1: convolution-inference-smoketest
 1/34 Test  #1: convolution-inference-smoketest .........   Passed   67.11 sec
      Start  2: convolution-inference-alexnet
 2/34 Test  #2: convolution-inference-alexnet ...........   Passed  457.87 sec
      Start  3: convolution-inference-overfeat
 3/34 Test  #3: convolution-inference-overfeat ..........***Failed  1889.39 sec
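
 With the CMake build, the output of ninja test is in CTest format and only shows "***Failed" without the gtest details. One possible way to see the details for a single test is to call ctest directly. This is only a sketch: ctest_one is a hypothetical wrapper of my own, -R and --output-on-failure are standard ctest options, and the build directory is assumed to be ~/NNPACK/build as in the steps above.

```shell
# ctest_one: hypothetical helper that runs one CTest test by exact name
# and prints the test's own output if it fails.
ctest_one() {
  # $1 = CMake build directory, $2 = test name as printed by "ninja test"
  if [ -d "$1" ]; then
    (cd "$1" && ctest -R "^$2\$" --output-on-failure)
  else
    echo "build directory not found: $1" >&2
    return 1
  fi
}

# Example (this test ran for about 30 minutes before failing in the logs above):
# ctest_one "$HOME/NNPACK/build" convolution-inference-overfeat
```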


● NNPACK benchmark comparison: NVIDIA Jetson Nano vs. Raspberry Pi 3B+

Test                                       NVIDIA Jetson Nano (sec)  Raspberry Pi 3B+ (sec)
Test  #1: convolution-inference-smoketest                     28.18                   69.54
Test  #2: convolution-inference-alexnet                      191.48                  462.03
Test  #3: convolution-inference-overfeat                     979.89                  Failed
Test  #4: convolution-inference-vgg                         1635.33                 4488.53
Test  #5: convolution-output-smoketest                         6.52                   15.86
Test  #6: convolution-output-alexnet                        3275.18                       -
Test  #7: convolution-output-overfeat                      15770.11                       -
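
 To put the numbers side by side, the ratio between the two boards can be worked out for the tests that passed on both. A minimal sketch with awk; speedup is a hypothetical helper name, and the inputs are the seconds from the comparison above.

```shell
# speedup: hypothetical helper. Prints how many times longer the Raspberry Pi 3B+
# took than the Jetson Nano for the same test.
speedup() {
  # $1 = Raspberry Pi 3B+ seconds, $2 = Jetson Nano seconds
  awk -v pi="$1" -v nano="$2" 'BEGIN { printf "%.2f\n", pi / nano }'
}

speedup 69.54 28.18       # convolution-inference-smoketest, prints 2.47
speedup 462.03 191.48     # convolution-inference-alexnet,   prints 2.41
speedup 4488.53 1635.33   # convolution-inference-vgg,       prints 2.74
speedup 15.86 6.52        # convolution-output-smoketest,    prints 2.43
```

 For these four tests the Jetson Nano finishes roughly 2.4 to 2.7 times faster than the Raspberry Pi 3B+.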








Copyright (c) 2018 FREE WING,Y.Sakamoto

http://www.neko.ne.jp/~freewing/raspberry_pi/raspberry_pi_build_nnpack/