Radeon + Linux - tests

Started by Rysiu, 02 August 2012, 14:11


Does anyone happen to have a Radeon on a Linux system with an OpenCL environment installed?

I am working on an application and would like to test it on AMD cards.


Empty? Has everyone gone into hiding?  :deadman:


my flagship machine with the Radeon runs Windows


Nobody has AMD with Linux...

And who has nVidia with Linux?


Quote from: Rysiu on 06 August 2012, 21:33
Nobody has AMD with Linux...

And who has nVidia with Linux?

I have an NVIDIA card under Windows, but I do have Linux in one virtual machine. The catch is that we won't get anywhere with CUDA inside a VM.

Martin Fox

I have AMD, but it's an APU. I don't know whether that is good enough for you :P


An APU is fine. From what I can see, it supports OpenCL as normal.


see your PM -> the configuration is weak, but it might do for testing  ;)
Athlon64 X2 + Radeon HD 7770

The configuration is very good. Entirely sufficient.

I am running the tests for my diploma thesis. The program simulates quantum algorithms. I need help testing the execution speed of Shor's algorithm.

There is also a simple benchmark. It tests both the CPU and other hardware (e.g. GPUs).

The link to the repository is here:


We can type into the console:

svn checkout https://olib.googlecode.com/svn/trunk/

A 'trunk' folder will be created in the current directory and the repository will be downloaded into it.

The programs are in the 'samples' directory. The makefile is there too. The makefile contains quite a lot of program variants, so I don't recommend typing 'make' or 'make all', because something will probably be missing on your system. The list of available programs is in samples/programs.list

The single-core version of the benchmark can be compiled as follows:

make benchmark_sp

The OpenCL version of the benchmark (AMD cards etc.) can be compiled with:

make benchmark_ocl

After starting the OpenCL version, a list of the available devices that can be tested should appear.

Shor's algorithm itself on OpenCL (float version) is compiled with:

make shor_algorithm_ocl_float

We can also compile the double version:

make shor_algorithm_ocl_double

Execution times of Shor's algorithm vary from run to run. On a GPU they can be very short.

It is best to fetch everything from the repository (with the command above), because that is where the newest revision is (currently 31). I will send a packed archive this evening.

To start with, please check whether it compiles (the OpenCL version and optionally sp).

Stack size overflow problems may appear. To be safe, I suggest typing this before running:

ulimit -s unlimited

This sets the stack size limit to unlimited. It does not affect the system negatively. The change applies only to the current session, so when we open another terminal, ulimit -s unlimited has to be entered again. I don't know whether it will be necessary, but I mention it to be safe.
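
A minimal sketch of one way to keep that change even more local: raise the limit inside a subshell, so that it applies to a single program run and the calling terminal is left as it was. The helper name run_with_big_stack is made up for this illustration and is not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one command with a raised stack limit.
# The parentheses start a subshell, so the caller's limit is not changed.
run_with_big_stack() {
    (
        # Try "unlimited" first; if the hard limit forbids that,
        # fall back to the highest value the hard limit allows.
        ulimit -s unlimited 2>/dev/null ||
            ulimit -s "$(ulimit -H -s)" 2>/dev/null ||
            true
        "$@"
    )
}

# Show the limit that a child process would see (the value depends on the system).
run_with_big_stack sh -c 'ulimit -s'

# Usage with a binary from this thread:
# run_with_big_stack ./shor_algorithm_ocl_float
```

Because the limit is inherited by child processes only, a second terminal or a later command in the same terminal still sees the original value.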

The tester's job should be limited to checking whether the program compiles, running a script and sending back the results. The tester won't have to do anything else, so I will try to keep the workload to a minimum.


- driver 12.4 according to CCC
- AMD-APP-SDK-v2.7

the following had to be installed:
- the "svn" version control system -> subversion package
- the g++ compiler -> gcc-c++ package

compilation went without problems

bash-4.2$ ./benchmark_sp
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: Sequential
Compile date: Aug  8 2012
Compile time: 14:36:57
Architecture: x86_64

[1: FPU-32] Initial test
        Calculations... Please wait...
        Execution time: 68 seconds
        Rate: 1.470588

[1: FPU-32] Final test
        Calculations... Please wait...
        FP Units Testing...
        Execution time: 100 seconds
        Special Units Testing...
        Execution time: 57 seconds
        Performance: [32-bit float types]
                0.262633 GFLOPS
                1236842 special operations (e.g. sin, cos) per second

[1: FPU-64] Final test
        Calculations... Please wait...
        FP Units Testing...
        Execution time: 116 seconds
        Special Units Testing...
        Execution time: 51 seconds
        Performance: [64-bit float types]
                0.021677 GFLOPS
                132352 special operations (e.g. sin, cos) per second

[2: ALU-64] Final test
        Calculations... Please wait...
        Execution time: 30 seconds
        Performance: [64-bit integer types]
                6824.400009 MIPS
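
A side note on the Rate line: in every result posted in this thread it equals 100 divided by the execution time of the initial test (68 s gives 1.470588 here; the runs quoted further down give 5.263158 for 19 s and 4.761905 for 21 s). This is an inference from the posted numbers, not something confirmed from the benchmark source. A small sketch with a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reproduce the "Rate" value from a time in seconds.
# Assumption (inferred from the numbers in this thread): rate = 100 / seconds.
rate_from_seconds() {
    # LC_ALL=C keeps the decimal point a dot regardless of the system locale.
    LC_ALL=C awk -v t="$1" 'BEGIN { printf "%.6f\n", 100 / t }'
}

rate_from_seconds 68   # prints 1.470588
rate_from_seconds 19   # prints 5.263158
```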


problems during compilation, it cannot find CL/cl.h
there should probably be a global mapping of CL/ to /opt/AMDAPP/include
but after a small change in the "makefile" (the most commonly used method), adding -I/opt/AMDAPP/include to the compile flags, the compilation went through, although there are a few warnings
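
An alternative that avoids editing the makefile, as a sketch only: gcc and g++ also search the directories listed in the CPATH environment variable as if they had been passed with -I. The helper find_cl_include is hypothetical; only /opt/AMDAPP/include comes from this thread, the other candidate directories are guesses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first directory among the arguments
# that contains CL/cl.h, or fail if none of them does.
find_cl_include() {
    local dir
    for dir in "$@"; do
        if [ -f "$dir/CL/cl.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# /opt/AMDAPP/include is the path reported in this thread; the rest are guesses.
if inc=$(find_cl_include /opt/AMDAPP/include /usr/include /usr/local/include); then
    # Prepend to CPATH so that g++ started by make searches it like an -I directory.
    export CPATH="$inc${CPATH:+:$CPATH}"
    echo "CL/cl.h found in $inc"
else
    echo "CL/cl.h not found in the candidate directories" >&2
fi
```

With CPATH exported in the same terminal, a plain make benchmark_ocl should pick up the header without a makefile change, provided the makefile does not reset the environment.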

bash-4.2$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../primitives/olib_primitives_ocl.cpp
../primitives/olib_primitives_ocl.cpp: In function 'int support_fp64(int)':
../primitives/olib_primitives_ocl.cpp:79:84: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../primitives/olib_primitives_ocl.cpp: In function 'int search_devices(_cl_device_id**, cl_uint*)':
../primitives/olib_primitives_ocl.cpp:102:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
../primitives/olib_primitives_ocl.cpp:131:19: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
../primitives/olib_primitives_ocl.cpp: In function 'char* olib_read_opencl_kernel(char*)':
../primitives/olib_primitives_ocl.cpp:235:21: warning: variable 'readlen' set but not used [-Wunused-but-set-variable]

../primitives/olib_primitives_ocl.cpp:136: note: not vectorized: control flow in loop.
../primitives/olib_primitives_ocl.cpp:105: note: not vectorized: control flow in loop.
../primitives/olib_primitives_ocl.cpp:94: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../sp/olib_strings_sp.cpp

../sp/olib_strings_sp.cpp:71: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:71: note: not vectorized: Bad inner loop.
../sp/olib_strings_sp.cpp:74: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:48: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:48: note: not vectorized: Bad inner loop.
../sp/olib_strings_sp.cpp:52: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:35: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../opencl/olib_benchmark_ocl.cpp -lOpenCL
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:41:110: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:173:111: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:306:115: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:439:116: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_mips_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:575:106: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
g++ -Wall benchmark_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o benchmark_ocl -lOpenCL

the test went somewhat worse (described in words)
FPU 32-bit test:
- after a few seconds the system freezes: no response to keyboard or mouse, no screen refresh
- after a few dozen seconds the test finishes, the system wakes up and the results are displayed
- for a few seconds everything is OK, until the next test starts
FPU 64-bit test:
- immediate system freeze and .... :(
- judging by the power draw, the test is running
- after about 7 minutes the power draw drops to the level of a GPU at high clocks but without load
- the system does not respond, the disk works, background processes run, the X server does not come back
- computer restart

the cause may be my changes, made without digging into the code, or my lack of patience to let the test finish, which ran out after ~20 min ;)

test carried out with revision 31
I now see a new revision 32, but I'm holding off on another test until I have your opinion/suggestions

EDIT: I did not carry out the next steps (compiling Shor's algorithm) or change the stack size; maybe that was the problem
I performed the steps in the order suggested above  :P

Revision 32 changes nothing with respect to the problems you mentioned.

I'm curious how the benchmark detected the OpenCL device. Before starting the test you had the device information displayed. Can you send it without running the test itself (just start the benchmark, copy the information and don't confirm, so that it doesn't go on to compute)?


yes, it is shown when ./benchmark_ocl starts
but I couldn't quote it, because by then I was no longer able to capture the console

bash-4.2$ ./benchmark_ocl
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: OpenCL
Compile date: Aug  8 2012
Compile time: 17:54:02
Architecture: x86_64

--- 1 OpenCL platform(s) found: -----------------------------------------
  -- 0 --
  VERSION = OpenCL 1.2 AMD-APP (938.1)
  NAME = AMD Accelerated Parallel Processing
  VENDOR = Advanced Micro Devices, Inc.
  EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
--- 1 OpenCL device(s) found on platform: -----------------------------------------

  ------------------- 0 -------------------
  DEVICE_NAME = Capeverde
  DEVICE_VENDOR = Advanced Micro Devices, Inc.
  DEVICE_VERSION = OpenCL 1.2 AMD-APP (938.1)
  DRIVER_VERSION = CAL 1.4.1741 (VM)
  CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt

Enter the device number:

EDIT: I selected "0"




The graphical interface may be the problem.

Can you disable the graphical interface and run the program with only text mode in the system?

I will try to fix the benchmark by this evening and will post on the forum when there is a new revision.

PS. I did actually test it here on an nVidia Tesla C2050 and got:

[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 19 seconds
Rate: 5.263158

[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 84 seconds
Special Units Testing...
Execution time: 127 seconds
Performance: [32-bit float types]
573.638161 GFLOPS
1018482283 special operations (e.g. sin, cos) per second

[1: FPU-64] Final test
Calculations... Please wait...
FP Units Testing...
ERROR: CL_INVALID_BINARY Execution time: 438 seconds
Special Units Testing...
ERROR: CL_INVALID_BINARY Execution time: 25 seconds
Performance: [64-bit float types]
11.001088 GFLOPS
517380000 special operations (e.g. sin, cos) per second

[2: ALU-64] Final test
Calculations... Please wait...
Execution time: 96 seconds
Performance: [64-bit integer types]
260850.292889 MIPS

On a GeForce GTS250, which has no double precision, it works fine:

[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 105 seconds
Rate: 0.952381

[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 106 seconds
Special Units Testing...
Execution time: 51 seconds
Performance: [32-bit float types]
82.255991 GFLOPS
458926470 special operations (e.g. sin, cos) per second

[1: FPU-64] Final test

[2: ALU-64] Final test
Calculations... Please wait...
Execution time: 30 seconds
Performance: [64-bit integer types]
151041.883120 MIPS

Something is seriously broken in double. I'll start looking right away. Do you perhaps remember what FPU-32 performance the test showed you?

In the meantime you can also try to compile and run shor_algorithm_ocl_float (it works on 32-bit float, so there is a chance it will be OK)

It should run quickly. It prints a few numbers to the screen. If it starts and works, run it several times (its execution time varies and it returns different results).


disabling the graphical interface means there is no access to the device  :fright:
Quote:
--- 1 OpenCL platform(s) found: -----------------------------------------
  -- 0 --
  VERSION = OpenCL 1.2 AMD-APP (938.1)
  NAME = AMD Accelerated Parallel Processing
  VENDOR = Advanced Micro Devices, Inc.
  EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
OpenCL Error 'clGetDeviceIDs (platforms[0],CL_DEVICE_TYPE_GPU,100,devices,devices_n)' returned-1!
it finds the GPU only after running the "startx" command, i.e. under the graphical interface on terminal 8; on the remaining terminals 1-6 the state is as above
whereas with X turned off there is no access to terminals 7 and 8
(? a quirk of the drivers or of the Mandriva system ?)

Quote: Do you perhaps remember what FPU-32 performance the test showed you?
Quote:
[1: FPU-32] Initial test
   Calculations... Please wait...
   Execution time: 21 seconds
   Rate: 4,761905

[1: FPU-32] Final test
   Calculations... Please wait...
   FP Units Testing...
   Execution time: 103 seconds
        Special Units Testing...
/tmp/OCL3yOly3.cl; line 42:error: more than one instance of overloaded
function "pow" matches the argument list:
   function "pow(float, float) C++"
   function "pow(double, double) C++"
   argument types are: (float, double)
tmp -=pow(tmp, -.5);  //4ops

Internal error: clc compiler invocation failed.
ERROR ......CL_INVALID_KERNEL ...... (illegible photo)

   Performance: [32-bit float types]
      423.267122 GFLOPS
      117028500000 special operations (e.g. sin, cos) per second
the data in bold was entered by hand, typos are possible
note the errors in the 32-bit benchmark, which you don't get on NVIDIA

in the nVidia Tesla C2050 test you have
Quote: ERROR: CL_INVALID_BINARY   Execution time: 438 seconds

the Radeon HD 7770 has fairly weak DP, so maybe I didn't wait long enough,
judging by the power draw I estimated the test time similarly at ~7 min (420 s)
but in total I waited another dozen or so minutes while the GPU power draw was reduced

Oh dear, you're right, there is a problem with the overloaded function. I'll fix it right away.

Have you tried anything with Shor on float?

Quote from: RAD-Poland on 08 August 2012, 20:51
in the nVidia Tesla C2050 test you have
Quote: ERROR: CL_INVALID_BINARY   Execution time: 438 seconds
I looked into why this happens on my side. It turned out that it is most likely caused by an old OpenCL version.

All I have is:

--- 1 OpenCL platform(s) found: -----------------------------------------
  -- 0 --
  VERSION = OpenCL 1.0 CUDA 3.2.1
  VENDOR = NVIDIA Corporation
  EXTENSIONS = cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll

Within 30 minutes I will add a new revision to the repository with an additional pragma for AMD, and we'll see whether that changes anything.


- the above tests were for v32
- drivers 12.4 according to CCC, installed from amd-driver-installer-12-6-x86.x86_64.run
- I haven't had this many restarts, i.e. 5, in the last 2 years  (it was a chance to vacuum out the case)  ;)

Quote: Have you tried anything with Shor on float?
not yet,
it compiles -> the output is long, I'm not pasting those kilometres
after compilation
bash-4.2$ ./shor_algorithm_ocl_float
Naruszenie ochrony pamięci (Segmentation fault)

bash-4.2$ make shor_algorithm_ocl_double
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include shor_algorithm.cpp -o shor_algorithm_ocl_core_double.o -D USE_OPENCL -D DOUBLE_PRECISION
shor_algorithm.cpp: In function 'int main(int, char**)':
shor_algorithm.cpp:502:53: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:617:164: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1091:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:1206:173: warning: format '%lld' expects argument of type 'long long int', but argument 6 has type 'double' [-Wformat]
shor_algorithm.cpp:1206:173: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1210:162: warning: too many arguments for format [-Wformat-extra-args]

shor_algorithm.cpp:698: note: not vectorized: control flow in loop.
shor_algorithm.cpp:663: note: not vectorized: control flow in loop.
shor_algorithm.cpp:502: note: not vectorized: control flow in loop.
shor_algorithm.cpp:489: note: not vectorized: control flow in loop.
shor_algorithm.cpp:436: note: not vectorized: control flow in loop.
shor_algorithm.cpp:368: note: not vectorized: control flow in loop.
shor_algorithm.cpp:214: note: vectorized 0 loops in function.
shor_algorithm.cpp:1140:3: warning: 'factor' may be used uninitialized in this function [-Wuninitialized]
g++ -Wall shor_algorithm_ocl_core_double.o olib_primitives.o olib_linear_algebra_sp.o olib_quantum_sp.o olib_quantum_ocl.o olib_discrete_sp.o olib_primitives_ocl.o olib_strings_sp.o -o shor_algorithm_ocl_double -lOpenCL

bash-4.2$ ./shor_algorithm_ocl_double
Naruszenie ochrony pamięci (Segmentation fault)





Quote from: RAD-Poland on 08 August 2012, 21:11
- if the system holds up, I'll paste double in a moment....................

bash-4.2$ ./shor_algorithm_ocl_float
Naruszenie ochrony pamięci (Segmentation fault)

The answer to that:

ulimit -s unlimited


bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_float
0 1 6 1.98 1.19 0

bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_double
"/tmp/OCLLqoikk.cl", line 28: warning: ignore unrecognized OpenCL extension
  #pragma OPENCL EXTENSION cl_khr_fp64 : enable

"/tmp/OCLLqoikk.cl", line 29: error: can't enable all OpenCL extensions or
          unrecognized OpenCL extension
  #pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable

"/tmp/OCLLqoikk.cl", line 46: error: function "atom_cmpxchg" declared
        while ((readback = atom_cmpxchg((__global long*)address, oldval, newval)) != oldval)

2 errors detected in the compilation of "/tmp/OCLLqoikk.cl".

Internal error: clc compiler invocation failed.



EDIT: I began to doubt whether the 7770 has DP, but I started MilkyWay and it computes
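
For what it is worth, the device listing posted earlier in the thread names cl_khr_fp64 but not cl_khr_int64_base_atomics, which matches the second compiler error above. A sketch of how a saved listing could be checked for a given extension; has_cl_extension is a hypothetical helper and the one-line sample is a shortened excerpt of that listing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the device listing saved in file $1 names
# extension $2 as a whole word on its CL_DEVICE_EXTENSIONS line.
has_cl_extension() {
    grep 'CL_DEVICE_EXTENSIONS' "$1" | tr ' ' '\n' | grep -xF "$2" >/dev/null
}

# Shortened excerpt of the listing posted in this thread.
listing=$(mktemp)
echo '  CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics' > "$listing"

for ext in cl_khr_fp64 cl_khr_int64_base_atomics; do
    if has_cl_extension "$listing" "$ext"; then
        echo "$ext: listed"
    else
        echo "$ext: not listed"
    fi
done
rm -f "$listing"
```

Matching whole words matters here, because cl_khr_global_int32_base_atomics would otherwise look like a partial hit for a different extension.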




Quote from: RAD-Poland on 08 August 2012, 21:24
bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_float
0 1 6 1.98 1.19 0

And that's fine. That's how it should be. A single result tells me nothing, though. Tomorrow I'll send a small bash script that will run it in a loop and collect the results.
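
Until that script arrives, a rough sketch of the idea: run the program a fixed number of times and append each run's output to one results file. This is not the author's script; collect_runs and the results file name are made up for the illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command N times and append every output line,
# prefixed with the run number, to a results file.
#   collect_runs <runs> <results-file> <command> [args...]
collect_runs() {
    local runs=$1 out=$2 i=1
    shift 2
    while [ "$i" -le "$runs" ]; do
        "$@" 2>&1 | sed "s/^/run $i: /" >> "$out"
        i=$((i + 1))
    done
}

# Usage with the binary from this thread (20 runs is an arbitrary choice):
# ulimit -s unlimited
# collect_runs 20 shor_float_results.txt ./shor_algorithm_ocl_float
```

Appending instead of overwriting keeps earlier runs if the loop is interrupted part-way.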

Quote from: RAD-Poland on 08 August 2012, 21:24
bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_double
"/tmp/OCLLqoikk.cl", line 28: warning: ignore unrecognized OpenCL extension
  #pragma OPENCL EXTENSION cl_khr_fp64 : enable

"/tmp/OCLLqoikk.cl", line 29: error: can't enable all OpenCL extensions or
          unrecognized OpenCL extension
  #pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable

"/tmp/OCLLqoikk.cl", line 46: error: function "atom_cmpxchg" declared
        while ((readback = atom_cmpxchg((__global long*)address, oldval, newval)) != oldval)

2 errors detected in the compilation of "/tmp/OCLLqoikk.cl".

Internal error: clc compiler invocation failed.



Well, yes... I could have guessed...

Revision 33 is in the repository.

To start with, test Shor on double. Then run the benchmark. We'll see whether it takes it now.


- test of v33
- added -I/opt/AMDAPP/include
bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_float
0 1 4 1.92 1.15 0

bash-4.2$ ./shor_algorithm_ocl_double
"/tmp/OCLHLI142.cl", line 29: warning: ignore unrecognized OpenCL extension
      #pragma OPENCL EXTENSION cl_khr_fp64 : enable
"/tmp/OCLHLI142.cl", line 36: error: can't enable all OpenCL extensions or
          unrecognized OpenCL extension
  #pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
"/tmp/OCLHLI142.cl", line 53: error: function "atom_cmpxchg" declared
        while ((readback = atom_cmpxchg((__global long*)address, oldval, newval)) != oldval)
2 errors detected in the compilation of "/tmp/OCLHLI142.cl".

Internal error: clc compiler invocation failed.


unfortunately  :(

EDIT: I didn't run the benchmark -> are you sure?




And the benchmark?

Quote from: RAD-Poland on 08 August 2012, 21:42
EDIT: I didn't run the benchmark -> are you sure?
No  :deadman:

But there is only one way to find out. Still, run it when you're not doing anything on the PC  :attack:


- the PC is always doing something, PHP scripts etc.
- I'm turning BOINC off on the CPU and GPU
- compilation
bash-4.2$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../opencl/olib_benchmark_ocl.cpp -lOpenCL
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:41:110: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:173:111: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:306:115: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:439:116: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_mips_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:575:106: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
g++ -Wall benchmark_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o benchmark_ocl -lOpenCL

- at worst there will be another restart  :cold:

EDIT: unfortunately

32-bit initial 21 s/4,761905
32-bit final 103 s/151 s/423.267122/775023178
64-bit final  :cold: for 20 min now -> OT: I'm going to restart it, otherwise there will be no PM stats (Rosetta is already falling behind, and in a moment SETI and SETIbeta will be lagging)

EDIT: slowly moving forward, there is an improvement in the second FPU32 test
Quote:
[1: FPU-32] Initial test
   Calculations... Please wait...
   Execution time: 21 seconds
   Rate: 4,761905

[1: FPU-32] Final test
   Calculations... Please wait...
   FP Units Testing...
   Execution time: 103 seconds
   Special Units Testing...
   Execution time: 151 seconds
   Performance: [32-bit float types]
      423.267122 GFLOPS
      775023178 special operations (e.g. sin, cos) per second

Tomorrow I'll write a small testing script. How much GPU time can you spare?  :attack:


I'll try to set aside some time, 1-2 h
let me know when you have a new version (I'll keep an eye on the RSS/forum)
a script would be useful, and, if possible, having the application create an output file, which is very handy when the machine hangs
capturing the console didn't work out for me: after a hang the file was empty, and it would be easier and faster to pass the information on that way than by retyping it
OC benchmarks often create such files, so you know at which point and for what reason the computer went down
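
One possible reason for an empty file after a hang is buffering: output that is still sitting in memory is lost when the machine freezes. A sketch of a workaround on the shell side, under the assumption that the program writes through standard C stdio; log_lines is a hypothetical helper, and stdbuf is the GNU coreutils tool for changing a program's buffering.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy stdin to the terminal and to a log file one line
# at a time, asking the kernel to flush to disk after every line.
log_lines() {
    local logfile=$1 line
    # The "|| [ -n ... ]" part keeps a final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line" | tee -a "$logfile"
        sync
    done
}

# Usage with a binary from this thread (log file name is arbitrary):
# stdbuf -oL ./benchmark_ocl 2>&1 | log_lines benchmark.log
```

A caveat: a prompt that does not end with a newline, such as the device number prompt, only shows up once its line is complete, so the device number has to be typed blind.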

Yes. It will create temporary files. It shouldn't hang, though: we'll limit ourselves to the float type, and everything works nicely on it  :attack:

Martin Fox

fox@APU:~/test/trunk/samples$ make benchmark_sp
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_sp.o -D USE_SEQUENTIAL
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_benchmark_sp.cpp -o olib_benchmark_sp.o
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_fp_units_sp(float*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:46:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp:49:9: warning: variable 'b' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_fp_units_sp(double*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:530:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp:533:9: warning: variable 'b' set but not used [-Wunused-but-set-variable]

../sp/olib_benchmark_sp.cpp:38: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:52: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:33: note: vectorized 0 loops in function.

../sp/olib_benchmark_sp.cpp:522: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:536: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:517: note: vectorized 0 loops in function.

../sp/olib_benchmark_sp.cpp:1007: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1014: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1001: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_special_units_sp(float*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:1017:18: warning: 'tmp' may be used uninitialized in this function [-Wuninitialized]

../sp/olib_benchmark_sp.cpp:1037: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1044: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1031: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_special_units_sp(double*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:1047:19: warning: 'tmp' may be used uninitialized in this function [-Wuninitialized]

../sp/olib_benchmark_sp.cpp:1072: note: not vectorized: multiple nested loops.
../sp/olib_benchmark_sp.cpp:1075: note: not vectorized: number of iterations cannot be computed.
../sp/olib_benchmark_sp.cpp:1075: note: not vectorized: Bad inner loop.
../sp/olib_benchmark_sp.cpp:1079: note: not vectorized: number of iterations cannot be computed.
../sp/olib_benchmark_sp.cpp:1061: note: vectorized 0 loops in function.
g++ -Wall benchmark_core_sp.o olib_benchmark_sp.o -o benchmark_sp

fox@APU:~/test/trunk/samples$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
benchmark.cpp:69:21: fatal error: CL/cl.h: No such file or directory
compilation terminated.
make: *** [benchmark_core_ocl.o] Error 1

fox@APU:~/test/trunk/samples$ ./benchmark_sp
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: Sequential
Compile date: Aug  9 2012
Compile time: 09:52:49
Architecture: x86_64

[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 46 seconds
Rate: 2.173913

[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 101 seconds
Special Units Testing...
Execution time: 49 seconds
Performance: [32-bit float types]
0.384516 GFLOPS
2127551 special operations (e.g. sin, cos) per second

[1: FPU-64] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 118 seconds
Special Units Testing...
Execution time: 41 seconds
Performance: [64-bit float types]
0.030781 GFLOPS
237804 special operations (e.g. sin, cos) per second

[2: ALU-64] Final test
Calculations... Please wait...
Execution time: 31 seconds
Performance: [64-bit integer types]
9765.870981 MIPS


Maybe compile shor_algorithm_ocl_float first

To line number 198 of the makefile, add -I/opt/AMDAPP/include after a space

and then as usual

make shor_algorithm_ocl_float
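
The makefile edit can also be scripted. A sketch, assuming line 198 is still the right line in the revision being built (line numbers shift between revisions, so it is worth looking at the line first); append_to_line is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a space and some text to one line of a file.
#   append_to_line <file> <line-number> <text>
append_to_line() {
    local file=$1 lineno=$2 text=$3 tmp
    tmp=$(mktemp) || return 1
    # Rewrite only the requested line and copy all other lines unchanged.
    if awk -v n="$lineno" -v t="$text" 'NR == n { $0 = $0 " " t } { print }' "$file" > "$tmp"; then
        cat "$tmp" > "$file"   # write back in place, keeping the file's permissions
    fi
    rm -f "$tmp"
}

# Usage, following the instruction above:
# sed -n 198p makefile                                  # inspect the line first
# append_to_line makefile 198 -I/opt/AMDAPP/include
# make shor_algorithm_ocl_float
```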


You can also fetch revision 34, in which the AMD compilation problem is probably fixed.


v34 :no:

bash-4.2$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
benchmark.cpp:69:21: fatal error: CL/cl.h: Nie ma takiego pliku ani katalogu (No such file or directory)
compilation terminated.

make shor_algorithm_ocl_double
make shor_algorithm_ocl_float
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../opencl/olib_quantum_ocl.cpp -lOpenCL
../opencl/olib_quantum_ocl.cpp:31:20: fatal error: CL/cl.h: Nie ma takiego pliku ani katalogu (No such file or directory)
compilation terminated.

Eh. I just noticed that I fixed the makefile incorrectly. I made quite a mess. I'll fix it later (in a few hours, when I'm back); I can't manage it right now.

The test script 'shor_algorithm_tester.sh' is already in the repository, but a small problem crept in that is not yet fixed in the repo.

I started the tests here on the Tesla C2050 to get a rough idea of how long they may take on other cards.

Towards the evening I'll write in more detail about what comes out of it.


Does compilation pass now? (Revision 36)



make shor_algorithm_ocl_double OK
make shor_algorithm_ocl_float OK

make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
benchmark.cpp:69:21: fatal error: CL/cl.h: Nie ma takiego pliku ani katalogu (No such file or directory)
compilation terminated.

Rev. 37.

You can check:

burner_ocl (maybe not run)



make shor_algorithm_ocl_double OK
make shor_algorithm_ocl_float OK
make benchmark_ocl OK

make burner_ocl OK
bash-4.2$ make burner_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include burner.cpp -o burner_core_ocl.o -D USE_OPENCL
burner.cpp:226: note: not vectorized: multiple exits.
burner.cpp:118: note: vectorized 0 loops in function.
g++ -Wall burner_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o burner_ocl -lOpenCL

EDIT: ??
Quote: burner_ocl (maybe not run)

? - it might not start
? - maybe it's better not to run it because something ...

bash-4.2$ ./burner_ocl
OLib Burner.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: OpenCL
Compile date: Aug  9 2012
Compile time: 18:50:51
Architecture: x86_64

--- 1 OpenCL platform(s) found: -----------------------------------------
  -- 0 --
  VERSION = OpenCL 1.2 AMD-APP (938.1)
  NAME = AMD Accelerated Parallel Processing
  VENDOR = Advanced Micro Devices, Inc.
  EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
--- 1 OpenCL device(s) found on platform: -----------------------------------------

  ------------------- 0 -------------------
  DEVICE_NAME = Capeverde
  DEVICE_VENDOR = Advanced Micro Devices, Inc.
  DEVICE_VERSION = OpenCL 1.2 AMD-APP (938.1)
  DRIVER_VERSION = CAL 1.4.1741 (VM)
  CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt

Enter the device number:

Should I run ./burner_ocl?

Quote: RAD-Poland on 09 August 2012, 18:49
? - maybe better not to run it because something ...
It may not hand control back to the graphics subsystem  :attack: No point risking stability when there is no need.

Martin Fox

rev 39
fox@APU:~/test/trunk/samples$ make benchmark_sp
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_sp.o -D USE_SEQUENTIAL
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_benchmark_sp.cpp -o olib_benchmark_sp.o
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_fp_units_sp(float*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:46:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp:49:9: warning: variable 'b' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_fp_units_sp(double*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:530:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp:533:9: warning: variable 'b' set but not used [-Wunused-but-set-variable]

../sp/olib_benchmark_sp.cpp:38: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:52: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:33: note: vectorized 0 loops in function.

../sp/olib_benchmark_sp.cpp:522: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:536: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:517: note: vectorized 0 loops in function.

../sp/olib_benchmark_sp.cpp:1007: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1014: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1001: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_special_units_sp(float*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:1017:18: warning: 'tmp' may be used uninitialized in this function [-Wuninitialized]

../sp/olib_benchmark_sp.cpp:1037: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1044: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1031: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_special_units_sp(double*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:1047:19: warning: 'tmp' may be used uninitialized in this function [-Wuninitialized]

../sp/olib_benchmark_sp.cpp:1072: note: not vectorized: multiple nested loops.
../sp/olib_benchmark_sp.cpp:1075: note: not vectorized: number of iterations cannot be computed.
../sp/olib_benchmark_sp.cpp:1075: note: not vectorized: Bad inner loop.
../sp/olib_benchmark_sp.cpp:1079: note: not vectorized: number of iterations cannot be computed.
../sp/olib_benchmark_sp.cpp:1061: note: vectorized 0 loops in function.
g++ -Wall benchmark_core_sp.o olib_benchmark_sp.o -o benchmark_sp
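
The "set but not used" and "may be used uninitialized" warnings in the build above are worth a look in a benchmark: if a variable is only written and never read, the optimizer at -O2 is free to drop the arithmetic that feeds it. Without seeing olib_benchmark_sp.cpp this is only a guess at what those lines do, so the function below is a hypothetical stand-in, not the OLib code. It shows the usual remedy, which is to initialize the accumulator and hand its final value back to the caller:

```cpp
#include <cassert>

// Hypothetical floating-point workload (not taken from OLib).
// The accumulator starts from a defined value and is returned, so the
// compiler sees it as both initialized and used.
float fp_workload(unsigned long long iterations) {
    float a = 1.0f;                       // explicit initial value
    for (unsigned long long i = 0; i < iterations; ++i) {
        a = a * 1.000001f + 0.5f;         // one multiply and one add per step
    }
    return a;                             // result stays live
}
```

The caller can then print or store the returned value after stopping the timer, which keeps the loop from being treated as dead code.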

fox@APU:~/test/trunk/samples$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
benchmark.cpp: In function 'int main(int, char**)':
benchmark.cpp:215:26: warning: ignoring return value of 'int scanf(const char*, ...)', declared with attribute warn_unused_result [-Wunused-result]
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../primitives/olib_primitives_ocl.cpp
../primitives/olib_primitives_ocl.cpp: In function 'int support_fp64(int)':
../primitives/olib_primitives_ocl.cpp:79:84: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../primitives/olib_primitives_ocl.cpp: In function 'int search_devices(_cl_device_id**, cl_uint*)':
../primitives/olib_primitives_ocl.cpp:102:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
../primitives/olib_primitives_ocl.cpp:131:19: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
../primitives/olib_primitives_ocl.cpp: In function 'char* olib_read_opencl_kernel(char*)':
../primitives/olib_primitives_ocl.cpp:238:21: warning: variable 'readlen' set but not used [-Wunused-but-set-variable]

/usr/include/x86_64-linux-gnu/bits/stdio2.h:105: note: not vectorized: control flow in loop.
/usr/include/x86_64-linux-gnu/bits/stdio2.h:105: note: not vectorized: control flow in loop.
../primitives/olib_primitives_ocl.cpp:94: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_strings_sp.cpp

../sp/olib_strings_sp.cpp:71: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:71: note: not vectorized: Bad inner loop.
../sp/olib_strings_sp.cpp:74: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:48: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:48: note: not vectorized: Bad inner loop.
../sp/olib_strings_sp.cpp:52: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:35: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../opencl/olib_benchmark_ocl.cpp
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:41:110: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:173:111: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:306:115: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:439:116: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_mips_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:575:106: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
g++ -Wall benchmark_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o benchmark_ocl -lOpenCL
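
Two warnings repeat throughout the build above: -Wwrite-strings (a string literal passed where a plain char* is expected) and -Wsign-compare (a signed loop index compared against an unsigned count such as cl_uint). A minimal sketch of the usual fixes follows. Both helpers and their names are hypothetical and do not come from OLib:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: taking const char* lets callers pass string literals
// without the "deprecated conversion from string constant" warning.
std::size_t name_length(const char* name) {
    assert(name != nullptr);
    return std::strlen(name);
}

// Hypothetical helper: counts the entries equal to `wanted`. The index has
// the same unsigned type as `count`, so the loop condition does not mix
// signed and unsigned operands.
unsigned count_matches(const unsigned* ids, unsigned count, unsigned wanted) {
    unsigned hits = 0;
    for (unsigned i = 0; i < count; ++i) {
        if (ids[i] == wanted) {
            ++hits;
        }
    }
    return hits;
}
```

Neither change alters behavior. They only make the types say what the code already does, which is why these warnings are usually safe to clean up in bulk.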

fox@APU:~/test/trunk/samples$ make burner_sp
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona burner.cpp -o burner_core_sp.o -D USE_SEQUENTIAL

/usr/include/x86_64-linux-gnu/bits/stdio2.h:105: note: not vectorized: multiple exits.
burner.cpp:118: note: vectorized 0 loops in function.
g++ -Wall burner_core_sp.o olib_benchmark_sp.o -o burner_sp
fox@APU:~/test/trunk/samples$ make burner_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include burner.cpp -o burner_core_ocl.o -D USE_OPENCL
burner.cpp: In function 'int main(int, char**)':
burner.cpp:171:26: warning: ignoring return value of 'int scanf(const char*, ...)', declared with attribute warn_unused_result [-Wunused-result]

/usr/include/x86_64-linux-gnu/bits/stdio2.h:105: note: not vectorized: multiple exits.
burner.cpp:118: note: vectorized 0 loops in function.
g++ -Wall burner_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o burner_ocl -lOpenCL
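
The -Wunused-result warning on scanf means the program never checks whether the read succeeded. Assuming that call is the one reading the answer to "Enter the device number:" (a guess, since burner.cpp is not shown here), a checked version could look roughly like this. The function name is hypothetical, and it parses from a string with sscanf so that it can be tried without typing input:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (not from burner.cpp): parses a device index from
// `text` and accepts it only if it lies in [0, device_count).
bool parse_device_index(const char* text, int device_count, int* out) {
    assert(text != nullptr && out != nullptr);
    int value = 0;
    // sscanf returns the number of successful conversions; anything other
    // than 1 means no integer was read.
    if (std::sscanf(text, "%d", &value) != 1) {
        return false;
    }
    if (value < 0 || value >= device_count) {
        return false;                     // no such device in the list
    }
    *out = value;
    return true;
}
```

The same return-value check applies to scanf("%d", &value) when reading straight from the console.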

fox@APU:~/test/trunk/samples$ make shor_algorithm_ocl_float
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona shor_algorithm.cpp -o shor_algorithm_ocl_core_float.o -D USE_OPENCL -D FLOAT_PRECISION
shor_algorithm.cpp: In function 'int main(int, char**)':
shor_algorithm.cpp:518:53: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:633:164: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1107:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:1222:173: warning: format '%lld' expects argument of type 'long long int', but argument 6 has type 'double' [-Wformat]
shor_algorithm.cpp:1222:173: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1226:162: warning: too many arguments for format [-Wformat-extra-args]

shor_algorithm.cpp:714: note: Detected interleaving *collapse.9_140[i_2].imag and *collapse.9_140[i_2].real
shor_algorithm.cpp:714: note: Vectorizing an unaligned access.
shor_algorithm.cpp:714: note: Vectorizing an unaligned access.
shor_algorithm.cpp:714: note: vect_model_store_cost: unaligned supported by hardware.
shor_algorithm.cpp:714: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
shor_algorithm.cpp:714: note: vect_model_load_cost: unaligned supported by hardware.
shor_algorithm.cpp:714: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
shor_algorithm.cpp:714: note: not vectorized: relevant stmt not supported: cstore.35_400 = [cond_expr] value_242 == D.25145_245 ? 1.0e+0 : 0.0;

shor_algorithm.cpp:679: note: not vectorized: control flow in loop.
shor_algorithm.cpp:518: note: not vectorized: control flow in loop.
shor_algorithm.cpp:505: note: not vectorized: control flow in loop.
shor_algorithm.cpp:452: note: not vectorized: control flow in loop.
shor_algorithm.cpp:384: note: not vectorized: control flow in loop.
shor_algorithm.cpp:214: note: vectorized 0 loops in function.
shor_algorithm.cpp:1107:3: warning: 'factor' may be used uninitialized in this function [-Wuninitialized]
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../primitives/olib_primitives.cpp

../primitives/olib_primitives.cpp:266: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:261: note: vectorized 0 loops in function.

../primitives/olib_primitives.cpp:282: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:285: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:276: note: vectorized 0 loops in function.

../primitives/olib_primitives.cpp:305: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:308: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:299: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_linear_algebra_sp.cpp

../sp/olib_linear_algebra_sp.cpp:63: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:63: note: dependence distance == 0 between D.25881_10->imag and D.25881_10->imag
../sp/olib_linear_algebra_sp.cpp:63: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:63: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:63: note: dependence distance modulo vf == 0 between D.25881_10->imag and D.25881_10->imag
../sp/olib_linear_algebra_sp.cpp:63: note: Detected single element interleaving D.25881_10->imag step 8
../sp/olib_linear_algebra_sp.cpp:63: note: not vectorized: complicated access pattern.
../sp/olib_linear_algebra_sp.cpp:56: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:77: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:77: note: dependence distance == 0 between D.25872_10->imag and D.25872_10->imag
../sp/olib_linear_algebra_sp.cpp:77: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:77: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:77: note: dependence distance modulo vf == 0 between D.25872_10->imag and D.25872_10->imag
../sp/olib_linear_algebra_sp.cpp:77: note: Detected single element interleaving D.25872_10->imag step 16
../sp/olib_linear_algebra_sp.cpp:77: note: not vectorized: complicated access pattern.
../sp/olib_linear_algebra_sp.cpp:70: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:91: note: not vectorized: no vectype for stmt: D.25864_11 = D.25863_10->imag;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:84: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:129: note: not vectorized: inner-loop count not invariant.
../sp/olib_linear_algebra_sp.cpp:131: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:119: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:152: note: not vectorized: inner-loop count not invariant.
../sp/olib_linear_algebra_sp.cpp:154: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:142: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:175: note: not vectorized: inner-loop count not invariant.
../sp/olib_linear_algebra_sp.cpp:177: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:165: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:225: note: not vectorized: data ref analysis failed D.25790_18->real = 1.0e+0;

../sp/olib_linear_algebra_sp.cpp:216: note: Detected interleaving D.25782_9->real and D.25782_9->imag
../sp/olib_linear_algebra_sp.cpp:216: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:216: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:216: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_linear_algebra_sp.cpp:216: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:216: note: Cost model analysis:
  Vector inside of loop cost: 2
  Vector outside of loop cost: 9
  Scalar iteration cost: 2
  Scalar outside cost: 6
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 3

../sp/olib_linear_algebra_sp.cpp:216: note:   Profitability threshold = 2

../sp/olib_linear_algebra_sp.cpp:216: note: Profitability threshold is 2 loop iterations.
../sp/olib_linear_algebra_sp.cpp:216: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:216: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:209: note: vectorized 1 loops in function.

../sp/olib_linear_algebra_sp.cpp:248: note: not vectorized: data ref analysis failed D.25775_18->real = 1.0e+0;

../sp/olib_linear_algebra_sp.cpp:239: note: Detected interleaving D.25767_9->real and D.25767_9->imag
../sp/olib_linear_algebra_sp.cpp:239: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:239: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:239: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_linear_algebra_sp.cpp:239: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:239: note: cost model: the vector iteration cost = 2 divided by the scalar iteration cost = 2 is greater or equal to the vectorization factor = 1.
../sp/olib_linear_algebra_sp.cpp:239: note: not vectorized: vectorization not profitable.
../sp/olib_linear_algebra_sp.cpp:232: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:271: note: not vectorized: data ref analysis failed D.25760_18->real = 1.0e+0;

../sp/olib_linear_algebra_sp.cpp:262: note: not vectorized: no vectype for stmt: D.25752_9->real = 0.0;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:255: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:311: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:304: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:326: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:319: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:341: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:334: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:380: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:373: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:395: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:388: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:410: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:403: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:460: note: not vectorized: multiple nested loops.
../sp/olib_linear_algebra_sp.cpp:462: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:469: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:442: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:517: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:508: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:534: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:525: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:551: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:542: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:597: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:581: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:622: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:606: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:646: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:630: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:687: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:680: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:704: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:697: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:719: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:712: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:770: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:772: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:760: note: Detected interleaving D.25501_9->real and D.25501_9->imag
../sp/olib_linear_algebra_sp.cpp:760: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:760: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:760: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_linear_algebra_sp.cpp:760: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:760: note: Cost model analysis:
  Vector inside of loop cost: 2
  Vector outside of loop cost: 9
  Scalar iteration cost: 2
  Scalar outside cost: 6
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 3

../sp/olib_linear_algebra_sp.cpp:760: note:   Profitability threshold = 2

../sp/olib_linear_algebra_sp.cpp:760: note: Profitability threshold is 2 loop iterations.
../sp/olib_linear_algebra_sp.cpp:760: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:760: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:753: note: vectorized 1 loops in function.

../sp/olib_linear_algebra_sp.cpp:802: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:804: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:792: note: Detected interleaving D.25479_9->real and D.25479_9->imag
../sp/olib_linear_algebra_sp.cpp:792: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:792: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:792: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_linear_algebra_sp.cpp:792: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:792: note: cost model: the vector iteration cost = 2 divided by the scalar iteration cost = 2 is greater or equal to the vectorization factor = 1.
../sp/olib_linear_algebra_sp.cpp:792: note: not vectorized: vectorization not profitable.
../sp/olib_linear_algebra_sp.cpp:785: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:833: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:835: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:823: note: not vectorized: no vectype for stmt: D.25457_9->real = 0.0;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:816: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:880: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:882: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:873: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:900: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:902: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:893: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:920: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:922: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:913: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance == 0 between D.25381_28->real and D.25381_28->real
../sp/olib_linear_algebra_sp.cpp:949: note: Detected interleaving D.25381_28->real and D.25381_28->imag
../sp/olib_linear_algebra_sp.cpp:949: note: Detected interleaving D.25381_28->real and D.25381_28->imag
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance == 0 between D.25381_28->imag and D.25381_28->imag
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:949: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance modulo vf == 0 between D.25381_28->real and D.25381_28->real
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:949: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance modulo vf == 0 between D.25381_28->imag and D.25381_28->imag
../sp/olib_linear_algebra_sp.cpp:949: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:949: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:949: note: Cost model analysis:
  Vector inside of loop cost: 7
  Vector outside of loop cost: 16
  Scalar iteration cost: 10
  Scalar outside cost: 6
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 2

../sp/olib_linear_algebra_sp.cpp:949: note:   Profitability threshold = 1

../sp/olib_linear_algebra_sp.cpp:949: note: Profitability threshold is 1 loop iterations.
../sp/olib_linear_algebra_sp.cpp:949: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:949: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:942: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:935: note: vectorized 1 loops in function.

../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance == 0 between D.25363_26->real and D.25363_26->real
../sp/olib_linear_algebra_sp.cpp:971: note: Detected interleaving D.25363_26->real and D.25363_26->imag
../sp/olib_linear_algebra_sp.cpp:971: note: Detected interleaving D.25363_26->real and D.25363_26->imag
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance == 0 between D.25363_26->imag and D.25363_26->imag
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:971: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance modulo vf == 0 between D.25363_26->real and D.25363_26->real
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:971: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance modulo vf == 0 between D.25363_26->imag and D.25363_26->imag
../sp/olib_linear_algebra_sp.cpp:971: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:971: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:971: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:971: note: Cost model analysis:
  Vector inside of loop cost: 5
  Vector outside of loop cost: 6
  Scalar iteration cost: 6
  Scalar outside cost: 6
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1

../sp/olib_linear_algebra_sp.cpp:971: note:   Profitability threshold = 0

../sp/olib_linear_algebra_sp.cpp:971: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:971: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:964: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:957: note: vectorized 1 loops in function.

../sp/olib_linear_algebra_sp.cpp:993: note: not vectorized: no vectype for stmt: D.25341_32 = D.25340_31->real;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:986: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:979: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:1008: note: versioning for alias required: can't determine dependence between D.25331_10->real and D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: mark for run-time aliasing test between D.25331_10->real and D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: Detected interleaving D.25331_10->real and D.25331_10->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: versioning for alias required: can't determine dependence between D.25331_10->real and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: mark for run-time aliasing test between D.25331_10->real and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: Detected interleaving D.25330_7->real and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: versioning for alias required: can't determine dependence between D.25331_10->imag and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: mark for run-time aliasing test between D.25331_10->imag and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: found equal ranges D.25331_10->real, D.25330_7->imag and D.25331_10->real, D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: found equal ranges D.25331_10->imag, D.25330_7->imag and D.25331_10->real, D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1008: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1008: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1008: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1008: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1008: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1008: note: cost model: Adding cost of checks for loop versioning aliasing.

../sp/olib_linear_algebra_sp.cpp:1008: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:1008: note: Cost model analysis:
  Vector inside of loop cost: 4
  Vector outside of loop cost: 14
  Scalar iteration cost: 4
  Scalar outside cost: 1
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 6

../sp/olib_linear_algebra_sp.cpp:1008: note:   Profitability threshold = 5

../sp/olib_linear_algebra_sp.cpp:1008: note: Profitability threshold is 5 loop iterations.
../sp/olib_linear_algebra_sp.cpp:1008: note: create runtime check for data references D.25331_10->real and D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: created 1 versioning for alias checks.

../sp/olib_linear_algebra_sp.cpp:1008: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:1008: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:1003: note: vectorized 1 loops in function.

../sp/olib_linear_algebra_sp.cpp:1021: note: versioning for alias required: can't determine dependence between D.25321_10->real and D.25320_7->real
../sp/olib_linear_algebra_sp.cpp:1021: note: mark for run-time aliasing test between D.25321_10->real and D.25320_7->real
../sp/olib_linear_algebra_sp.cpp:1021: note: Detected interleaving D.25321_10->real and D.25321_10->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: versioning for alias required: can't determine dependence between D.25321_10->real and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: mark for run-time aliasing test between D.25321_10->real and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: Detected interleaving D.25320_7->real and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: versioning for alias required: can't determine dependence between D.25321_10->imag and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: mark for run-time aliasing test between D.25321_10->imag and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: found equal ranges D.25321_10->real, D.25320_7->imag and D.25321_10->real, D.25320_7->real
../sp/olib_linear_algebra_sp.cpp:1021: note: found equal ranges D.25321_10->imag, D.25320_7->imag and D.25321_10->real, D.25320_7->real
../sp/olib_linear_algebra_sp.cpp:1021: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1021: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1021: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1021: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1021: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1021: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1021: note: cost model: Adding cost of checks for loop versioning aliasing.

../sp/olib_linear_algebra_sp.cpp:1021: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:1021: note: cost model: the vector iteration cost = 4 divided by the scalar iteration cost = 4 is greater or equal to the vectorization factor = 1.
../sp/olib_linear_algebra_sp.cpp:1021: note: not vectorized: vectorization not profitable.
../sp/olib_linear_algebra_sp.cpp:1016: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:1034: note: not vectorized: no vectype for stmt: D.25312_11 = D.25311_10->real;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1029: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:1047: note: versioning for alias required: can't determine dependence between D.25299_10->real and D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: mark for run-time aliasing test between D.25299_10->real and D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: Detected interleaving D.25299_10->real and D.25299_10->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: versioning for alias required: can't determine dependence between D.25299_10->real and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: mark for run-time aliasing test between D.25299_10->real and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: Detected interleaving D.25297_7->real and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: versioning for alias required: can't determine dependence between D.25299_10->imag and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: mark for run-time aliasing test between D.25299_10->imag and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: found equal ranges D.25299_10->real, D.25297_7->imag and D.25299_10->real, D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: found equal ranges D.25299_10->imag, D.25297_7->imag and D.25299_10->real, D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1047: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1047: note: cost model: Adding cost of checks for loop versioning aliasing.

../sp/olib_linear_algebra_sp.cpp:1047: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:1047: note: Cost model analysis:
  Vector inside of loop cost: 5
  Vector outside of loop cost: 16
  Scalar iteration cost: 6
  Scalar outside cost: 1
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 4

../sp/olib_linear_algebra_sp.cpp:1047: note:   Profitability threshold = 3

../sp/olib_linear_algebra_sp.cpp:1047: note: Profitability threshold is 3 loop iterations.
../sp/olib_linear_algebra_sp.cpp:1047: note: create runtime check for data references D.25299_10->real and D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: created 1 versioning for alias checks.

../sp/olib_linear_algebra_sp.cpp:1047: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:1047: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:1042: note: vectorized 1 loops in function.

../sp/olib_linear_algebra_sp.cpp:1060: note: not vectorized: no vectype for stmt: D.25287_11 = D.25286_10->real;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1055: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:1074: note: versioning for alias required: can't determine dependence between D.25273_10->real and D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: mark for run-time aliasing test between D.25273_10->real and D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: Detected interleaving D.25273_10->real and D.25273_10->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: versioning for alias required: can't determine dependence between D.25273_10->real and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: mark for run-time aliasing test between D.25273_10->real and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: Detected interleaving D.25271_7->real and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: versioning for alias required: can't determine dependence between D.25273_10->imag and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: mark for run-time aliasing test between D.25273_10->imag and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: found equal ranges D.25273_10->real, D.25271_7->imag and D.25273_10->real, D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: found equal ranges D.25273_10->imag, D.25271_7->imag and D.25273_10->real, D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1074: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1074: note: cost model: Adding cost of checks for loop versioning aliasing.

../sp/olib_linear_algebra_sp.cpp:1074: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:1074: note: Cost model analysis:
  Vector inside of loop cost: 5
  Vector outside of loop cost: 16
  Scalar iteration cost: 6
  Scalar outside cost: 1
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 4

../sp/olib_linear_algebra_sp.cpp:1074: note:   Profitability threshold = 3

../sp/olib_linear_algebra_sp.cpp:1074: note: Profitability threshold is 3 loop iterations.
../sp/olib_linear_algebra_sp.cpp:1074: note: create runtime check for data references D.25273_10->real and D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: created 1 versioning for alias checks.

../sp/olib_linear_algebra_sp.cpp:1074: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:1074: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:1069: note: vectorized 1 loops in function.

../sp/olib_linear_algebra_sp.cpp:1088: note: not vectorized: no vectype for stmt: D.25261_11 = D.25260_10->real;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1083: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:1101: note: not vectorized: no vectype for stmt: D.25245_7->real = D.25249_12;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1096: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:1114: note: not vectorized: no vectype for stmt: D.25232_7->real = D.25236_12;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1109: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_quantum_sp.cpp

../sp/olib_quantum_sp.cpp:231: note: not vectorized: number of iterations cannot be computed.
../sp/olib_quantum_sp.cpp:227: note: vectorized 0 loops in function.

../sp/olib_quantum_sp.cpp:246: note: not vectorized: unexpected loop form.
../sp/olib_quantum_sp.cpp:243: note: vectorized 0 loops in function.

../sp/olib_quantum_sp.cpp:270: note: not vectorized: number of iterations cannot be computed.
../sp/olib_quantum_sp.cpp:257: note: vectorized 0 loops in function.

../sp/olib_quantum_sp.cpp:289: note: not vectorized: unexpected loop form.
../sp/olib_quantum_sp.cpp:289: note: not vectorized: Bad inner loop.
../sp/olib_quantum_sp.cpp:297: note: not vectorized: unexpected loop form.
../sp/olib_quantum_sp.cpp:282: note: vectorized 0 loops in function.

../sp/olib_quantum_sp.cpp:325: note: versioning for alias required: can't determine dependence between D.24987_10->real and D.24986_7->real
../sp/olib_quantum_sp.cpp:325: note: mark for run-time aliasing test between D.24987_10->real and D.24986_7->real
../sp/olib_quantum_sp.cpp:325: note: Detected interleaving D.24987_10->real and D.24987_10->imag
../sp/olib_quantum_sp.cpp:325: note: versioning for alias required: can't determine dependence between D.24987_10->real and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: mark for run-time aliasing test between D.24987_10->real and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: Detected interleaving D.24986_7->real and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: versioning for alias required: can't determine dependence between D.24987_10->imag and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: mark for run-time aliasing test between D.24987_10->imag and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: found equal ranges D.24987_10->real, D.24986_7->imag and D.24987_10->real, D.24986_7->real
../sp/olib_quantum_sp.cpp:325: note: found equal ranges D.24987_10->imag, D.24986_7->imag and D.24987_10->real, D.24986_7->real
../sp/olib_quantum_sp.cpp:325: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:325: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:325: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:325: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_quantum_sp.cpp:325: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:325: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_quantum_sp.cpp:325: note: cost model: Adding cost of checks for loop versioning aliasing.

../sp/olib_quantum_sp.cpp:325: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_quantum_sp.cpp:325: note: cost model: the vector iteration cost = 4 divided by the scalar iteration cost = 4 is greater or equal to the vectorization factor = 1.
../sp/olib_quantum_sp.cpp:325: note: not vectorized: vectorization not profitable.
../sp/olib_quantum_sp.cpp:321: note: vectorized 0 loops in function.

../sp/olib_quantum_sp.cpp:188: note: not vectorized: control flow in loop.
../sp/olib_quantum_sp.cpp:188: note: not vectorized: Bad inner loop.
../sp/olib_quantum_sp.cpp:192: note: not vectorized: control flow in loop.
../sp/olib_quantum_sp.cpp:146: note: Detected interleaving *init.6_23[i_96].real and *init.6_23[i_96].imag
../sp/olib_quantum_sp.cpp:146: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:146: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:146: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_quantum_sp.cpp:146: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_quantum_sp.cpp:146: note: cost model: the vector iteration cost = 2 divided by the scalar iteration cost = 2 is greater or equal to the vectorization factor = 1.
../sp/olib_quantum_sp.cpp:146: note: not vectorized: vectorization not profitable.
../sp/olib_quantum_sp.cpp:133: note: vectorized 0 loops in function.

../sp/olib_quantum_sp.cpp:337: note: versioning for alias required: can't determine dependence between D.24977_10->real and D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: mark for run-time aliasing test between D.24977_10->real and D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: Detected interleaving D.24977_10->real and D.24977_10->imag
../sp/olib_quantum_sp.cpp:337: note: versioning for alias required: can't determine dependence between D.24977_10->real and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: mark for run-time aliasing test between D.24977_10->real and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: Detected interleaving D.24976_7->real and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: versioning for alias required: can't determine dependence between D.24977_10->imag and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: mark for run-time aliasing test between D.24977_10->imag and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: found equal ranges D.24977_10->real, D.24976_7->imag and D.24977_10->real, D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: found equal ranges D.24977_10->imag, D.24976_7->imag and D.24977_10->real, D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:337: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:337: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:337: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_quantum_sp.cpp:337: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:337: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_quantum_sp.cpp:337: note: cost model: Adding cost of checks for loop versioning aliasing.

../sp/olib_quantum_sp.cpp:337: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_quantum_sp.cpp:337: note: Cost model analysis:
  Vector inside of loop cost: 4
  Vector outside of loop cost: 14
  Scalar iteration cost: 4
  Scalar outside cost: 1
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 6

../sp/olib_quantum_sp.cpp:337: note:   Profitability threshold = 5

../sp/olib_quantum_sp.cpp:337: note: Profitability threshold is 5 loop iterations.
../sp/olib_quantum_sp.cpp:337: note: create runtime check for data references D.24977_10->real and D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: created 1 versioning for alias checks.

../sp/olib_quantum_sp.cpp:337: note: vectorizing stmts using SLP.
../sp/olib_quantum_sp.cpp:337: note: LOOP VECTORIZED.
../sp/olib_quantum_sp.cpp:333: note: vectorized 1 loops in function.

../sp/olib_quantum_sp.cpp:95: note: not vectorized: control flow in loop.
../sp/olib_quantum_sp.cpp:95: note: not vectorized: Bad inner loop.
../sp/olib_quantum_sp.cpp:99: note: not vectorized: control flow in loop.
../sp/olib_quantum_sp.cpp:52: note: Detected interleaving *init.14_23[i_106].real and *init.14_23[i_106].imag
../sp/olib_quantum_sp.cpp:52: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:52: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:52: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_quantum_sp.cpp:52: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_quantum_sp.cpp:52: note: Cost model analysis:
  Vector inside of loop cost: 2
  Vector outside of loop cost: 9
  Scalar iteration cost: 2
  Scalar outside cost: 6
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 3

../sp/olib_quantum_sp.cpp:52: note:   Profitability threshold = 2

../sp/olib_quantum_sp.cpp:52: note: Profitability threshold is 2 loop iterations.
../sp/olib_quantum_sp.cpp:52: note: vectorizing stmts using SLP.
../sp/olib_quantum_sp.cpp:52: note: LOOP VECTORIZED.
../sp/olib_quantum_sp.cpp:39: note: vectorized 1 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../opencl/olib_quantum_ocl.cpp
../opencl/olib_quantum_ocl.cpp: In function 'int dqft_ocl(complex_float*, long long unsigned int, int, int)':
../opencl/olib_quantum_ocl.cpp:62:97: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_quantum_ocl.cpp: In function 'int dqft_ocl(complex_double*, long long unsigned int, int, int)':
../opencl/olib_quantum_ocl.cpp:254:98: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]

../opencl/olib_quantum_ocl.cpp:77: note: not vectorized: control flow in loop.
../opencl/olib_quantum_ocl.cpp:55: note: Detected interleaving *init.21_23[i_60].real and *init.21_23[i_60].imag
../opencl/olib_quantum_ocl.cpp:55: note: Vectorizing an unaligned access.
../opencl/olib_quantum_ocl.cpp:55: note: vect_model_store_cost: unaligned supported by hardware.
../opencl/olib_quantum_ocl.cpp:55: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../opencl/olib_quantum_ocl.cpp:55: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../opencl/olib_quantum_ocl.cpp:55: note: Cost model analysis:
  Vector inside of loop cost: 2
  Vector outside of loop cost: 9
  Scalar iteration cost: 2
  Scalar outside cost: 6
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 3

../opencl/olib_quantum_ocl.cpp:55: note:   Profitability threshold = 2

../opencl/olib_quantum_ocl.cpp:55: note: Profitability threshold is 2 loop iterations.
../opencl/olib_quantum_ocl.cpp:55: note: vectorizing stmts using SLP.
../opencl/olib_quantum_ocl.cpp:55: note: LOOP VECTORIZED.
../opencl/olib_quantum_ocl.cpp:41: note: vectorized 1 loops in function.

../opencl/olib_quantum_ocl.cpp:269: note: not vectorized: control flow in loop.
../opencl/olib_quantum_ocl.cpp:244: note: Detected interleaving *init.2_23[i_48].real and *init.2_23[i_48].imag
../opencl/olib_quantum_ocl.cpp:244: note: Vectorizing an unaligned access.
../opencl/olib_quantum_ocl.cpp:244: note: vect_model_store_cost: unaligned supported by hardware.
../opencl/olib_quantum_ocl.cpp:244: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../opencl/olib_quantum_ocl.cpp:244: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../opencl/olib_quantum_ocl.cpp:244: note: cost model: the vector iteration cost = 2 divided by the scalar iteration cost = 2 is greater or equal to the vectorization factor = 1.
../opencl/olib_quantum_ocl.cpp:244: note: not vectorized: vectorization not profitable.
../opencl/olib_quantum_ocl.cpp:229: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_discrete_sp.cpp

../sp/olib_discrete_sp.cpp:64: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:57: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:106: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:101: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:106: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:139: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:181: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:168: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:220: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:217: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:293: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:252: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:319: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:310: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:305: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:359: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:352: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:385: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:378: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:411: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:404: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:436: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:429: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:483: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:474: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:508: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:499: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:533: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:524: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:558: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:549: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:587: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:578: note: vectorized 0 loops in function.
g++ -Wall shor_algorithm_ocl_core_float.o olib_primitives.o olib_linear_algebra_sp.o olib_quantum_sp.o olib_quantum_ocl.o olib_discrete_sp.o olib_primitives_ocl.o olib_strings_sp.o -o shor_algorithm_ocl_float -lOpenCL

fox@APU:~/test/trunk/samples$ make shor_algorithm_ocl_double
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona shor_algorithm.cpp -o shor_algorithm_ocl_core_double.o -D USE_OPENCL -D DOUBLE_PRECISION
shor_algorithm.cpp: In function 'int main(int, char**)':
shor_algorithm.cpp:518:53: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:633:164: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1107:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:1222:173: warning: format '%lld' expects argument of type 'long long int', but argument 6 has type 'double' [-Wformat]
shor_algorithm.cpp:1222:173: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1226:162: warning: too many arguments for format [-Wformat-extra-args]

shor_algorithm.cpp:714: note: Detected interleaving *collapse.9_140[i_2].imag and *collapse.9_140[i_2].real
shor_algorithm.cpp:714: note: Vectorizing an unaligned access.
shor_algorithm.cpp:714: note: Vectorizing an unaligned access.
shor_algorithm.cpp:714: note: vect_model_store_cost: unaligned supported by hardware.
shor_algorithm.cpp:714: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
shor_algorithm.cpp:714: note: vect_model_load_cost: unaligned supported by hardware.
shor_algorithm.cpp:714: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
shor_algorithm.cpp:714: note: not vectorized: relevant stmt not supported: cstore.33_397 = [cond_expr] value_242 == D.25144_245 ? 1.0e+0 : 0.0;

shor_algorithm.cpp:679: note: not vectorized: control flow in loop.
shor_algorithm.cpp:518: note: not vectorized: control flow in loop.
shor_algorithm.cpp:505: note: not vectorized: control flow in loop.
shor_algorithm.cpp:452: note: not vectorized: control flow in loop.
shor_algorithm.cpp:384: note: not vectorized: control flow in loop.
shor_algorithm.cpp:214: note: vectorized 0 loops in function.
shor_algorithm.cpp:1107:3: warning: 'factor' may be used uninitialized in this function [-Wuninitialized]
g++ -Wall shor_algorithm_ocl_core_double.o olib_primitives.o olib_linear_algebra_sp.o olib_quantum_sp.o olib_quantum_ocl.o olib_discrete_sp.o olib_primitives_ocl.o olib_strings_sp.o -o shor_algorithm_ocl_double -lOpenCL
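
The "Cost model analysis" blocks in the vectorizer log above list per-loop costs and a "Calculated minimum iters for profitability" value. The sketch below shows one way those numbers can be reproduced. It rests on two assumptions that are not stated in the log: a vectorization factor of 2 for the loops that were vectorized, and a break-even formula written from memory of how GCC's vectorizer cost model works. The function names are hypothetical and are not part of GCC or OLib.

```python
def min_profitable_iters(vec_inside, vec_outside, scalar_iter, scalar_outside,
                         prologue_iters, epilogue_iters, vf):
    """Estimate the smallest iteration count at which the vector loop pays off.

    Returns None when one vector iteration costs at least as much as
    vf scalar iterations, i.e. vectorization can never be profitable.
    """
    denom = scalar_iter * vf - vec_inside
    if denom <= 0:
        return None
    numer = ((vec_outside - scalar_outside) * vf
             - vec_inside * prologue_iters
             - vec_inside * epilogue_iters)
    iters = int(numer / denom)  # truncate toward zero, like C integer division
    # Bump by one while the vector version is not yet strictly cheaper.
    if scalar_iter * vf * iters <= vec_inside * iters + (vec_outside - scalar_outside) * vf:
        iters += 1
    return iters


def profitability_threshold(iters, vf):
    """Assumed relation between the two values printed in the log."""
    return max(iters, vf) - 1


# Cost figures copied from three "Cost model analysis" blocks in the log;
# vf=2 is an assumption, the log does not print it for these loops.
cases = {
    "olib_linear_algebra_sp.cpp:1008": (4, 14, 4, 1, 0, 1, 2),
    "olib_linear_algebra_sp.cpp:1047": (5, 16, 6, 1, 0, 1, 2),
    "olib_quantum_sp.cpp:52": (2, 9, 2, 6, 0, 1, 2),
}

for loop, costs in cases.items():
    iters = min_profitable_iters(*costs)
    print(loop, iters, profitability_threshold(iters, costs[-1]))
```

Under these assumptions the three loops give 6, 4 and 3 minimum iterations with thresholds 5, 3 and 2, which agrees with the log. For the loops reported as "vectorization not profitable" the log prints a vectorization factor of 1 with equal vector and scalar iteration costs, so the denominator is zero and the function returns None.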

Martin Fox

fox@APU:~/test/trunk/samples$ ./benchmark_sp
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: Sequential
Compile date: Aug  9 2012
Compile time: 22:35:58
Architecture: x86_64

[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 47 seconds
Rate: 2.127660

[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 99 seconds
Special Units Testing...
Execution time: 49 seconds
Performance: [32-bit float types]
0.383818 GFLOPS
2081632 special operations (e.g. sin, cos) per second

[1: FPU-64] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 115 seconds
Special Units Testing...
Execution time: 42 seconds
Performance: [64-bit float types]
0.031584 GFLOPS
232142 special operations (e.g. sin, cos) per second

[2: ALU-64] Final test
Calculations... Please wait...
Execution time: 30 seconds
Performance: [64-bit integer types]
9873.600013 MIPS

fox@APU:~/test/trunk/samples$ ./benchmark_ocl
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: OpenCL
Compile date: Aug  9 2012
Compile time: 22:36:28
Architecture: x86_64

--- 1 OpenCL platform(s) found: -----------------------------------------
  -- 0 --
  VERSION = OpenCL 1.2 AMD-APP (923.1)
  NAME = AMD Accelerated Parallel Processing
  VENDOR = Advanced Micro Devices, Inc.
  EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
--- 1 OpenCL device(s) found on platform: -----------------------------------------

  ------------------- 0 -------------------
  DEVICE_NAME = BeaverCreek
  DEVICE_VENDOR = Advanced Micro Devices, Inc.
  DEVICE_VERSION = OpenCL 1.2 AMD-APP (923.1)
  CL_DEVICE_EXTENSIONS = cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt

Enter the device number:


Results for 1000 of the 5250 iterations, run time ~2h10:
./shor_algorithm_tester.sh shor_algorithm_ocl_float Radeon7770

Attached as temporary.zip.

The HD 7770 is a bit too weak to complete the full task  :(
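
The figures in this post (1000 of 5250 iterations in about 2 h 10 min) allow a rough estimate of how long the full run would take. The sketch below assumes every iteration costs about the same, which may not hold if later iterations work on larger numbers. The function name is hypothetical.

```python
def estimate_full_run(done_iters, elapsed_seconds, total_iters):
    """Linearly extrapolate the total run time from a partial run."""
    per_iter = elapsed_seconds / done_iters
    return per_iter, per_iter * total_iters


# 1000 iterations took roughly 2 h 10 min; the full task has 5250 iterations.
per_iter, total = estimate_full_run(1000, 2 * 3600 + 10 * 60, 5250)
print(f"{per_iter:.1f} s per iteration, about {total / 3600:.1f} h for all 5250")
```

With these inputs the estimate comes out at 7.8 s per iteration and roughly 11.4 hours for the whole task.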

During the next test a  :cold: problem showed up (and the night was cold  :P)

Judging by the temporary files the script creates, the X server was not the problem.
The problem is probably in the drivers or in the script; I also can't rule out the (aging) hardware.
The script completed 240 of the 250 iterations for the next number, 411, in ~26 min, which is fairly fast, and then it stopped  :(
During the test I did not see the GPU overheating (~55 °C), and the fans sped up from 20-30%. The card (XFX HD 7770 Double Dissipation) has plenty of cooling, plus there was cool air coming in from outside.
The processes running in the background kept working without problems for another 8 h, including BOINC Rosetta@home running on 1 core.

Maybe the problem is only on my end; let's wait for results from the other testers.
So far, though, I have generally had no problems with the GPU in the projects I tested; it only stuttered when crunching PrimeGrid.

~ v46

bash-4.2$ ./benchmark_ocl
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: OpenCL
Compile date: Aug 10 2012
Compile time: 16:39:11
Architecture: x86_64

--- 1 OpenCL platform(s) found: -----------------------------------------
  -- 0 --
  VERSION = OpenCL 1.2 AMD-APP (938.1)
  NAME = AMD Accelerated Parallel Processing
  VENDOR = Advanced Micro Devices, Inc.
  EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
--- 1 OpenCL device(s) found on platform: -----------------------------------------

  ------------------- 0 -------------------
  DEVICE_NAME = Capeverde
  DEVICE_VENDOR = Advanced Micro Devices, Inc.
  DEVICE_VERSION = OpenCL 1.2 AMD-APP (938.1)
  DRIVER_VERSION = CAL 1.4.1741 (VM)
  CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt

Enter the device number:
>> 0

[1: FPU-32] Initial test
        Calculations... Please wait...
        Execution time: 23 seconds
        Rate: 4.347826

[1: FPU-32] Final test
        Calculations... Please wait...
        FP Units Testing...
        Execution time: 96 seconds
        Special Units Testing...
        Execution time: 141 seconds
        Performance: [32-bit float types]
                414.639362 GFLOPS
                757813829 special operations (e.g. sin, cos) per second

[1: FPU-64] Final test
        Calculations... Please wait...
        FP Units Testing...
        Execution time: 405 seconds
        Special Units Testing...
        Execution time: 109 seconds
        Performance: [64-bit float types]
                4.913934 GFLOPS
                49011467 special operations (e.g. sin, cos) per second

[2: ALU-64] Final test
        Calculations... Please wait...
        Execution time: 143 seconds
        Performance: [64-bit integer types]
                144660.833758 MIPS
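
The two benchmark outputs above report both 32-bit and 64-bit floating-point throughput, so the single-to-double precision ratio of each device can be worked out directly. The sketch below copies the GFLOPS figures from those outputs. The dictionary labels are only illustrative, and because the two runs come from different machines, only the ratio within each device is computed.

```python
# GFLOPS values copied from the benchmark_sp and benchmark_ocl outputs above.
results = {
    "CPU, 1 core (benchmark_sp)": {"fp32": 0.383818, "fp64": 0.031584},
    "Capeverde GPU (benchmark_ocl)": {"fp32": 414.639362, "fp64": 4.913934},
}


def fp32_to_fp64_ratio(entry):
    """How many times faster the 32-bit test ran than the 64-bit test."""
    return entry["fp32"] / entry["fp64"]


for device, entry in results.items():
    print(f"{device}: FP32 is about {fp32_to_fp64_ratio(entry):.0f}x faster than FP64")
```

In this benchmark the double-precision test runs roughly 12 times slower than single precision on the CPU core and roughly 84 times slower on the Capeverde GPU.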
