Radeon + Linux - tests

Started by Rysiu, 02 August 2012, 14:11

Rysiu

Does anyone have a Radeon in a machine running Linux with an OpenCL environment installed?

I'm working on an application and would like to test it on AMD cards.

Rysiu

Nobody? Is everyone hiding?  :deadman:

Troll81

my flagship machine with the Radeon runs Windows

Rysiu

Nobody has AMD with Linux...

So who has an nVidia card with Linux?

AL

Quote: Rysiu on 06 August 2012, 21:33
Nobody has AMD with Linux...

So who has an nVidia card with Linux?

I have an nVidia card with Windows, but I run Linux in one virtual machine - except that we won't get anywhere with CUDA in a VM.

Martin Fox

I have AMD, but it's an APU - not sure whether that works for you :P

Rysiu

An APU is fine. From what I can see, it supports OpenCL normally.

RAD-Poland

see your PM -> the configuration is weak, but it may do for testing  ;)
Athlon64 X2 + Radeon HD 7770

   
WCG:
PG:         YOYO:

     

Rysiu

The configuration is very good. More than sufficient.

I'm running the tests for my diploma thesis. The program simulates quantum algorithms. I need help testing the execution speed of Shor's algorithm.

There is also a simple benchmark. It tests both the CPU and other hardware (e.g. GPUs).

The repository is here:

http://code.google.com/p/olib/

From the console you can run:

svn checkout https://olib.googlecode.com/svn/trunk/

A 'trunk' folder will be created in the current directory and the repository will be downloaded into it.

The programs are in the 'samples' directory. The makefile is there too. The makefile contains quite a few program variants, so I don't recommend running 'make' or 'make all', because your system will probably be missing something. The list of available programs is in samples/programs.list


The single-core version of the benchmark can be built with:

make benchmark_sp

The OpenCL version of the benchmark (AMD cards etc.) can be built with:

make benchmark_ocl

After launching the OpenCL version, a list of available devices that can be tested should appear.


Shor's algorithm itself on OpenCL (float version) is built with:

make shor_algorithm_ocl_float

You can also build the double version:

make shor_algorithm_ocl_double

Execution times for Shor's algorithm vary. On a GPU they can be very short.

It's best to fetch everything from the repository (with the command above), because that is where the latest revision is (currently 31). I'll send a packed archive in the evening.

To start with, please check whether it compiles (the OpenCL version and optionally sp).

Stack-size overflow problems may occur. To be safe, I suggest running this before launching:

ulimit -s unlimited

This sets the stack size limit to unlimited. It has no negative effect on the system. The change applies only to the current session; when you open another terminal you have to run ulimit -s unlimited again. I don't know whether it will be necessary, but I'm mentioning it just in case.
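Because the limit is per session, it can also be raised for a single command only, by setting it inside a subshell. The following is a minimal sketch and not part of the repository; the helper name with_big_stack is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OLib): run one command with a raised
# stack limit without changing the limit of the calling shell.
with_big_stack() {
    (
        # The parentheses start a subshell, so the new limit dies with it.
        # If "unlimited" is refused, fall back to the current hard limit.
        ulimit -s unlimited 2>/dev/null || ulimit -s "$(ulimit -H -s)"
        "$@"
    )
}
```

For example, `with_big_stack ./shor_algorithm_ocl_float` would run the float build with the raised limit and leave the terminal's own setting untouched.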


The tester's job should be limited to checking whether the program compiles, running a script, and sending back the results. The tester won't have to do anything beyond that, so I'll try to keep the workload to a minimum.

RAD-Poland

- drivers: 12.4 according to CCC
- AMD-APP-SDK-v2.7

I had to install:
- the "svn" version control system -> subversion package
- the g++ compiler -> gcc-c++ package

benchmark_sp
compiles without problems


bash-4.2$ ./benchmark_sp
====================================================================
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: Sequential
Compile date: Aug  8 2012
Compile time: 14:36:57
Architecture: x86_64
====================================================================


[1: FPU-32] Initial test
        Calculations... Please wait...
        Execution time: 68 seconds
        Rate: 1.470588

[1: FPU-32] Final test
        Calculations... Please wait...
        FP Units Testing...
        Execution time: 100 seconds
        Special Units Testing...
        Execution time: 57 seconds
        Performance: [32-bit float types]
                0.262633 GFLOPS
                1236842 special operations (e.g. sin, cos) per second

[1: FPU-64] Final test
        Calculations... Please wait...
        FP Units Testing...
        Execution time: 116 seconds
        Special Units Testing...
        Execution time: 51 seconds
        Performance: [64-bit float types]
                0.021677 GFLOPS
                132352 special operations (e.g. sin, cos) per second

[2: ALU-64] Final test
        Calculations... Please wait...
        Execution time: 30 seconds
        Performance: [64-bit integer types]
                6824.400009 MIPS


benchmark_ocl

compilation problems: it cannot find CL/cl.h
CL/ should probably be mapped globally to /opt/AMDAPP/include,
but after a small change to the "makefile" (the most commonly used method), adding -I/opt/AMDAPP/include to the compile flags, compilation went through, although there are a few warnings
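The same include path can also be supplied without editing the makefile, because GCC treats the directories listed in the CPATH environment variable as if they had been passed with -I. A minimal sketch, assuming the SDK headers live in one of a few candidate directories; the helper name find_cl_include and the candidate list are illustrative assumptions, not part of the project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first directory that contains CL/cl.h.
# Returns non-zero when none of the candidates has the header.
find_cl_include() {
    for dir in "$@"; do
        if [ -f "$dir/CL/cl.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

Something like `CPATH="$(find_cl_include /opt/AMDAPP/include /usr/include)" make benchmark_ocl` should then expose the header to every g++ call that make issues, with the makefile left unchanged.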


bash-4.2$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../primitives/olib_primitives_ocl.cpp
../primitives/olib_primitives_ocl.cpp: In function 'int support_fp64(int)':
../primitives/olib_primitives_ocl.cpp:79:84: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../primitives/olib_primitives_ocl.cpp: In function 'int search_devices(_cl_device_id**, cl_uint*)':
../primitives/olib_primitives_ocl.cpp:102:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
../primitives/olib_primitives_ocl.cpp:131:19: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
../primitives/olib_primitives_ocl.cpp: In function 'char* olib_read_opencl_kernel(char*)':
../primitives/olib_primitives_ocl.cpp:235:21: warning: variable 'readlen' set but not used [-Wunused-but-set-variable]

../primitives/olib_primitives_ocl.cpp:136: note: not vectorized: control flow in loop.
../primitives/olib_primitives_ocl.cpp:105: note: not vectorized: control flow in loop.
../primitives/olib_primitives_ocl.cpp:94: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../sp/olib_strings_sp.cpp

../sp/olib_strings_sp.cpp:71: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:71: note: not vectorized: Bad inner loop.
../sp/olib_strings_sp.cpp:74: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:48: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:48: note: not vectorized: Bad inner loop.
../sp/olib_strings_sp.cpp:52: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:35: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../opencl/olib_benchmark_ocl.cpp -lOpenCL
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:41:110: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:173:111: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:306:115: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:439:116: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_mips_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:575:106: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
g++ -Wall benchmark_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o benchmark_ocl -lOpenCL


the test itself went somewhat worse (described in words)
FPU 32-bit test:
- after a few seconds the system freezes: no response to keyboard or mouse, no screen refresh
- after several dozen seconds the test finishes, the system wakes up and the results are displayed
- everything is OK for a few seconds, until the next test starts
FPU 64-bit test:
- the system freezes immediately and .... :(
- judging by the power draw, the test is running
- after about 7 minutes the power draw drops to the level of a GPU running at high clocks but with no load
- the system does not respond; the disk is working, background processes are running, the X server does not come back
- computer restart

maybe the cause is my changes, made without digging into the code, or my lack of patience to let the test finish, which ran out after ~20 min ;)

the test was run with revision 31
I now see a new revision, 32, but I'm holding off on another test until I hear your opinion/suggestions

EDIT: I did not perform the next steps (compiling Shor's algorithm) or change the stack size; maybe that was the problem
I was doing things in the order suggested above  :P

   

Rysiu

Revision 32 changes nothing with respect to the problems you mentioned.

I'm curious how the benchmark detected the OpenCL device. Before the test started, information about the device was displayed. Can you send it without running the test itself (just start the benchmark, copy the information, and don't confirm, so that it doesn't go on computing)?

RAD-Poland

yes, it is shown when ./benchmark_ocl starts
but I couldn't quote it earlier, because I no longer had any way to capture the console

bash-4.2$ ./benchmark_ocl
====================================================================
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: OpenCL
Compile date: Aug  8 2012
Compile time: 17:54:02
Architecture: x86_64
====================================================================

--- 1 OpenCL platform(s) found: -----------------------------------------
  -- 0 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 1.2 AMD-APP (938.1)
  NAME = AMD Accelerated Parallel Processing
  VENDOR = Advanced Micro Devices, Inc.
  EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
--- 1 OpenCL device(s) found on platform: -----------------------------------------

  ------------------- 0 -------------------
  DEVICE_NAME = Capeverde
  DEVICE_VENDOR = Advanced Micro Devices, Inc.
  DEVICE_VERSION = OpenCL 1.2 AMD-APP (938.1)
  DRIVER_VERSION = CAL 1.4.1741 (VM)
  CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
  DEVICE_MAX_COMPUTE_UNITS = 8
  DEVICE_MAX_CLOCK_FREQUENCY = 1000
  DEVICE_GLOBAL_MEM_SIZE = 843055104
  CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 16384
  CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 64
  CL_DEVICE_LOCAL_MEM_SIZE = 32768
  CL_DEVICE_MAX_MEM_ALLOC_SIZE = 524288000
  CL_DEVICE_MAX_MEM_ALLOC_SIZE = 524288000

Enter the device number:
>>


EDIT: I selected "0"

   

Rysiu

The graphical interface may be the problem.

Can you disable the graphical interface and run the program with only text mode on the system?

I'll try to fix the benchmark by this evening and will post on the forum when there is a new revision.

PS. Indeed, I tested it on my nVidia Tesla C2050 and got:

[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 19 seconds
Rate: 5.263158

[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 84 seconds
Special Units Testing...
Execution time: 127 seconds
Performance: [32-bit float types]
573.638161 GFLOPS
1018482283 special operations (e.g. sin, cos) per second

[1: FPU-64] Final test
Calculations... Please wait...
FP Units Testing...
ERROR: CL_INVALID_BINARY Execution time: 438 seconds
Special Units Testing...
ERROR: CL_INVALID_BINARY Execution time: 25 seconds
Performance: [64-bit float types]
11.001088 GFLOPS
517380000 special operations (e.g. sin, cos) per second

[2: ALU-64] Final test
Calculations... Please wait...
Execution time: 96 seconds
Performance: [64-bit integer types]
260850.292889 MIPS


On a GeForce GTS250, which has no double precision, it works fine:

[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 105 seconds
Rate: 0.952381

[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 106 seconds
Special Units Testing...
Execution time: 51 seconds
Performance: [32-bit float types]
82.255991 GFLOPS
458926470 special operations (e.g. sin, cos) per second

[1: FPU-64] Final test
FP-64 NOT SUPPORTED

[2: ALU-64] Final test
Calculations... Please wait...
Execution time: 30 seconds
Performance: [64-bit integer types]
151041.883120 MIPS


Something is badly broken in double. I'll start looking in a moment. Do you remember what FPU-32 performance the test showed you?

In the meantime you can also try compiling and running shor_algorithm_ocl_float (it uses 32-bit float, so there's a chance it will be OK)

It should run quickly. It prints a few numbers to the screen. If it starts and works, run it several times (its execution times and returned results vary).

RAD-Poland

disabling the graphical interface results in no access to the device  :fright:
Quote:
--- 1 OpenCL platform(s) found: -----------------------------------------
  -- 0 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 1.2 AMD-APP (938.1)
  NAME = AMD Accelerated Parallel Processing
  VENDOR = Advanced Micro Devices, Inc.
  EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
OpenCL Error 'clGetDeviceIDs (platforms[0],CL_DEVICE_TYPE_GPU,100,devices,devices_n)' returned-1!
Przerwane [Aborted]
it finds the GPU only after I run "startx", i.e. under the graphical interface on terminal 8; on the remaining terminals 1-6 the result is as above
and with X disabled there is no access to terminals 7 and 8
(? a quirk of the drivers or of the Mandriva system ?)



Quote: Do you remember what FPU-32 performance the test showed you?
Quote:
[1: FPU-32] Initial test
   Calculations... Please wait...
   Execution time: 21 seconds
   Rate: 4,761905

[1: FPU-32] Final test
   Calculations... Please wait...
   FP Units Testing...
   Execution time: 103 seconds
        Special Units Testing...
/tmp/OCL3yOly3.cl; line 42:error: more than one instance of overloaded
function "pow" matches the argument list:
   function "pow(float, float) C++"
   function "pow(double, double) C++"
   argument types are: (float, double)
tmp -=pow(tmp, -.5);  //4ops

Internal error: clc compiler invocation failed.
ERROR ......CL_INVALID_KERNEL ...... (illegible photo)

   Performance: [32-bit float types]
      423.267122 GFLOPS
      117028500000 special operations (e.g. sin, cos) per second
the data in bold was entered by hand; typos are possible
note the errors in the 32-bit benchmark, which you don't get on NVIDIA

EDIT:
in the nVidia Tesla C2050 test you have
Quote: ERROR: CL_INVALID_BINARY   Execution time: 438 seconds

The Radeon HD 7770 has fairly weak DP; maybe I didn't wait long enough,
judging by the power draw, I estimated the test time similarly at ~7 min (420 s)
but in total I waited a dozen or so more minutes while the GPU's power draw was reduced

   

Rysiu

Oh wow, there really is a problem with the overloaded function. I'll fix it right away.

Did you try anything with the float version of Shor?

Quote: RAD-Poland on 08 August 2012, 20:51
in the nVidia Tesla C2050 test you have
Quote: ERROR: CL_INVALID_BINARY   Execution time: 438 seconds
I looked into why this happens on my machine. It turns out it is most likely caused by an old OpenCL version.

All I have is:

--- 1 OpenCL platform(s) found: -----------------------------------------
  -- 0 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 1.0 CUDA 3.2.1
  NAME = NVIDIA CUDA
  VENDOR = NVIDIA Corporation
  EXTENSIONS = cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll


Within 30 minutes I'll add a new revision to the repository with an additional pragma for AMD, and we'll see whether it changes anything.

RAD-Poland

- the above tests were for v32
- drivers 12.4 according to CCC, installed from amd-driver-installer-12-6-x86.x86_64.run
- I haven't had this many restarts (5) in the last 2 years (a good opportunity to dust out the case)  ;)

Quote: Did you try anything with the float version of Shor?
not yet,
shor_algorithm_ocl_float
it compiles -> the output is long, so I'm not pasting those kilometres of it
after compilation
bash-4.2$ ./shor_algorithm_ocl_float
Naruszenie ochrony pamięci [Segmentation fault]


shor_algorithm_ocl_double
bash-4.2$ make shor_algorithm_ocl_double
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include shor_algorithm.cpp -o shor_algorithm_ocl_core_double.o -D USE_OPENCL -D DOUBLE_PRECISION
shor_algorithm.cpp: In function 'int main(int, char**)':
shor_algorithm.cpp:502:53: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:617:164: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1091:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:1206:173: warning: format '%lld' expects argument of type 'long long int', but argument 6 has type 'double' [-Wformat]
shor_algorithm.cpp:1206:173: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1210:162: warning: too many arguments for format [-Wformat-extra-args]

shor_algorithm.cpp:698: note: not vectorized: control flow in loop.
shor_algorithm.cpp:663: note: not vectorized: control flow in loop.
shor_algorithm.cpp:502: note: not vectorized: control flow in loop.
shor_algorithm.cpp:489: note: not vectorized: control flow in loop.
shor_algorithm.cpp:436: note: not vectorized: control flow in loop.
shor_algorithm.cpp:368: note: not vectorized: control flow in loop.
shor_algorithm.cpp:214: note: vectorized 0 loops in function.
shor_algorithm.cpp:1140:3: warning: 'factor' may be used uninitialized in this function [-Wuninitialized]
g++ -Wall shor_algorithm_ocl_core_double.o olib_primitives.o olib_linear_algebra_sp.o olib_quantum_sp.o olib_quantum_ocl.o olib_discrete_sp.o olib_primitives_ocl.o olib_strings_sp.o -o shor_algorithm_ocl_double -lOpenCL

bash-4.2$ ./shor_algorithm_ocl_double
Naruszenie ochrony pamięci [Segmentation fault]

:(

   

Rysiu

Quote: RAD-Poland on 08 August 2012, 21:11
- if the system survives, I'll paste the double output in a moment....................
:deadman:

bash-4.2$ ./shor_algorithm_ocl_float
Naruszenie ochrony pamięci [Segmentation fault]

The answer to that:

ulimit -s unlimited

RAD-Poland

bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_float
0 1 6 1.98 1.19 0

:)
bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_double
"/tmp/OCLLqoikk.cl", line 28: warning: ignore unrecognized OpenCL extension
  #pragma OPENCL EXTENSION cl_khr_fp64 : enable
                                               ^

"/tmp/OCLLqoikk.cl", line 29: error: can't enable all OpenCL extensions or
          unrecognized OpenCL extension
  #pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
                                                       ^

"/tmp/OCLLqoikk.cl", line 46: error: function "atom_cmpxchg" declared
          implicitly
        while ((readback = atom_cmpxchg((__global long*)address, oldval, newval)) != oldval)
                           ^

2 errors detected in the compilation of "/tmp/OCLLqoikk.cl".

Internal error: clc compiler invocation failed.

ERROR: CL_BUILD_PROGRAM_FAILUREERROR: CL_INVALID_KERNEL0 1 1 1.57 0.86 0

:(

EDIT: I started to doubt whether the 7770 has DP, but I ran MilkyWay and it computes

   

Rysiu

Quote: RAD-Poland on 08 August 2012, 21:24
bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_float
0 1 6 1.98 1.19 0

:)
And that's good. That's how it should be. A single result tells me nothing, though. Tomorrow I'll send a small bash script that runs it in a loop and collects the results.
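A loop of that kind could look roughly like the following. It is only a sketch under stated assumptions and not the author's script; the helper name collect_runs and the "run N:" line prefix are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command N times and append each run's output
# to a results file, prefixing every output line with the run number.
collect_runs() {
    runs="$1"
    outfile="$2"
    shift 2
    i=1
    while [ "$i" -le "$runs" ]; do
        # Capture stdout and stderr so that error messages are kept too.
        "$@" 2>&1 | sed "s/^/run $i: /" >> "$outfile"
        i=$((i + 1))
    done
}
```

After `ulimit -s unlimited`, a call such as `collect_runs 20 results.txt ./shor_algorithm_ocl_float` would leave twenty numbered result lines in results.txt, ready to be sent back.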


Quote: RAD-Poland on 08 August 2012, 21:24
bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_double
"/tmp/OCLLqoikk.cl", line 28: warning: ignore unrecognized OpenCL extension
  #pragma OPENCL EXTENSION cl_khr_fp64 : enable
                                               ^

"/tmp/OCLLqoikk.cl", line 29: error: can't enable all OpenCL extensions or
          unrecognized OpenCL extension
  #pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
                                                       ^

"/tmp/OCLLqoikk.cl", line 46: error: function "atom_cmpxchg" declared
          implicitly
        while ((readback = atom_cmpxchg((__global long*)address, oldval, newval)) != oldval)
                           ^

2 errors detected in the compilation of "/tmp/OCLLqoikk.cl".

Internal error: clc compiler invocation failed.

ERROR: CL_BUILD_PROGRAM_FAILUREERROR: CL_INVALID_KERNEL0 1 1 1.57 0.86 0

:(

Well, yes... I could have guessed...

Revision 33 is in the repository.

To start with, test Shor on double. Then run the benchmark. We'll see whether it swallows it this time.

RAD-Poland

- test of v33
- added -I/opt/AMDAPP/include
bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_float
0 1 4 1.92 1.15 0

:)
bash-4.2$ ./shor_algorithm_ocl_double
"/tmp/OCLHLI142.cl", line 29: warning: ignore unrecognized OpenCL extension
      #pragma OPENCL EXTENSION cl_khr_fp64 : enable
                                                   ^
"/tmp/OCLHLI142.cl", line 36: error: can't enable all OpenCL extensions or
          unrecognized OpenCL extension
  #pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
                                                       ^
"/tmp/OCLHLI142.cl", line 53: error: function "atom_cmpxchg" declared
          implicitly
        while ((readback = atom_cmpxchg((__global long*)address, oldval, newval)) != oldval)
                           ^
2 errors detected in the compilation of "/tmp/OCLHLI142.cl".

Internal error: clc compiler invocation failed.

ERROR: CL_BUILD_PROGRAM_FAILUREERROR: CL_INVALID_KERNEL0 1 2 1.58 0.87 0

unfortunately  :(

EDIT: I didn't run the benchmark -> are you sure?

   

Rysiu

And the benchmark?

Quote: RAD-Poland on 08 August 2012, 21:42
EDIT: I didn't run the benchmark -> are you sure?
No  :deadman:

But there's only one way to find out. Still, run it when you're not doing anything else on the PC  :attack:

RAD-Poland

- the PC is always doing something, php scripts etc.
- I'm turning BOINC off for CPU and GPU
- compilation
bash-4.2$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../opencl/olib_benchmark_ocl.cpp -lOpenCL
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:41:110: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:173:111: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:306:115: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:439:116: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_mips_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:575:106: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
g++ -Wall benchmark_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o benchmark_ocl -lOpenCL


- at worst there will be another restart  :cold:

EDIT: unfortunately

32bit initial 21 s / 4,761905
32bit final 103 s / 151 s / 423.267122 / 775023178
64bit final  :cold: for 20 min now -> OT: I'm going to restart it, or there won't be PM stats (Rosetta is already missing some, and in a moment SETI and SETIbeta will start lagging)

EDIT: slowly moving forward; there's an improvement in the second FPU32 test
Quote:
[1: FPU-32] Initial test
   Calculations... Please wait...
   Execution time: 21 seconds
   Rate: 4,761905

[1: FPU-32] Final test
   Calculations... Please wait...
   FP Units Testing...
   Execution time: 103 seconds
   Special Units Testing...
   Execution time: 151 seconds
   Performance: [32-bit float types]
      423.267122 GFLOPS
      775023178 special operations (e.g. sin, cos) per second

   

Rysiu

Tomorrow I'll write a small script for testing. How much GPU time can you spare?  :attack:

RAD-Poland

I'll try to set aside some time, 1-2 h
let me know when you have a new version (I'll keep an eye on the RSS/forum)
a script would be useful, and if the application could write an output file, that would be very helpful when the system hangs
my attempt at capturing the console didn't work: after the hang the file was empty. With a file it would be easier and quicker to pass on the information than by retyping it
OC benchmarks often create such files, so you know at which point and for what reason the machine went down
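One plausible reason for an empty capture file is buffering: when standard output goes to a file or a pipe instead of a terminal, C stdio normally switches to full buffering, so nothing is written until the buffer fills or the program exits. Whether that was the actual cause here is not known. The sketch below works around it from the outside, assuming GNU coreutils stdbuf is available and the program uses ordinary C stdio; the helper name log_run is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, show its output on the terminal and
# append every line to a log file, flushing to disk after each line so the
# log has a better chance of surviving a hard reset.
log_run() {
    log="$1"
    shift
    if command -v stdbuf >/dev/null 2>&1; then
        # Ask the program's stdio to flush stdout after every line.
        set -- stdbuf -oL "$@"
    fi
    "$@" 2>&1 | while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
        printf '%s\n' "$line" >> "$log"
        sync    # push the new line from the page cache to the disk
    done
}
```

A call such as `log_run benchmark.log ./benchmark_ocl` would then record each completed line as it is printed. Prompts that do not end with a newline show up late with this approach, so the device number may have to be typed before the ">>" prompt becomes visible.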

   

Rysiu

Yes. It will create temporary files. It shouldn't hang, though - we'll limit ourselves to the float type, and everything works nicely with it  :attack:

Martin Fox

fox@APU:~/test/trunk/samples$ make benchmark_sp
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_sp.o -D USE_SEQUENTIAL
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_benchmark_sp.cpp -o olib_benchmark_sp.o
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_fp_units_sp(float*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:46:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp:49:9: warning: variable 'b' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_fp_units_sp(double*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:530:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp:533:9: warning: variable 'b' set but not used [-Wunused-but-set-variable]

../sp/olib_benchmark_sp.cpp:38: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:52: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:33: note: vectorized 0 loops in function.

../sp/olib_benchmark_sp.cpp:522: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:536: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:517: note: vectorized 0 loops in function.

../sp/olib_benchmark_sp.cpp:1007: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1014: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1001: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_special_units_sp(float*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:1017:18: warning: 'tmp' may be used uninitialized in this function [-Wuninitialized]

../sp/olib_benchmark_sp.cpp:1037: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1044: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1031: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_special_units_sp(double*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:1047:19: warning: 'tmp' may be used uninitialized in this function [-Wuninitialized]

../sp/olib_benchmark_sp.cpp:1072: note: not vectorized: multiple nested loops.
../sp/olib_benchmark_sp.cpp:1075: note: not vectorized: number of iterations cannot be computed.
../sp/olib_benchmark_sp.cpp:1075: note: not vectorized: Bad inner loop.
../sp/olib_benchmark_sp.cpp:1079: note: not vectorized: number of iterations cannot be computed.
../sp/olib_benchmark_sp.cpp:1061: note: vectorized 0 loops in function.
g++ -Wall benchmark_core_sp.o olib_benchmark_sp.o -o benchmark_sp

fox@APU:~/test/trunk/samples$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
benchmark.cpp:69:21: fatal error: CL/cl.h: No such file or directory
compilation terminated.
make: *** [benchmark_core_ocl.o] Error 1


fox@APU:~/test/trunk/samples$ ./benchmark_sp
====================================================================
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: Sequential
Compile date: Aug  9 2012
Compile time: 09:52:49
Architecture: x86_64
====================================================================


[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 46 seconds
Rate: 2.173913

[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 101 seconds
Special Units Testing...
Execution time: 49 seconds
Performance: [32-bit float types]
0.384516 GFLOPS
2127551 special operations (e.g. sin, cos) per second

[1: FPU-64] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 118 seconds
Special Units Testing...
Execution time: 41 seconds
Performance: [64-bit float types]
0.030781 GFLOPS
237804 special operations (e.g. sin, cos) per second

[2: ALU-64] Final test
Calculations... Please wait...
Execution time: 31 seconds
Performance: [64-bit integer types]
9765.870981 MIPS

Rysiu

Maybe start by compiling shor_algorithm_ocl_float

On line 198 of the makefile, append -I/opt/AMDAPP/include after a space

and then, as usual,

make shor_algorithm_ocl_float

Rysiu

You can also fetch revision 34, in which the compilation problem on AMD is probably fixed.

RAD-Poland

v34 :no:

bash-4.2$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
benchmark.cpp:69:21: fatal error: CL/cl.h: Nie ma takiego pliku ani katalogu [No such file or directory]
compilation terminated.


for:
make shor_algorithm_ocl_double
make shor_algorithm_ocl_float
...
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../opencl/olib_quantum_ocl.cpp -lOpenCL
../opencl/olib_quantum_ocl.cpp:31:20: fatal error: CL/cl.h: Nie ma takiego pliku ani katalogu [No such file or directory]
compilation terminated.


   

Rysiu

Ugh. I just noticed that I botched the makefile fix. I made quite a mess of it. I'll fix it later (in a few hours, once I'm back); I can't manage it right now.

The repository already contains the test script 'shor_algorithm_tester.sh', but a small bug crept into it and the fix is not in the repo yet.

I started the tests on my own Tesla C2050 to get a rough idea of how long they may take on other cards.

I'll write up the results in more detail this evening.

Rysiu

Does it compile now? (Revision 36)

RAD-Poland

v36

make shor_algorithm_ocl_double OK
make shor_algorithm_ocl_float OK

make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
benchmark.cpp:69:21: fatal error: CL/cl.h: No such file or directory
compilation terminated.

   

     

Rysiu

Rev. 37.

You can try:

benchmark_ocl
burner_ocl (maybe not run)

RAD-Poland

v37

make shor_algorithm_ocl_double OK
make shor_algorithm_ocl_float OK
make benchmark_ocl OK

make burner_ocl OK
bash-4.2$ make burner_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include burner.cpp -o burner_core_ocl.o -D USE_OPENCL
burner.cpp:226: note: not vectorized: multiple exits.
burner.cpp:118: note: vectorized 0 loops in function.
g++ -Wall burner_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o burner_ocl -lOpenCL


EDIT: ??
Quote: burner_ocl (maybe not run)

? - it might fail to start
? - maybe better not to run it, because something ...

bash-4.2$ ./burner_ocl
====================================================================
OLib Burner.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: OpenCL
Compile date: Aug  9 2012
Compile time: 18:50:51
Architecture: x86_64
====================================================================

--- 1 OpenCL platform(s) found: -----------------------------------------
  -- 0 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 1.2 AMD-APP (938.1)
  NAME = AMD Accelerated Parallel Processing
  VENDOR = Advanced Micro Devices, Inc.
  EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
--- 1 OpenCL device(s) found on platform: -----------------------------------------

  ------------------- 0 -------------------
  DEVICE_NAME = Capeverde
  DEVICE_VENDOR = Advanced Micro Devices, Inc.
  DEVICE_VERSION = OpenCL 1.2 AMD-APP (938.1)
  DRIVER_VERSION = CAL 1.4.1741 (VM)
  CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
  DEVICE_MAX_COMPUTE_UNITS = 8
  DEVICE_MAX_CLOCK_FREQUENCY = 1000
  DEVICE_GLOBAL_MEM_SIZE = 872415232
  CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 16384
  CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 64
  CL_DEVICE_LOCAL_MEM_SIZE = 32768
  CL_DEVICE_MAX_MEM_ALLOC_SIZE = 524288000
  CL_DEVICE_MAX_MEM_ALLOC_SIZE = 524288000

Enter the device number:
>>


EDIT2:
should I run ./burner_ocl?
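
A side note on reading the device listing above: for the double-precision tests, the two things that matter most are whether cl_khr_fp64 appears among the extensions and how much memory the device reports. Here is a small Python sketch that parses the 'KEY = VALUE' lines as they appear in this output. The function names are hypothetical, the sizes are assumed to be in bytes (as OpenCL reports them), and this is not necessarily how OLib itself checks for fp64 support.

```python
def parse_device_dump(text):
    """Parse 'KEY = VALUE' lines into a dict; lines without ' = ' are skipped."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if sep:
            info[key] = value
    return info


def supports_fp64(info):
    """True if the extension list mentions cl_khr_fp64 (or AMD's cl_amd_fp64)."""
    extensions = info.get("CL_DEVICE_EXTENSIONS", "").split()
    return "cl_khr_fp64" in extensions or "cl_amd_fp64" in extensions


def to_mib(size_in_bytes):
    """Convert a byte count to MiB (1 MiB = 1024 * 1024 bytes)."""
    return int(size_in_bytes) / (1024 * 1024)


# Values copied from the listing above (extension list shortened).
SAMPLE = """\
DEVICE_NAME = Capeverde
CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_gl_sharing
DEVICE_GLOBAL_MEM_SIZE = 872415232
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 524288000
"""

if __name__ == "__main__":
    info = parse_device_dump(SAMPLE)
    print(info["DEVICE_NAME"], "fp64:", supports_fp64(info))
    print("global memory:", to_mib(info["DEVICE_GLOBAL_MEM_SIZE"]), "MiB")
    print("max allocation:", to_mib(info["CL_DEVICE_MAX_MEM_ALLOC_SIZE"]), "MiB")
```

For this card that works out to 832 MiB of reported global memory and a 500 MiB limit on a single allocation, with fp64 available.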

   

     

Rysiu

Quote from: RAD-Poland on 09 August 2012, 18:49
? - maybe better not to run it, because something ...
It may not hand control back to the graphics subsystem  :attack: There's no point risking stability if you don't have to.

Martin Fox

rev 39
fox@APU:~/test/trunk/samples$ make benchmark_sp
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_sp.o -D USE_SEQUENTIAL
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_benchmark_sp.cpp -o olib_benchmark_sp.o
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_fp_units_sp(float*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:46:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp:49:9: warning: variable 'b' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_fp_units_sp(double*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:530:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp:533:9: warning: variable 'b' set but not used [-Wunused-but-set-variable]

../sp/olib_benchmark_sp.cpp:38: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:52: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:33: note: vectorized 0 loops in function.

../sp/olib_benchmark_sp.cpp:522: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:536: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:517: note: vectorized 0 loops in function.

../sp/olib_benchmark_sp.cpp:1007: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1014: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1001: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_special_units_sp(float*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:1017:18: warning: 'tmp' may be used uninitialized in this function [-Wuninitialized]

../sp/olib_benchmark_sp.cpp:1037: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1044: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1031: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_special_units_sp(double*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:1047:19: warning: 'tmp' may be used uninitialized in this function [-Wuninitialized]

../sp/olib_benchmark_sp.cpp:1072: note: not vectorized: multiple nested loops.
../sp/olib_benchmark_sp.cpp:1075: note: not vectorized: number of iterations cannot be computed.
../sp/olib_benchmark_sp.cpp:1075: note: not vectorized: Bad inner loop.
../sp/olib_benchmark_sp.cpp:1079: note: not vectorized: number of iterations cannot be computed.
../sp/olib_benchmark_sp.cpp:1061: note: vectorized 0 loops in function.
g++ -Wall benchmark_core_sp.o olib_benchmark_sp.o -o benchmark_sp

fox@APU:~/test/trunk/samples$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
benchmark.cpp: In function 'int main(int, char**)':
benchmark.cpp:215:26: warning: ignoring return value of 'int scanf(const char*, ...)', declared with attribute warn_unused_result [-Wunused-result]
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../primitives/olib_primitives_ocl.cpp
../primitives/olib_primitives_ocl.cpp: In function 'int support_fp64(int)':
../primitives/olib_primitives_ocl.cpp:79:84: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../primitives/olib_primitives_ocl.cpp: In function 'int search_devices(_cl_device_id**, cl_uint*)':
../primitives/olib_primitives_ocl.cpp:102:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
../primitives/olib_primitives_ocl.cpp:131:19: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
../primitives/olib_primitives_ocl.cpp: In function 'char* olib_read_opencl_kernel(char*)':
../primitives/olib_primitives_ocl.cpp:238:21: warning: variable 'readlen' set but not used [-Wunused-but-set-variable]

/usr/include/x86_64-linux-gnu/bits/stdio2.h:105: note: not vectorized: control flow in loop.
/usr/include/x86_64-linux-gnu/bits/stdio2.h:105: note: not vectorized: control flow in loop.
../primitives/olib_primitives_ocl.cpp:94: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_strings_sp.cpp

../sp/olib_strings_sp.cpp:71: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:71: note: not vectorized: Bad inner loop.
../sp/olib_strings_sp.cpp:74: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:48: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:48: note: not vectorized: Bad inner loop.
../sp/olib_strings_sp.cpp:52: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:35: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../opencl/olib_benchmark_ocl.cpp
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:41:110: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:173:111: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:306:115: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:439:116: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_mips_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:575:106: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
g++ -Wall benchmark_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o benchmark_ocl -lOpenCL

fox@APU:~/test/trunk/samples$ make burner_sp
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona burner.cpp -o burner_core_sp.o -D USE_SEQUENTIAL

/usr/include/x86_64-linux-gnu/bits/stdio2.h:105: note: not vectorized: multiple exits.
burner.cpp:118: note: vectorized 0 loops in function.
g++ -Wall burner_core_sp.o olib_benchmark_sp.o -o burner_sp
fox@APU:~/test/trunk/samples$ make burner_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include burner.cpp -o burner_core_ocl.o -D USE_OPENCL
burner.cpp: In function 'int main(int, char**)':
burner.cpp:171:26: warning: ignoring return value of 'int scanf(const char*, ...)', declared with attribute warn_unused_result [-Wunused-result]

/usr/include/x86_64-linux-gnu/bits/stdio2.h:105: note: not vectorized: multiple exits.
burner.cpp:118: note: vectorized 0 loops in function.
g++ -Wall burner_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o burner_ocl -lOpenCL

fox@APU:~/test/trunk/samples$ make shor_algorithm_ocl_float
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona shor_algorithm.cpp -o shor_algorithm_ocl_core_float.o -D USE_OPENCL -D FLOAT_PRECISION
shor_algorithm.cpp: In function 'int main(int, char**)':
shor_algorithm.cpp:518:53: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:633:164: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1107:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:1222:173: warning: format '%lld' expects argument of type 'long long int', but argument 6 has type 'double' [-Wformat]
shor_algorithm.cpp:1222:173: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1226:162: warning: too many arguments for format [-Wformat-extra-args]

shor_algorithm.cpp:714: note: Detected interleaving *collapse.9_140[i_2].imag and *collapse.9_140[i_2].real
shor_algorithm.cpp:714: note: Vectorizing an unaligned access.
shor_algorithm.cpp:714: note: Vectorizing an unaligned access.
shor_algorithm.cpp:714: note: vect_model_store_cost: unaligned supported by hardware.
shor_algorithm.cpp:714: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
shor_algorithm.cpp:714: note: vect_model_load_cost: unaligned supported by hardware.
shor_algorithm.cpp:714: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
shor_algorithm.cpp:714: note: not vectorized: relevant stmt not supported: cstore.35_400 = [cond_expr] value_242 == D.25145_245 ? 1.0e+0 : 0.0;

shor_algorithm.cpp:679: note: not vectorized: control flow in loop.
shor_algorithm.cpp:518: note: not vectorized: control flow in loop.
shor_algorithm.cpp:505: note: not vectorized: control flow in loop.
shor_algorithm.cpp:452: note: not vectorized: control flow in loop.
shor_algorithm.cpp:384: note: not vectorized: control flow in loop.
shor_algorithm.cpp:214: note: vectorized 0 loops in function.
shor_algorithm.cpp:1107:3: warning: 'factor' may be used uninitialized in this function [-Wuninitialized]
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../primitives/olib_primitives.cpp

../primitives/olib_primitives.cpp:266: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:261: note: vectorized 0 loops in function.

../primitives/olib_primitives.cpp:282: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:285: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:276: note: vectorized 0 loops in function.

../primitives/olib_primitives.cpp:305: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:308: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:299: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_linear_algebra_sp.cpp

../sp/olib_linear_algebra_sp.cpp:63: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:63: note: dependence distance == 0 between D.25881_10->imag and D.25881_10->imag
../sp/olib_linear_algebra_sp.cpp:63: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:63: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:63: note: dependence distance modulo vf == 0 between D.25881_10->imag and D.25881_10->imag
../sp/olib_linear_algebra_sp.cpp:63: note: Detected single element interleaving D.25881_10->imag step 8
../sp/olib_linear_algebra_sp.cpp:63: note: not vectorized: complicated access pattern.
../sp/olib_linear_algebra_sp.cpp:56: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:77: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:77: note: dependence distance == 0 between D.25872_10->imag and D.25872_10->imag
../sp/olib_linear_algebra_sp.cpp:77: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:77: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:77: note: dependence distance modulo vf == 0 between D.25872_10->imag and D.25872_10->imag
../sp/olib_linear_algebra_sp.cpp:77: note: Detected single element interleaving D.25872_10->imag step 16
../sp/olib_linear_algebra_sp.cpp:77: note: not vectorized: complicated access pattern.
../sp/olib_linear_algebra_sp.cpp:70: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:91: note: not vectorized: no vectype for stmt: D.25864_11 = D.25863_10->imag;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:84: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:129: note: not vectorized: inner-loop count not invariant.
../sp/olib_linear_algebra_sp.cpp:131: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:119: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:152: note: not vectorized: inner-loop count not invariant.
../sp/olib_linear_algebra_sp.cpp:154: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:142: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:175: note: not vectorized: inner-loop count not invariant.
../sp/olib_linear_algebra_sp.cpp:177: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:165: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:225: note: not vectorized: data ref analysis failed D.25790_18->real = 1.0e+0;

../sp/olib_linear_algebra_sp.cpp:216: note: Detected interleaving D.25782_9->real and D.25782_9->imag
../sp/olib_linear_algebra_sp.cpp:216: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:216: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:216: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_linear_algebra_sp.cpp:216: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:216: note: Cost model analysis:
  Vector inside of loop cost: 2
  Vector outside of loop cost: 9
  Scalar iteration cost: 2
  Scalar outside cost: 6
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 3

../sp/olib_linear_algebra_sp.cpp:216: note:   Profitability threshold = 2

../sp/olib_linear_algebra_sp.cpp:216: note: Profitability threshold is 2 loop iterations.
../sp/olib_linear_algebra_sp.cpp:216: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:216: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:209: note: vectorized 1 loops in function.

../sp/olib_linear_algebra_sp.cpp:248: note: not vectorized: data ref analysis failed D.25775_18->real = 1.0e+0;

../sp/olib_linear_algebra_sp.cpp:239: note: Detected interleaving D.25767_9->real and D.25767_9->imag
../sp/olib_linear_algebra_sp.cpp:239: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:239: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:239: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_linear_algebra_sp.cpp:239: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:239: note: cost model: the vector iteration cost = 2 divided by the scalar iteration cost = 2 is greater or equal to the vectorization factor = 1.
../sp/olib_linear_algebra_sp.cpp:239: note: not vectorized: vectorization not profitable.
../sp/olib_linear_algebra_sp.cpp:232: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:271: note: not vectorized: data ref analysis failed D.25760_18->real = 1.0e+0;

../sp/olib_linear_algebra_sp.cpp:262: note: not vectorized: no vectype for stmt: D.25752_9->real = 0.0;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:255: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:311: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:304: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:326: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:319: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:341: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:334: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:380: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:373: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:395: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:388: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:410: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:403: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:460: note: not vectorized: multiple nested loops.
../sp/olib_linear_algebra_sp.cpp:462: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:469: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:442: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:517: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:508: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:534: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:525: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:551: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:542: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:597: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:581: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:622: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:606: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:646: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:630: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:687: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:680: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:704: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:697: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:719: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:712: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:770: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:772: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:760: note: Detected interleaving D.25501_9->real and D.25501_9->imag
../sp/olib_linear_algebra_sp.cpp:760: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:760: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:760: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_linear_algebra_sp.cpp:760: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:760: note: Cost model analysis:
  Vector inside of loop cost: 2
  Vector outside of loop cost: 9
  Scalar iteration cost: 2
  Scalar outside cost: 6
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 3

../sp/olib_linear_algebra_sp.cpp:760: note:   Profitability threshold = 2

../sp/olib_linear_algebra_sp.cpp:760: note: Profitability threshold is 2 loop iterations.
../sp/olib_linear_algebra_sp.cpp:760: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:760: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:753: note: vectorized 1 loops in function.

../sp/olib_linear_algebra_sp.cpp:802: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:804: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:792: note: Detected interleaving D.25479_9->real and D.25479_9->imag
../sp/olib_linear_algebra_sp.cpp:792: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:792: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:792: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_linear_algebra_sp.cpp:792: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:792: note: cost model: the vector iteration cost = 2 divided by the scalar iteration cost = 2 is greater or equal to the vectorization factor = 1.
../sp/olib_linear_algebra_sp.cpp:792: note: not vectorized: vectorization not profitable.
../sp/olib_linear_algebra_sp.cpp:785: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:833: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:835: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:823: note: not vectorized: no vectype for stmt: D.25457_9->real = 0.0;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:816: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:880: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:882: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:873: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:900: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:902: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:893: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:920: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:922: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:913: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance == 0 between D.25381_28->real and D.25381_28->real
../sp/olib_linear_algebra_sp.cpp:949: note: Detected interleaving D.25381_28->real and D.25381_28->imag
../sp/olib_linear_algebra_sp.cpp:949: note: Detected interleaving D.25381_28->real and D.25381_28->imag
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance == 0 between D.25381_28->imag and D.25381_28->imag
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:949: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance modulo vf == 0 between D.25381_28->real and D.25381_28->real
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:949: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance modulo vf == 0 between D.25381_28->imag and D.25381_28->imag
../sp/olib_linear_algebra_sp.cpp:949: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:949: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:949: note: Cost model analysis:
  Vector inside of loop cost: 7
  Vector outside of loop cost: 16
  Scalar iteration cost: 10
  Scalar outside cost: 6
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 2

../sp/olib_linear_algebra_sp.cpp:949: note:   Profitability threshold = 1

../sp/olib_linear_algebra_sp.cpp:949: note: Profitability threshold is 1 loop iterations.
../sp/olib_linear_algebra_sp.cpp:949: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:949: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:942: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:935: note: vectorized 1 loops in function.

../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance == 0 between D.25363_26->real and D.25363_26->real
../sp/olib_linear_algebra_sp.cpp:971: note: Detected interleaving D.25363_26->real and D.25363_26->imag
../sp/olib_linear_algebra_sp.cpp:971: note: Detected interleaving D.25363_26->real and D.25363_26->imag
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance == 0 between D.25363_26->imag and D.25363_26->imag
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:971: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance modulo vf == 0 between D.25363_26->real and D.25363_26->real
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance  = 0.
../sp/olib_linear_algebra_sp.cpp:971: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance modulo vf == 0 between D.25363_26->imag and D.25363_26->imag
../sp/olib_linear_algebra_sp.cpp:971: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:971: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:971: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:971: note: Cost model analysis:
  Vector inside of loop cost: 5
  Vector outside of loop cost: 6
  Scalar iteration cost: 6
  Scalar outside cost: 6
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1

../sp/olib_linear_algebra_sp.cpp:971: note:   Profitability threshold = 0

../sp/olib_linear_algebra_sp.cpp:971: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:971: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:964: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:957: note: vectorized 1 loops in function.

../sp/olib_linear_algebra_sp.cpp:993: note: not vectorized: no vectype for stmt: D.25341_32 = D.25340_31->real;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:986: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:979: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:1008: note: versioning for alias required: can't determine dependence between D.25331_10->real and D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: mark for run-time aliasing test between D.25331_10->real and D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: Detected interleaving D.25331_10->real and D.25331_10->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: versioning for alias required: can't determine dependence between D.25331_10->real and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: mark for run-time aliasing test between D.25331_10->real and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: Detected interleaving D.25330_7->real and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: versioning for alias required: can't determine dependence between D.25331_10->imag and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: mark for run-time aliasing test between D.25331_10->imag and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: found equal ranges D.25331_10->real, D.25330_7->imag and D.25331_10->real, D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: found equal ranges D.25331_10->imag, D.25330_7->imag and D.25331_10->real, D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1008: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1008: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1008: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1008: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1008: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1008: note: cost model: Adding cost of checks for loop versioning aliasing.

../sp/olib_linear_algebra_sp.cpp:1008: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:1008: note: Cost model analysis:
  Vector inside of loop cost: 4
  Vector outside of loop cost: 14
  Scalar iteration cost: 4
  Scalar outside cost: 1
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 6

../sp/olib_linear_algebra_sp.cpp:1008: note:   Profitability threshold = 5

../sp/olib_linear_algebra_sp.cpp:1008: note: Profitability threshold is 5 loop iterations.
../sp/olib_linear_algebra_sp.cpp:1008: note: create runtime check for data references D.25331_10->real and D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: created 1 versioning for alias checks.

../sp/olib_linear_algebra_sp.cpp:1008: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:1008: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:1003: note: vectorized 1 loops in function.

../sp/olib_linear_algebra_sp.cpp:1021: note: versioning for alias required: can't determine dependence between D.25321_10->real and D.25320_7->real
../sp/olib_linear_algebra_sp.cpp:1021: note: mark for run-time aliasing test between D.25321_10->real and D.25320_7->real
../sp/olib_linear_algebra_sp.cpp:1021: note: Detected interleaving D.25321_10->real and D.25321_10->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: versioning for alias required: can't determine dependence between D.25321_10->real and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: mark for run-time aliasing test between D.25321_10->real and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: Detected interleaving D.25320_7->real and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: versioning for alias required: can't determine dependence between D.25321_10->imag and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: mark for run-time aliasing test between D.25321_10->imag and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: found equal ranges D.25321_10->real, D.25320_7->imag and D.25321_10->real, D.25320_7->real
../sp/olib_linear_algebra_sp.cpp:1021: note: found equal ranges D.25321_10->imag, D.25320_7->imag and D.25321_10->real, D.25320_7->real
../sp/olib_linear_algebra_sp.cpp:1021: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1021: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1021: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1021: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1021: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1021: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1021: note: cost model: Adding cost of checks for loop versioning aliasing.

../sp/olib_linear_algebra_sp.cpp:1021: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:1021: note: cost model: the vector iteration cost = 4 divided by the scalar iteration cost = 4 is greater or equal to the vectorization factor = 1.
../sp/olib_linear_algebra_sp.cpp:1021: note: not vectorized: vectorization not profitable.
../sp/olib_linear_algebra_sp.cpp:1016: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:1034: note: not vectorized: no vectype for stmt: D.25312_11 = D.25311_10->real;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1029: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:1047: note: versioning for alias required: can't determine dependence between D.25299_10->real and D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: mark for run-time aliasing test between D.25299_10->real and D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: Detected interleaving D.25299_10->real and D.25299_10->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: versioning for alias required: can't determine dependence between D.25299_10->real and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: mark for run-time aliasing test between D.25299_10->real and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: Detected interleaving D.25297_7->real and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: versioning for alias required: can't determine dependence between D.25299_10->imag and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: mark for run-time aliasing test between D.25299_10->imag and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: found equal ranges D.25299_10->real, D.25297_7->imag and D.25299_10->real, D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: found equal ranges D.25299_10->imag, D.25297_7->imag and D.25299_10->real, D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1047: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1047: note: cost model: Adding cost of checks for loop versioning aliasing.

../sp/olib_linear_algebra_sp.cpp:1047: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:1047: note: Cost model analysis:
  Vector inside of loop cost: 5
  Vector outside of loop cost: 16
  Scalar iteration cost: 6
  Scalar outside cost: 1
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 4

../sp/olib_linear_algebra_sp.cpp:1047: note:   Profitability threshold = 3

../sp/olib_linear_algebra_sp.cpp:1047: note: Profitability threshold is 3 loop iterations.
../sp/olib_linear_algebra_sp.cpp:1047: note: create runtime check for data references D.25299_10->real and D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: created 1 versioning for alias checks.

../sp/olib_linear_algebra_sp.cpp:1047: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:1047: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:1042: note: vectorized 1 loops in function.

../sp/olib_linear_algebra_sp.cpp:1060: note: not vectorized: no vectype for stmt: D.25287_11 = D.25286_10->real;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1055: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:1074: note: versioning for alias required: can't determine dependence between D.25273_10->real and D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: mark for run-time aliasing test between D.25273_10->real and D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: Detected interleaving D.25273_10->real and D.25273_10->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: versioning for alias required: can't determine dependence between D.25273_10->real and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: mark for run-time aliasing test between D.25273_10->real and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: Detected interleaving D.25271_7->real and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: versioning for alias required: can't determine dependence between D.25273_10->imag and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: mark for run-time aliasing test between D.25273_10->imag and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: found equal ranges D.25273_10->real, D.25271_7->imag and D.25273_10->real, D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: found equal ranges D.25273_10->imag, D.25271_7->imag and D.25273_10->real, D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1074: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1074: note: cost model: Adding cost of checks for loop versioning aliasing.

../sp/olib_linear_algebra_sp.cpp:1074: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:1074: note: Cost model analysis:
  Vector inside of loop cost: 5
  Vector outside of loop cost: 16
  Scalar iteration cost: 6
  Scalar outside cost: 1
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 4

../sp/olib_linear_algebra_sp.cpp:1074: note:   Profitability threshold = 3

../sp/olib_linear_algebra_sp.cpp:1074: note: Profitability threshold is 3 loop iterations.
../sp/olib_linear_algebra_sp.cpp:1074: note: create runtime check for data references D.25273_10->real and D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: created 1 versioning for alias checks.

../sp/olib_linear_algebra_sp.cpp:1074: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:1074: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:1069: note: vectorized 1 loops in function.

../sp/olib_linear_algebra_sp.cpp:1088: note: not vectorized: no vectype for stmt: D.25261_11 = D.25260_10->real;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1083: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:1101: note: not vectorized: no vectype for stmt: D.25245_7->real = D.25249_12;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1096: note: vectorized 0 loops in function.

../sp/olib_linear_algebra_sp.cpp:1114: note: not vectorized: no vectype for stmt: D.25232_7->real = D.25236_12;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1109: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_quantum_sp.cpp

../sp/olib_quantum_sp.cpp:231: note: not vectorized: number of iterations cannot be computed.
../sp/olib_quantum_sp.cpp:227: note: vectorized 0 loops in function.

../sp/olib_quantum_sp.cpp:246: note: not vectorized: unexpected loop form.
../sp/olib_quantum_sp.cpp:243: note: vectorized 0 loops in function.

../sp/olib_quantum_sp.cpp:270: note: not vectorized: number of iterations cannot be computed.
../sp/olib_quantum_sp.cpp:257: note: vectorized 0 loops in function.

../sp/olib_quantum_sp.cpp:289: note: not vectorized: unexpected loop form.
../sp/olib_quantum_sp.cpp:289: note: not vectorized: Bad inner loop.
../sp/olib_quantum_sp.cpp:297: note: not vectorized: unexpected loop form.
../sp/olib_quantum_sp.cpp:282: note: vectorized 0 loops in function.

../sp/olib_quantum_sp.cpp:325: note: versioning for alias required: can't determine dependence between D.24987_10->real and D.24986_7->real
../sp/olib_quantum_sp.cpp:325: note: mark for run-time aliasing test between D.24987_10->real and D.24986_7->real
../sp/olib_quantum_sp.cpp:325: note: Detected interleaving D.24987_10->real and D.24987_10->imag
../sp/olib_quantum_sp.cpp:325: note: versioning for alias required: can't determine dependence between D.24987_10->real and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: mark for run-time aliasing test between D.24987_10->real and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: Detected interleaving D.24986_7->real and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: versioning for alias required: can't determine dependence between D.24987_10->imag and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: mark for run-time aliasing test between D.24987_10->imag and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: found equal ranges D.24987_10->real, D.24986_7->imag and D.24987_10->real, D.24986_7->real
../sp/olib_quantum_sp.cpp:325: note: found equal ranges D.24987_10->imag, D.24986_7->imag and D.24987_10->real, D.24986_7->real
../sp/olib_quantum_sp.cpp:325: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:325: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:325: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:325: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_quantum_sp.cpp:325: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:325: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_quantum_sp.cpp:325: note: cost model: Adding cost of checks for loop versioning aliasing.

../sp/olib_quantum_sp.cpp:325: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_quantum_sp.cpp:325: note: cost model: the vector iteration cost = 4 divided by the scalar iteration cost = 4 is greater or equal to the vectorization factor = 1.
../sp/olib_quantum_sp.cpp:325: note: not vectorized: vectorization not profitable.
../sp/olib_quantum_sp.cpp:321: note: vectorized 0 loops in function.

../sp/olib_quantum_sp.cpp:188: note: not vectorized: control flow in loop.
../sp/olib_quantum_sp.cpp:188: note: not vectorized: Bad inner loop.
../sp/olib_quantum_sp.cpp:192: note: not vectorized: control flow in loop.
../sp/olib_quantum_sp.cpp:146: note: Detected interleaving *init.6_23[i_96].real and *init.6_23[i_96].imag
../sp/olib_quantum_sp.cpp:146: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:146: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:146: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_quantum_sp.cpp:146: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_quantum_sp.cpp:146: note: cost model: the vector iteration cost = 2 divided by the scalar iteration cost = 2 is greater or equal to the vectorization factor = 1.
../sp/olib_quantum_sp.cpp:146: note: not vectorized: vectorization not profitable.
../sp/olib_quantum_sp.cpp:133: note: vectorized 0 loops in function.

../sp/olib_quantum_sp.cpp:337: note: versioning for alias required: can't determine dependence between D.24977_10->real and D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: mark for run-time aliasing test between D.24977_10->real and D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: Detected interleaving D.24977_10->real and D.24977_10->imag
../sp/olib_quantum_sp.cpp:337: note: versioning for alias required: can't determine dependence between D.24977_10->real and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: mark for run-time aliasing test between D.24977_10->real and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: Detected interleaving D.24976_7->real and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: versioning for alias required: can't determine dependence between D.24977_10->imag and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: mark for run-time aliasing test between D.24977_10->imag and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: found equal ranges D.24977_10->real, D.24976_7->imag and D.24977_10->real, D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: found equal ranges D.24977_10->imag, D.24976_7->imag and D.24977_10->real, D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:337: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:337: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:337: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_quantum_sp.cpp:337: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:337: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_quantum_sp.cpp:337: note: cost model: Adding cost of checks for loop versioning aliasing.

../sp/olib_quantum_sp.cpp:337: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_quantum_sp.cpp:337: note: Cost model analysis:
  Vector inside of loop cost: 4
  Vector outside of loop cost: 14
  Scalar iteration cost: 4
  Scalar outside cost: 1
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 6

../sp/olib_quantum_sp.cpp:337: note:   Profitability threshold = 5

../sp/olib_quantum_sp.cpp:337: note: Profitability threshold is 5 loop iterations.
../sp/olib_quantum_sp.cpp:337: note: create runtime check for data references D.24977_10->real and D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: created 1 versioning for alias checks.

../sp/olib_quantum_sp.cpp:337: note: vectorizing stmts using SLP.
../sp/olib_quantum_sp.cpp:337: note: LOOP VECTORIZED.
../sp/olib_quantum_sp.cpp:333: note: vectorized 1 loops in function.

../sp/olib_quantum_sp.cpp:95: note: not vectorized: control flow in loop.
../sp/olib_quantum_sp.cpp:95: note: not vectorized: Bad inner loop.
../sp/olib_quantum_sp.cpp:99: note: not vectorized: control flow in loop.
../sp/olib_quantum_sp.cpp:52: note: Detected interleaving *init.14_23[i_106].real and *init.14_23[i_106].imag
../sp/olib_quantum_sp.cpp:52: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:52: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:52: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_quantum_sp.cpp:52: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_quantum_sp.cpp:52: note: Cost model analysis:
  Vector inside of loop cost: 2
  Vector outside of loop cost: 9
  Scalar iteration cost: 2
  Scalar outside cost: 6
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 3

../sp/olib_quantum_sp.cpp:52: note:   Profitability threshold = 2

../sp/olib_quantum_sp.cpp:52: note: Profitability threshold is 2 loop iterations.
../sp/olib_quantum_sp.cpp:52: note: vectorizing stmts using SLP.
../sp/olib_quantum_sp.cpp:52: note: LOOP VECTORIZED.
../sp/olib_quantum_sp.cpp:39: note: vectorized 1 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../opencl/olib_quantum_ocl.cpp
../opencl/olib_quantum_ocl.cpp: In function 'int dqft_ocl(complex_float*, long long unsigned int, int, int)':
../opencl/olib_quantum_ocl.cpp:62:97: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_quantum_ocl.cpp: In function 'int dqft_ocl(complex_double*, long long unsigned int, int, int)':
../opencl/olib_quantum_ocl.cpp:254:98: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]

../opencl/olib_quantum_ocl.cpp:77: note: not vectorized: control flow in loop.
../opencl/olib_quantum_ocl.cpp:55: note: Detected interleaving *init.21_23[i_60].real and *init.21_23[i_60].imag
../opencl/olib_quantum_ocl.cpp:55: note: Vectorizing an unaligned access.
../opencl/olib_quantum_ocl.cpp:55: note: vect_model_store_cost: unaligned supported by hardware.
../opencl/olib_quantum_ocl.cpp:55: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../opencl/olib_quantum_ocl.cpp:55: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../opencl/olib_quantum_ocl.cpp:55: note: Cost model analysis:
  Vector inside of loop cost: 2
  Vector outside of loop cost: 9
  Scalar iteration cost: 2
  Scalar outside cost: 6
  prologue iterations: 0
  epilogue iterations: 1
  Calculated minimum iters for profitability: 3

../opencl/olib_quantum_ocl.cpp:55: note:   Profitability threshold = 2

../opencl/olib_quantum_ocl.cpp:55: note: Profitability threshold is 2 loop iterations.
../opencl/olib_quantum_ocl.cpp:55: note: vectorizing stmts using SLP.
../opencl/olib_quantum_ocl.cpp:55: note: LOOP VECTORIZED.
../opencl/olib_quantum_ocl.cpp:41: note: vectorized 1 loops in function.

../opencl/olib_quantum_ocl.cpp:269: note: not vectorized: control flow in loop.
../opencl/olib_quantum_ocl.cpp:244: note: Detected interleaving *init.2_23[i_48].real and *init.2_23[i_48].imag
../opencl/olib_quantum_ocl.cpp:244: note: Vectorizing an unaligned access.
../opencl/olib_quantum_ocl.cpp:244: note: vect_model_store_cost: unaligned supported by hardware.
../opencl/olib_quantum_ocl.cpp:244: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../opencl/olib_quantum_ocl.cpp:244: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../opencl/olib_quantum_ocl.cpp:244: note: cost model: the vector iteration cost = 2 divided by the scalar iteration cost = 2 is greater or equal to the vectorization factor = 1.
../opencl/olib_quantum_ocl.cpp:244: note: not vectorized: vectorization not profitable.
../opencl/olib_quantum_ocl.cpp:229: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_discrete_sp.cpp

../sp/olib_discrete_sp.cpp:64: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:57: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:106: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:101: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:106: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:139: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:181: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:168: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:220: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:217: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:293: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:252: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:319: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:310: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:305: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:359: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:352: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:385: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:378: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:411: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:404: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:436: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:429: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:483: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:474: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:508: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:499: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:533: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:524: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:558: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:549: note: vectorized 0 loops in function.

../sp/olib_discrete_sp.cpp:587: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:578: note: vectorized 0 loops in function.
g++ -Wall shor_algorithm_ocl_core_float.o olib_primitives.o olib_linear_algebra_sp.o olib_quantum_sp.o olib_quantum_ocl.o olib_discrete_sp.o olib_primitives_ocl.o olib_strings_sp.o -o shor_algorithm_ocl_float -lOpenCL

fox@APU:~/test/trunk/samples$ make shor_algorithm_ocl_double
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona shor_algorithm.cpp -o shor_algorithm_ocl_core_double.o -D USE_OPENCL -D DOUBLE_PRECISION
shor_algorithm.cpp: In function 'int main(int, char**)':
shor_algorithm.cpp:518:53: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:633:164: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1107:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:1222:173: warning: format '%lld' expects argument of type 'long long int', but argument 6 has type 'double' [-Wformat]
shor_algorithm.cpp:1222:173: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1226:162: warning: too many arguments for format [-Wformat-extra-args]

shor_algorithm.cpp:714: note: Detected interleaving *collapse.9_140[i_2].imag and *collapse.9_140[i_2].real
shor_algorithm.cpp:714: note: Vectorizing an unaligned access.
shor_algorithm.cpp:714: note: Vectorizing an unaligned access.
shor_algorithm.cpp:714: note: vect_model_store_cost: unaligned supported by hardware.
shor_algorithm.cpp:714: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
shor_algorithm.cpp:714: note: vect_model_load_cost: unaligned supported by hardware.
shor_algorithm.cpp:714: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
shor_algorithm.cpp:714: note: not vectorized: relevant stmt not supported: cstore.33_397 = [cond_expr] value_242 == D.25144_245 ? 1.0e+0 : 0.0;

shor_algorithm.cpp:679: note: not vectorized: control flow in loop.
shor_algorithm.cpp:518: note: not vectorized: control flow in loop.
shor_algorithm.cpp:505: note: not vectorized: control flow in loop.
shor_algorithm.cpp:452: note: not vectorized: control flow in loop.
shor_algorithm.cpp:384: note: not vectorized: control flow in loop.
shor_algorithm.cpp:214: note: vectorized 0 loops in function.
shor_algorithm.cpp:1107:3: warning: 'factor' may be used uninitialized in this function [-Wuninitialized]
g++ -Wall shor_algorithm_ocl_core_double.o olib_primitives.o olib_linear_algebra_sp.o olib_quantum_sp.o olib_quantum_ocl.o olib_discrete_sp.o olib_primitives_ocl.o olib_strings_sp.o -o shor_algorithm_ocl_double -lOpenCL
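
The -Wformat and -Wformat-extra-args warnings for shor_algorithm.cpp in the log above typically mean that a printf-style format string and its argument list disagree (here, `%lld` receiving a `double`). Below is a minimal sketch of the matching-specifier pattern. The function `format_result` and its parameters are hypothetical and are not taken from the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not from shor_algorithm.cpp): formats an iteration
// counter and a timing value using conversion specifiers that match the
// argument types, so -Wformat has nothing to complain about.
std::string format_result(long long iterations, double seconds) {
    char buf[128];
    // %lld matches long long, %.2f matches double; one specifier per argument.
    const int n = std::snprintf(buf, sizeof buf,
                                "iterations=%lld time=%.2f s",
                                iterations, seconds);
    // snprintf returns the length it wanted to write; check it fit.
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf);
}
```

If a value really is a `double`, print it with `%f` (or cast it explicitly to `long long` when an integer is intended), and remove arguments that have no specifier.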

Martin Fox

fox@APU:~/test/trunk/samples$ ./benchmark_sp
====================================================================
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: Sequential
Compile date: Aug  9 2012
Compile time: 22:35:58
Architecture: x86_64
====================================================================


[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 47 seconds
Rate: 2.127660

[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 99 seconds
Special Units Testing...
Execution time: 49 seconds
Performance: [32-bit float types]
0.383818 GFLOPS
2081632 special operations (e.g. sin, cos) per second

[1: FPU-64] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 115 seconds
Special Units Testing...
Execution time: 42 seconds
Performance: [64-bit float types]
0.031584 GFLOPS
232142 special operations (e.g. sin, cos) per second

[2: ALU-64] Final test
Calculations... Please wait...
Execution time: 30 seconds
Performance: [64-bit integer types]
9873.600013 MIPS

fox@APU:~/test/trunk/samples$ ./benchmark_ocl
====================================================================
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: OpenCL
Compile date: Aug  9 2012
Compile time: 22:36:28
Architecture: x86_64
====================================================================

--- 1 OpenCL platform(s) found: -----------------------------------------
  -- 0 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 1.2 AMD-APP (923.1)
  NAME = AMD Accelerated Parallel Processing
  VENDOR = Advanced Micro Devices, Inc.
  EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
--- 1 OpenCL device(s) found on platform: -----------------------------------------

  ------------------- 0 -------------------
  DEVICE_NAME = BeaverCreek
  DEVICE_VENDOR = Advanced Micro Devices, Inc.
  DEVICE_VERSION = OpenCL 1.2 AMD-APP (923.1)
  DRIVER_VERSION = CAL 1.4.1720
  CL_DEVICE_EXTENSIONS = cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
  DEVICE_MAX_COMPUTE_UNITS = 5
  DEVICE_MAX_CLOCK_FREQUENCY = 900
  DEVICE_GLOBAL_MEM_SIZE = 268435456
  CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 0
  CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 0
  CL_DEVICE_LOCAL_MEM_SIZE = 32768
  CL_DEVICE_MAX_MEM_ALLOC_SIZE = 134217728
  CL_DEVICE_MAX_MEM_ALLOC_SIZE = 134217728

Enter the device number:
>>
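
For readers comparing listings like the one above, here is a minimal sketch that turns the `KEY = VALUE` lines into a dictionary and converts the memory sizes to MiB. It is not part of OLib; the helper names `parse_properties` and `to_mib` are made up for illustration, and the sample lines are copied from the listing. OpenCL reports these memory sizes in bytes.

```python
import re

# Sample lines copied from the device listing above.
SAMPLE = """\
  DEVICE_NAME = BeaverCreek
  DEVICE_MAX_COMPUTE_UNITS = 5
  DEVICE_MAX_CLOCK_FREQUENCY = 900
  DEVICE_GLOBAL_MEM_SIZE = 268435456
  CL_DEVICE_LOCAL_MEM_SIZE = 32768
  CL_DEVICE_MAX_MEM_ALLOC_SIZE = 134217728
"""

def parse_properties(text):
    """Collect 'KEY = VALUE' lines into a dict; all-digit values become int."""
    props = {}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Z_0-9]+)\s*=\s*(.*)$", line)
        if not match:
            continue  # separators, prompts, and blank lines are skipped
        key, value = match.group(1), match.group(2).strip()
        props[key] = int(value) if value.isdigit() else value
    return props

def to_mib(num_bytes):
    """Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)

props = parse_properties(SAMPLE)
print(props["DEVICE_NAME"],
      to_mib(props["DEVICE_GLOBAL_MEM_SIZE"]), "MiB global memory,",
      to_mib(props["CL_DEVICE_MAX_MEM_ALLOC_SIZE"]), "MiB max allocation")
```

With these values the APU exposes 256 MiB of global memory to OpenCL, and a single allocation is limited to half of that.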

RAD-Poland

Results for 1000 of the 5250 iterations, run time ~2h10:
./shor_algorithm_tester.sh shor_algorithm_ocl_float Radeon7770

Attached in temporary.zip:
general_log.shor_algorithm_ocl_float.Radeon7770
result.shor_algorithm_ocl_float.Radeon7770.45
result.shor_algorithm_ocl_float.Radeon7770.111
result.shor_algorithm_ocl_float.Radeon7770.213
result.shor_algorithm_ocl_float.Radeon7770.315


The HD 7770 is a bit too weak to complete the full task  :(
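
A rough back-of-the-envelope estimate of the full run, using only the figures quoted in this post (1000 of 5250 iterations in about 2h10). It assumes every iteration costs the same on average, which may well not hold, since the tested numbers differ; treat it as an order-of-magnitude sketch rather than a prediction.

```python
# Figures reported in the post above.
done_iterations = 1000
total_iterations = 5250
elapsed_seconds = (2 * 60 + 10) * 60  # "~2h10"

# Assumption: constant average cost per iteration.
seconds_per_iteration = elapsed_seconds / done_iterations
estimated_total_seconds = round(elapsed_seconds * total_iterations / done_iterations)

hours, remainder = divmod(estimated_total_seconds, 3600)
minutes = remainder // 60
print(f"{seconds_per_iteration:.1f} s per iteration")
print(f"estimated full run: about {hours} h {minutes} min")
```

Under that assumption the complete set would take a bit over 11 hours on this card.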

   

     

RAD-Poland

During the next test a  :cold: problem showed up (and it was a cold night  :P)

Judging by the temporary files that get created, the X server was not the problem.
The problem is probably in the drivers or in the script; I also can't rule out the (aging) hardware.
The script completed 240 of 250 iterations for the next number, 411, in ~26 min, so fairly quickly, and then it stopped  :(
During the test I did not see the GPU overheating (~55 °C), and the fans sped up from 20-30%. The XFX HD 7770 Double Dissipation has plenty of cooling, and on top of that there was cool air coming in from outside.
Processes running in the background kept working without problems for another 8 h, including BOINC Rosetta@home running on 1 core.

Maybe the problem is only on my side; let's wait for results from the other testers.
That said, so far I have generally had no problems with the GPU in the projects I tested; it only stuttered while crunching PrimeGrid.
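
One way to notice a stop like this sooner is to watch whether the run's log file is still being updated. The sketch below is not part of the tester script: `seconds_since_update` and `is_stalled` are hypothetical helpers, the 10-minute limit is an arbitrary example value, and it assumes the general log is written to regularly while the test runs, which has not been verified here.

```python
import os
import time

def seconds_since_update(path, now=None):
    """Return how long ago the file at `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)

def is_stalled(path, max_idle_seconds, now=None):
    """True when the file has not been modified for longer than the limit."""
    return seconds_since_update(path, now) > max_idle_seconds

if __name__ == "__main__":
    # Hypothetical usage: the log name comes from the attachment list earlier
    # in the thread; the 10-minute limit is only an example.
    log_path = "general_log.shor_algorithm_ocl_float.Radeon7770"
    if os.path.exists(log_path) and is_stalled(log_path, 10 * 60):
        print("no log activity for 10 minutes, the run may have stopped")
```

Run periodically (for example from a second terminal), this only reports inactivity; it does not say whether the driver, the script, or the hardware is at fault.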

   

     

RAD-Poland

~ v46

bash-4.2$ ./benchmark_ocl
====================================================================
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: OpenCL
Compile date: Aug 10 2012
Compile time: 16:39:11
Architecture: x86_64
====================================================================

--- 1 OpenCL platform(s) found: -----------------------------------------
  -- 0 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 1.2 AMD-APP (938.1)
  NAME = AMD Accelerated Parallel Processing
  VENDOR = Advanced Micro Devices, Inc.
  EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
--- 1 OpenCL device(s) found on platform: -----------------------------------------

  ------------------- 0 -------------------
  DEVICE_NAME = Capeverde
  DEVICE_VENDOR = Advanced Micro Devices, Inc.
  DEVICE_VERSION = OpenCL 1.2 AMD-APP (938.1)
  DRIVER_VERSION = CAL 1.4.1741 (VM)
  CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
  DEVICE_MAX_COMPUTE_UNITS = 8
  DEVICE_MAX_CLOCK_FREQUENCY = 1000
  DEVICE_GLOBAL_MEM_SIZE = 874512384
  CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 16384
  CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 64
  CL_DEVICE_LOCAL_MEM_SIZE = 32768
  CL_DEVICE_MAX_MEM_ALLOC_SIZE = 524288000
  CL_DEVICE_MAX_MEM_ALLOC_SIZE = 524288000

Enter the device number:
>> 0

[1: FPU-32] Initial test
        Calculations... Please wait...
        Execution time: 23 seconds
        Rate: 4.347826

[1: FPU-32] Final test
        Calculations... Please wait...
        FP Units Testing...
        Execution time: 96 seconds
        Special Units Testing...
        Execution time: 141 seconds
        Performance: [32-bit float types]
                414.639362 GFLOPS
                757813829 special operations (e.g. sin, cos) per second

[1: FPU-64] Final test
        Calculations... Please wait...
        FP Units Testing...
        Execution time: 405 seconds
        Special Units Testing...
        Execution time: 109 seconds
        Performance: [64-bit float types]
                4.913934 GFLOPS
                49011467 special operations (e.g. sin, cos) per second

[2: ALU-64] Final test
        Calculations... Please wait...
        Execution time: 143 seconds
        Performance: [64-bit integer types]
                144660.833758 MIPS
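
To compare the numbers from a run like this one, the result lines can be pulled out with a few lines of Python. This is only a sketch: `extract_scores` is a made-up helper, the regular expressions are fitted to the output format shown above, and the ratio describes this particular run, not the card in general.

```python
import re

# Result lines copied from the Capeverde run above.
LOG = """\
[1: FPU-32] Final test
        Performance: [32-bit float types]
                414.639362 GFLOPS
[1: FPU-64] Final test
        Performance: [64-bit float types]
                4.913934 GFLOPS
[2: ALU-64] Final test
        Performance: [64-bit integer types]
                144660.833758 MIPS
"""

def extract_scores(text):
    """Map each test label (e.g. 'FPU-32') to its (value, unit) result."""
    scores = {}
    label = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = re.match(r"\[\d+: ([A-Z]+-\d+)\] Final test", line)
        if header:
            label = header.group(1)  # remember which test the next result belongs to
            continue
        result = re.match(r"([\d.]+) (GFLOPS|MIPS)$", line)
        if result and label:
            scores[label] = (float(result.group(1)), result.group(2))
    return scores

scores = extract_scores(LOG)
ratio = scores["FPU-32"][0] / scores["FPU-64"][0]
print(f"FP32 throughput is roughly {ratio:.0f}x the FP64 throughput in this run")
```

The same function can be pointed at the output of other devices to put the results side by side.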

   