Does anyone have a Radeon running Linux with an OpenCL environment installed?
I'm working on an app and would like to test it on AMD cards.
Nobody? Is everyone hiding? :deadman:
my flagship with a Radeon runs Windows
Nobody has AMD with Linux...
So who has nVidia with Linux?
Quote: Rysiu on 06 August 2012, 21:33
Nobody has AMD with Linux...
So who has nVidia with Linux?
I have nv with Windows, but I run Linux in one virtual machine; the catch is that CUDA won't work inside a VM.
I have AMD, but an APU; not sure if that's good enough for you :P
An APU is fine. From what I can see, it normally supports OpenCL.
see PM -> the configuration is weak, but it might do for testing ;)
Athlon64 X2 + Radeon HD 7770
The configuration is very good. Entirely sufficient.
I'm running these tests for my thesis. The program simulates quantum algorithms. I need help measuring how fast Shor's algorithm runs.
A simple benchmark is included as well. It tests both the CPU and other devices (e.g. GPUs).
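For context, Shor's algorithm reduces factoring N to finding the order r of some a modulo N; only that step needs a quantum computer (or, here, a simulation of one). The classical post-processing is short. Below is a C++17 sketch in which the quantum period finding is replaced by a brute-force loop that only works for tiny N. The function names are hypothetical and this is not OLib's code.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>   // std::gcd, C++17
#include <utility>

// Modular exponentiation by repeated squaring: base^exp mod n.
// Intermediate products stay below n*n, so this is safe for n < 2^32.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t n) {
    std::uint64_t result = 1 % n;
    base %= n;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % n;
        base = (base * base) % n;
        exp >>= 1;
    }
    return result;
}

// Order r of a modulo n: the smallest r > 0 with a^r = 1 (mod n).
// Shor's algorithm finds r with quantum period finding; this brute-force
// loop is a classical stand-in that only works for tiny n.
std::uint64_t order_mod(std::uint64_t a, std::uint64_t n) {
    assert(n > 1 && std::gcd(a, n) == 1);   // otherwise no order exists
    a %= n;
    std::uint64_t value = a;
    std::uint64_t r = 1;
    while (value != 1) {
        value = (value * a) % n;
        ++r;
    }
    return r;
}

// Classical post-processing: for an even r with a^(r/2) != -1 (mod n),
// gcd(a^(r/2) - 1, n) and gcd(a^(r/2) + 1, n) are non-trivial factors of n.
// Returns {0, 0} when this a does not work and another one must be tried.
std::pair<std::uint64_t, std::uint64_t> shor_factors(std::uint64_t a, std::uint64_t n) {
    const std::uint64_t r = order_mod(a, n);
    if (r % 2 != 0) return {0, 0};
    const std::uint64_t half = pow_mod(a, r / 2, n);
    if (half == n - 1) return {0, 0};   // a^(r/2) = -1 (mod n): no information
    return {std::gcd(half - 1, n), std::gcd(half + 1, n)};
}
```

For example, shor_factors(7, 15) finds r = 4 and returns {3, 5}, while a = 14 fails because 14 = -1 (mod 15).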
The repository link is here:
http://code.google.com/p/olib/
You can type the following in the console:
svn checkout https://olib.googlecode.com/svn/trunk/
This creates a 'trunk' folder in the current directory and downloads the repository into it.
The programs are in the 'samples' directory, which also contains the makefile. The makefile builds quite a few program variants, so I don't recommend running 'make' or 'make all': something will probably be missing on your system. The list of available programs is in samples/programs.list.
The single-core version of the benchmark can be compiled as follows:
make benchmark_sp
The OpenCL version of the benchmark (for AMD cards etc.) can be compiled with:
make benchmark_ocl
When the OpenCL version starts, it should display a list of available devices that can be tested.
Shor's algorithm itself for OpenCL (float version) is compiled with:
make shor_algorithm_ocl_float
You can also compile the double version:
make shor_algorithm_ocl_double
Execution times of Shor's algorithm vary. On a GPU they can be very low.
It's best to fetch everything from the repository (with the command above), because it contains the latest revision (currently 31). I'll send a packed archive in the evening.
To start, please check whether it compiles (the OpenCL version and, optionally, the sp one).
The stack size limit may be exceeded. To be safe, I suggest running this before starting:
ulimit -s unlimited
This sets the stack size to unlimited and does not harm the system. The change applies only to the current session: if you open another terminal, you have to run ulimit -s unlimited again. I'm not sure it's necessary, but I'm mentioning it just in case.
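ulimit -s reads and sets the shell's RLIMIT_STACK limit (shown in KiB), which programs started from that shell inherit. Below is a sketch of checking it from C++ through POSIX getrlimit, plus the usual alternative of keeping large arrays off the stack. That large local arrays are what overflows in OLib is an assumption; current_stack_limit and make_state_vector are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/resource.h>   // POSIX getrlimit, RLIMIT_STACK
#include <vector>

// Returns the soft stack-size limit in bytes (ulimit -s shows it in KiB),
// RLIM_INFINITY after `ulimit -s unlimited`, or 0 if the query fails.
rlim_t current_stack_limit() {
    struct rlimit limit;
    if (getrlimit(RLIMIT_STACK, &limit) != 0) return 0;
    return limit.rlim_cur;
}

// A local array such as `double amplitudes[1 << 24];` (128 MiB) lives on the
// stack and crashes under the 8 MiB default common on Linux distributions.
// Allocating it on the heap instead does not depend on `ulimit -s` at all.
std::vector<double> make_state_vector(std::size_t qubits) {
    return std::vector<double>(std::size_t{1} << qubits, 0.0);
}
```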
The tester's job should be limited to checking whether the program compiles, running a script and sending back the results. Nothing else will be required, so I'll try to keep the workload low.
- 12.4 according to CCC
- AMD-APP-SDK-v2.7
required installation of:
- the "svn" version control system -> subversion package
- the g++ compiler -> gcc-c++ package
benchmark_sp
compiled without problems
bash-4.2$ ./benchmark_sp
====================================================================
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: Sequential
Compile date: Aug 8 2012
Compile time: 14:36:57
Architecture: x86_64
====================================================================
[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 68 seconds
Rate: 1.470588
[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 100 seconds
Special Units Testing...
Execution time: 57 seconds
Performance: [32-bit float types]
0.262633 GFLOPS
1236842 special operations (e.g. sin, cos) per second
[1: FPU-64] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 116 seconds
Special Units Testing...
Execution time: 51 seconds
Performance: [64-bit float types]
0.021677 GFLOPS
132352 special operations (e.g. sin, cos) per second
[2: ALU-64] Final test
Calculations... Please wait...
Execution time: 30 seconds
Performance: [64-bit integer types]
6824.400009 MIPS
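For readers wondering how figures like "0.262633 GFLOPS" come about: a throughput benchmark counts floating-point operations and divides by elapsed time. Below is a generic single-core sketch, not OLib's implementation; estimate_gflops is a hypothetical name. Because each iteration depends on the previous one, it measures a latency-bound rate and understates the peak that an independent or vectorized workload could reach.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Rough FLOPS estimate: time a dependent chain of multiply-adds and divide
// the operation count by the elapsed wall-clock time.
double estimate_gflops(std::uint64_t iterations) {
    volatile float sink = 0.0f;          // keeps the result observable
    float x = 1.0f;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        x = x * 0.999999f + 0.000001f;   // 1 multiply + 1 add = 2 operations
    }
    sink = x;
    const auto stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();
    const double flops = 2.0 * static_cast<double>(iterations);
    return seconds > 0.0 ? flops / seconds / 1e9 : 0.0;
}
```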
benchmark_ocl
compilation problems: CL/cl.h not found
there should probably be a global mapping of CL/ to /opt/AMDAPP/include
but after a small change to the "makefile" (the most common fix), adding -I/opt/AMDAPP/include to the compile commands, compilation went through, though with a few warnings
bash-4.2$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../primitives/olib_primitives_ocl.cpp
../primitives/olib_primitives_ocl.cpp: In function 'int support_fp64(int)':
../primitives/olib_primitives_ocl.cpp:79:84: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../primitives/olib_primitives_ocl.cpp: In function 'int search_devices(_cl_device_id**, cl_uint*)':
../primitives/olib_primitives_ocl.cpp:102:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
../primitives/olib_primitives_ocl.cpp:131:19: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
../primitives/olib_primitives_ocl.cpp: In function 'char* olib_read_opencl_kernel(char*)':
../primitives/olib_primitives_ocl.cpp:235:21: warning: variable 'readlen' set but not used [-Wunused-but-set-variable]
../primitives/olib_primitives_ocl.cpp:136: note: not vectorized: control flow in loop.
../primitives/olib_primitives_ocl.cpp:105: note: not vectorized: control flow in loop.
../primitives/olib_primitives_ocl.cpp:94: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../sp/olib_strings_sp.cpp
../sp/olib_strings_sp.cpp:71: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:71: note: not vectorized: Bad inner loop.
../sp/olib_strings_sp.cpp:74: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:48: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:48: note: not vectorized: Bad inner loop.
../sp/olib_strings_sp.cpp:52: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:35: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../opencl/olib_benchmark_ocl.cpp -lOpenCL
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:41:110: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:173:111: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:306:115: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:439:116: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_mips_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:575:106: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
g++ -Wall benchmark_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o benchmark_ocl -lOpenCL
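The warnings above don't break the build, but they are easy to silence. A minimal sketch, assuming a kernel-loading helper similar in spirit to olib_read_opencl_kernel(char*); read_kernel_source and count_nonempty are hypothetical names, not OLib code. Taking const char* lets string literals be passed (-Wwrite-strings), returning std::string removes the manual buffer and the unused readlen (-Wunused-but-set-variable), and an unsigned index matches an unsigned bound (-Wsign-compare).

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Reads a whole .cl file into a string; an empty string signals failure.
std::string read_kernel_source(const char* path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file) return std::string();
    std::ostringstream contents;
    contents << file.rdbuf();
    return contents.str();
}

// The index has the same unsigned type as the bound (cl_uint in OpenCL
// host code, std::size_t here), so the comparison raises no warning.
std::size_t count_nonempty(const std::vector<std::string>& names) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (!names[i].empty()) ++count;
    }
    return count;
}
```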
the test went somewhat worse (description):
FPU32 bit test:
- after a few seconds the system froze: no response to keyboard or mouse, no screen refresh
- after a few dozen seconds the test ended, the system woke up and showed the results
- everything OK for a few seconds, until the next test started
FPU64 bit test:
- immediate system freeze and .... :(
- judging by power draw, the test was running
- after about 7 minutes power draw dropped to the level of a GPU at high clocks but without load
- system unresponsive, disk working, background processes running, X server does not come back
- computer restart
maybe the cause is my changes made without digging into the code, or my lack of patience to let the test finish; it ran out after ~20 min ;)
test performed with revision 31
I can see a new revision 32 now, but I'll hold off on another test until I get your opinion/suggestions
EDIT: I did not do the later steps (compiling Shor's algorithm) or change the stack size; maybe that was the problem
I did the steps in the order suggested above :P
Revision 32 doesn't change anything regarding the problems you mentioned.
I wonder how the benchmark detected the OpenCL device. Before the test started, it displayed device information. Can you send it without running the test itself (just start the benchmark, copy the information and don't confirm, so it doesn't go on computing)?
yes, it's shown when running ./benchmark_ocl
but I couldn't quote it, since I was no longer able to capture the console
bash-4.2$ ./benchmark_ocl
====================================================================
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: OpenCL
Compile date: Aug 8 2012
Compile time: 17:54:02
Architecture: x86_64
====================================================================
--- 1 OpenCL platform(s) found: -----------------------------------------
-- 0 --
PROFILE = FULL_PROFILE
VERSION = OpenCL 1.2 AMD-APP (938.1)
NAME = AMD Accelerated Parallel Processing
VENDOR = Advanced Micro Devices, Inc.
EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
--- 1 OpenCL device(s) found on platform: -----------------------------------------
------------------- 0 -------------------
DEVICE_NAME = Capeverde
DEVICE_VENDOR = Advanced Micro Devices, Inc.
DEVICE_VERSION = OpenCL 1.2 AMD-APP (938.1)
DRIVER_VERSION = CAL 1.4.1741 (VM)
CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
DEVICE_MAX_COMPUTE_UNITS = 8
DEVICE_MAX_CLOCK_FREQUENCY = 1000
DEVICE_GLOBAL_MEM_SIZE = 843055104
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 16384
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 64
CL_DEVICE_LOCAL_MEM_SIZE = 32768
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 524288000
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 524288000
Enter the device number:
>>
EDIT: I chose "0"
The problem may be the graphical interface.
Can you turn off the GUI and run the program with the system in text mode only?
I'll try to fix the benchmark by this evening and will post on the forum when there's a new revision.
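If the freezes come from the GPU being busy with one very long kernel while it also drives the display (an assumption here), a common mitigation is to split the work into several shorter launches so the desktop can be redrawn in between. The sketch only computes the ranges; split_work and WorkRange are hypothetical names. The launch itself is left out: clEnqueueNDRangeKernel accepts a global work offset from OpenCL 1.1, while OpenCL 1.0 requires that offset to be NULL, so there it would have to be passed as a kernel argument.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct WorkRange {
    std::uint64_t offset;   // first work item of this launch
    std::uint64_t count;    // number of work items in this launch
};

// Splits `total` work items into launches of at most `chunk` items each;
// each range would become one separate, shorter kernel launch.
std::vector<WorkRange> split_work(std::uint64_t total, std::uint64_t chunk) {
    assert(chunk > 0);
    std::vector<WorkRange> ranges;
    for (std::uint64_t offset = 0; offset < total; offset += chunk) {
        const std::uint64_t remaining = total - offset;
        ranges.push_back({offset, remaining < chunk ? remaining : chunk});
    }
    return ranges;
}
```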
PS. Indeed, I tested it myself on an nVidia Tesla C2050 and got:
[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 19 seconds
Rate: 5.263158
[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 84 seconds
Special Units Testing...
Execution time: 127 seconds
Performance: [32-bit float types]
573.638161 GFLOPS
1018482283 special operations (e.g. sin, cos) per second
[1: FPU-64] Final test
Calculations... Please wait...
FP Units Testing...
ERROR: CL_INVALID_BINARY Execution time: 438 seconds
Special Units Testing...
ERROR: CL_INVALID_BINARY Execution time: 25 seconds
Performance: [64-bit float types]
11.001088 GFLOPS
517380000 special operations (e.g. sin, cos) per second
[2: ALU-64] Final test
Calculations... Please wait...
Execution time: 96 seconds
Performance: [64-bit integer types]
260850.292889 MIPS
On a GeForce GTS250, which has no double precision, it works fine:
[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 105 seconds
Rate: 0.952381
[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 106 seconds
Special Units Testing...
Execution time: 51 seconds
Performance: [32-bit float types]
82.255991 GFLOPS
458926470 special operations (e.g. sin, cos) per second
[1: FPU-64] Final test
FP-64 NOT SUPPORTED
[2: ALU-64] Final test
Calculations... Please wait...
Execution time: 30 seconds
Performance: [64-bit integer types]
151041.883120 MIPS
Something is seriously broken in double. I'll look into it right away. Do you perhaps remember what FPU-32 performance the test showed you?
Meanwhile, you can also try compiling and running shor_algorithm_ocl_float (it uses 32-bit float, so there's a chance it will be OK).
It should run quickly and print a few numbers to the screen. If it starts and works, run it several times (execution times vary and it returns different results).
turning off the GUI leaves no access to the device :fright:
Quote:
--- 1 OpenCL platform(s) found: -----------------------------------------
-- 0 --
PROFILE = FULL_PROFILE
VERSION = OpenCL 1.2 AMD-APP (938.1)
NAME = AMD Accelerated Parallel Processing
VENDOR = Advanced Micro Devices, Inc.
EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
OpenCL Error 'clGetDeviceIDs (platforms[0],CL_DEVICE_TYPE_GPU,100,devices,devices_n)' returned-1!
Aborted
it finds the GPU only after running "startx", i.e. under the GUI on terminal 8; on the other terminals 1-6 I get the error shown above
and with X turned off, terminals 7 and 8 are not accessible
(? a quirk of the drivers or of Mandriva ?)
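The -1 returned by clGetDeviceIDs is the standard status code CL_DEVICE_NOT_FOUND: the runtime saw the AMD platform but no device of type CL_DEVICE_TYPE_GPU outside the X session. Printing names instead of numbers makes such reports easier to read. A small sketch (opencl_error_name is a hypothetical helper) covering the codes that appear in this thread, with values as defined in the standard CL/cl.h:

```cpp
#include <cassert>
#include <string>

// Names for OpenCL status codes seen in this thread; extend as needed.
std::string opencl_error_name(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -42: return "CL_INVALID_BINARY";
        case -48: return "CL_INVALID_KERNEL";
        default:  return "UNLISTED_OPENCL_ERROR(" + std::to_string(code) + ")";
    }
}
```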
Quote: Do you perhaps remember what FPU-32 performance the test showed you?
Quote:
[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 21 seconds
Rate: 4,761905
[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 103 seconds
Special Units Testing...
/tmp/OCL3yOly3.cl; line 42:error: more than one instance of overloaded
function "pow" matches the argument list:
function "pow(float, float) C++"
function "pow(double, double) C++"
argument types are: (float, double)
tmp -=pow(tmp, -.5); //4ops
Internal error: clc compiler invocation failed.
ERROR ......CL_INVALID_KERNEL ...... (illegible photo)
Performance: [32-bit float types]
423.267122 GFLOPS
117028500000 special operations (e.g. sin, cos) per second
the bold data were entered by hand, typos possible
note the errors in the 32-bit benchmark, which you don't get on NVIDIA
EDIT:
in the nVidia Tesla C2050 test you have
Quote: ERROR: CL_INVALID_BINARY Execution time: 438 seconds
The Radeon HD 7770 has fairly weak DP; maybe I didn't wait long enough,
judging by power draw I estimated the test time similarly, ~7 min (420 s)
but then another dozen or so minutes with reduced GPU power draw
Oh wow, you're right, there's a problem with the overloaded function. I'll fix it right away.
Did you try anything with Shor on float?
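The AMD build error quoted above is an overload-resolution ambiguity: tmp is a float but -.5 is a double literal, so pow(float, double) matches pow(float, float) and pow(double, double) equally well. Writing the literal as -0.5f resolves it. The same rule can be reproduced in plain C++ with two hypothetical overloads named my_pow:

```cpp
#include <cassert>
#include <cmath>

// Two overloads mirroring the kernel compiler's pow(float, float)
// and pow(double, double).
float  my_pow(float base, float exponent)   { return std::pow(base, exponent); }
double my_pow(double base, double exponent) { return std::pow(base, exponent); }

// One step of the loop from the error message, made unambiguous.
float special_step(float tmp) {
    // tmp -= my_pow(tmp, -.5);   // ambiguous: (float, double) fits both overloads
    tmp -= my_pow(tmp, -0.5f);    // OK: both arguments are float
    return tmp;
}
```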
Quote: RAD-Poland on 08 August 2012, 20:51
in the nVidia Tesla C2050 test you have
Quote: ERROR: CL_INVALID_BINARY Execution time: 438 seconds
I looked into why this happens on my machine. It turned out it's most likely caused by an old OpenCL version.
All I have is:
--- 1 OpenCL platform(s) found: -----------------------------------------
-- 0 --
PROFILE = FULL_PROFILE
VERSION = OpenCL 1.0 CUDA 3.2.1
NAME = NVIDIA CUDA
VENDOR = NVIDIA Corporation
EXTENSIONS = cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
Within 30 minutes I'll add a new revision to the repository with an additional pragma for AMD, and we'll see whether that changes anything.
- the above tests were with v32
- drivers
12.4 according to CCC, installed from amd-driver-installer-12-6-x86.x86_64.run
- I haven't had this many restarts (5) in the last 2 years (a good occasion to dust out the case) ;)
Quote: Did you try anything with Shor on float?
not yet,
shor_algorithm_ocl_float compiles -> the output is long, I won't paste those kilometres of it
after compiling
bash-4.2$ ./shor_algorithm_ocl_float
Segmentation fault
shor_algorithm_ocl_double
bash-4.2$ make shor_algorithm_ocl_double
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include shor_algorithm.cpp -o shor_algorithm_ocl_core_double.o -D USE_OPENCL -D DOUBLE_PRECISION
shor_algorithm.cpp: In function 'int main(int, char**)':
shor_algorithm.cpp:502:53: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:617:164: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1091:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:1206:173: warning: format '%lld' expects argument of type 'long long int', but argument 6 has type 'double' [-Wformat]
shor_algorithm.cpp:1206:173: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1210:162: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:698: note: not vectorized: control flow in loop.
shor_algorithm.cpp:663: note: not vectorized: control flow in loop.
shor_algorithm.cpp:502: note: not vectorized: control flow in loop.
shor_algorithm.cpp:489: note: not vectorized: control flow in loop.
shor_algorithm.cpp:436: note: not vectorized: control flow in loop.
shor_algorithm.cpp:368: note: not vectorized: control flow in loop.
shor_algorithm.cpp:214: note: vectorized 0 loops in function.
shor_algorithm.cpp:1140:3: warning: 'factor' may be used uninitialized in this function [-Wuninitialized]
g++ -Wall shor_algorithm_ocl_core_double.o olib_primitives.o olib_linear_algebra_sp.o olib_quantum_sp.o olib_quantum_ocl.o olib_discrete_sp.o olib_primitives_ocl.o olib_strings_sp.o -o shor_algorithm_ocl_double -lOpenCL
bash-4.2$ ./shor_algorithm_ocl_double
Segmentation fault
:(
Quote: RAD-Poland on 08 August 2012, 21:11
- if the system holds up, I'll paste double in a moment....................
:deadman:
bash-4.2$ ./shor_algorithm_ocl_float
Segmentation fault
The fix for that:
ulimit -s unlimited
bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_float
0 1 6 1.98 1.19 0
:)
bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_double
"/tmp/OCLLqoikk.cl", line 28: warning: ignore unrecognized OpenCL extension
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
^
"/tmp/OCLLqoikk.cl", line 29: error: can't enable all OpenCL extensions or
unrecognized OpenCL extension
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
^
"/tmp/OCLLqoikk.cl", line 46: error: function "atom_cmpxchg" declared
implicitly
while ((readback = atom_cmpxchg((__global long*)address, oldval, newval)) != oldval)
^
2 errors detected in the compilation of "/tmp/OCLLqoikk.cl".
Internal error: clc compiler invocation failed.
ERROR: CL_BUILD_PROGRAM_FAILUREERROR: CL_INVALID_KERNEL0 1 1 1.57 0.86 0
:(
EDIT: I started to doubt whether the 7770 has DP, but I ran MilkyWay and it's crunching
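The build log shows why the double version fails on this card: the kernel enables cl_khr_int64_base_atomics and calls atom_cmpxchg on a long, but the CL_DEVICE_EXTENSIONS list printed for the Capeverde device does not include that extension. One option (an assumption, not what OLib does) is for the host to check the extension string before choosing a kernel variant. A minimal sketch; has_extension is a hypothetical helper:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole entry in a space-separated
// extension list such as the CL_DEVICE_EXTENSIONS string.
bool has_extension(const std::string& extensions, const std::string& name) {
    std::istringstream stream(extensions);
    std::string entry;
    while (stream >> entry) {
        if (entry == name) return true;   // whole-word match, not a substring
    }
    return false;
}
```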
Quote: RAD-Poland on 08 August 2012, 21:24
bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_float
0 1 6 1.98 1.19 0
:)
All good. That's how it should be. A single result doesn't tell me anything, though. Tomorrow I'll send a little bash script that runs it in a loop and collects the results.
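The plan is a bash script; in the C++ used by the project, the collection step could look like the sketch below. It parses lines such as "0 1 6 1.98 1.19 0" and averages each column over the runs, skipping lines that do not start with numbers (for example error output). The meaning of the individual columns isn't explained in the thread, so the code treats them generically; parse_run and column_means are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Parses one whitespace-separated output line into numbers.
std::vector<double> parse_run(const std::string& line) {
    std::istringstream stream(line);
    std::vector<double> values;
    double v;
    while (stream >> v) values.push_back(v);
    return values;
}

// Averages each column over all runs that have the same column count
// as the first valid run.
std::vector<double> column_means(const std::vector<std::string>& lines) {
    std::vector<double> sums;
    std::size_t runs = 0;
    for (const std::string& line : lines) {
        const std::vector<double> values = parse_run(line);
        if (values.empty()) continue;                 // blank or non-numeric line
        if (sums.empty()) sums.assign(values.size(), 0.0);
        if (values.size() != sums.size()) continue;   // malformed run
        for (std::size_t i = 0; i < values.size(); ++i) sums[i] += values[i];
        ++runs;
    }
    for (double& s : sums) s /= static_cast<double>(runs);
    return sums;
}
```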
Quote: RAD-Poland on 08 August 2012, 21:24
bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_double
"/tmp/OCLLqoikk.cl", line 28: warning: ignore unrecognized OpenCL extension
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
^
"/tmp/OCLLqoikk.cl", line 29: error: can't enable all OpenCL extensions or
unrecognized OpenCL extension
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
^
"/tmp/OCLLqoikk.cl", line 46: error: function "atom_cmpxchg" declared
implicitly
while ((readback = atom_cmpxchg((__global long*)address, oldval, newval)) != oldval)
^
2 errors detected in the compilation of "/tmp/OCLLqoikk.cl".
Internal error: clc compiler invocation failed.
ERROR: CL_BUILD_PROGRAM_FAILUREERROR: CL_INVALID_KERNEL0 1 1 1.57 0.86 0
:(
Well, yes... I should have seen that coming...
Revision 33 is in the repository.
First test Shor on double. Then run the benchmark. We'll see whether it goes through now.
- v33 test
- added -I/opt/AMDAPP/include
bash-4.2$ ulimit -s unlimited
bash-4.2$ ./shor_algorithm_ocl_float
0 1 4 1.92 1.15 0
:)
bash-4.2$ ./shor_algorithm_ocl_double
"/tmp/OCLHLI142.cl", line 29: warning: ignore unrecognized OpenCL extension
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
^
"/tmp/OCLHLI142.cl", line 36: error: can't enable all OpenCL extensions or
unrecognized OpenCL extension
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
^
"/tmp/OCLHLI142.cl", line 53: error: function "atom_cmpxchg" declared
implicitly
while ((readback = atom_cmpxchg((__global long*)address, oldval, newval)) != oldval)
^
2 errors detected in the compilation of "/tmp/OCLHLI142.cl".
Internal error: clc compiler invocation failed.
ERROR: CL_BUILD_PROGRAM_FAILUREERROR: CL_INVALID_KERNEL0 1 2 1.58 0.87 0
no luck :(
EDIT I didn't run the benchmark -> are you sure?
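The loop in the failing kernel, while ((readback = atom_cmpxchg(...)) != oldval), is the usual compare-and-swap retry pattern for an atomic operation the hardware lacks, presumably an atomic add on a double reinterpreted as a long. It needs 64-bit atomics, which this device does not advertise. For reference, the same pattern on the host with std::atomic (atomic_add is a hypothetical name):

```cpp
#include <atomic>
#include <cassert>

// Atomically adds `value` to `target` with a compare-and-swap retry loop:
// read the old value, compute the new one, retry if another thread
// changed `target` in between.
void atomic_add(std::atomic<double>& target, double value) {
    double expected = target.load();
    // On failure compare_exchange_weak reloads `expected` with the current value.
    while (!target.compare_exchange_weak(expected, expected + value)) {
    }
}
```

Several threads may call atomic_add on the same target concurrently; each retry picks up the value another thread has just written.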
What about the benchmark?
Quote: RAD-Poland on 08 August 2012, 21:42
EDIT I didn't run the benchmark -> are you sure?
No :deadman:
But there's only one way to find out. Just run it while you're not doing anything on the PC :attack:
- the PC is always doing something, php scripts etc.
- I'm turning off BOINC for CPU and GPU
- compilation
bash-4.2$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../opencl/olib_benchmark_ocl.cpp -lOpenCL
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:41:110: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:173:111: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:306:115: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:439:116: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_mips_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:575:106: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
g++ -Wall benchmark_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o benchmark_ocl -lOpenCL
- worst case, another restart :cold:
EDIT: no luck
32bit initial 21 s/4,761905
32bit final 103 s/151 s/423.267122/775023178
64bit final :cold: for 20 min already -> OT: I'm going to restart it, otherwise there will be no PM stats (Rosetta is already missing some, and soon SETI and SETI beta will be lagging)
EDIT: slowly getting there, the second FPU32 test shows an improvement
Quote:
[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 21 seconds
Rate: 4,761905
[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 103 seconds
Special Units Testing...
Execution time: 151 seconds
Performance: [32-bit float types]
423.267122 GFLOPS
775023178 special operations (e.g. sin, cos) per second
Tomorrow I'll write a small test script. How much GPU time can you spare? :attack:
I'll try to reserve some time, 1-2 h
let me know when you have a new version (I'll keep an eye on the RSS feed/forum)
a script would be useful, and if the application could write an output file, that would be very helpful in case of a freeze
capturing the console didn't work for me: after a freeze the file was empty; an output file would make passing on the information easier and faster than retyping it
OC benchmarks often create such files, so you know where and why the computer crashed
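An empty file after a freeze is consistent with ordinary stdio buffering: when stdout is redirected to a file, the C library collects output in a block buffer and writes it out only when the buffer fills or the program exits, so a hard hang loses it. A sketch of a crash-friendly log (log_line is a hypothetical helper, not part of OLib) flushes and fsyncs after every line; this is slower, but whatever was logged before a freeze stays on disk.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <unistd.h>   // POSIX fsync

// Appends one line to a log file and forces it to disk before returning.
bool log_line(const char* path, const std::string& line) {
    std::FILE* file = std::fopen(path, "a");
    if (file == nullptr) return false;
    std::fputs(line.c_str(), file);
    std::fputc('\n', file);
    std::fflush(file);                                // C library buffer -> kernel
    const bool synced = fsync(fileno(file)) == 0;     // kernel cache -> disk
    std::fclose(file);
    return synced;
}
```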
Yes. It will create temporary files. It shouldn't freeze, though: we'll stick to the float type, and everything works nicely with it :attack:
fox@APU:~/test/trunk/samples$ make benchmark_sp
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_sp.o -D USE_SEQUENTIAL
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_benchmark_sp.cpp -o olib_benchmark_sp.o
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_fp_units_sp(float*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:46:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp:49:9: warning: variable 'b' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_fp_units_sp(double*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:530:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp:533:9: warning: variable 'b' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp:38: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:52: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:33: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp:522: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:536: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:517: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp:1007: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1014: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1001: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_special_units_sp(float*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:1017:18: warning: 'tmp' may be used uninitialized in this function [-Wuninitialized]
../sp/olib_benchmark_sp.cpp:1037: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1044: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1031: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_special_units_sp(double*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:1047:19: warning: 'tmp' may be used uninitialized in this function [-Wuninitialized]
../sp/olib_benchmark_sp.cpp:1072: note: not vectorized: multiple nested loops.
../sp/olib_benchmark_sp.cpp:1075: note: not vectorized: number of iterations cannot be computed.
../sp/olib_benchmark_sp.cpp:1075: note: not vectorized: Bad inner loop.
../sp/olib_benchmark_sp.cpp:1079: note: not vectorized: number of iterations cannot be computed.
../sp/olib_benchmark_sp.cpp:1061: note: vectorized 0 loops in function.
g++ -Wall benchmark_core_sp.o olib_benchmark_sp.o -o benchmark_sp
fox@APU:~/test/trunk/samples$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
benchmark.cpp:69:21: fatal error: CL/cl.h: No such file or directory
compilation terminated.
make: *** [benchmark_core_ocl.o] Error 1
fox@APU:~/test/trunk/samples$ ./benchmark_sp
====================================================================
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: Sequential
Compile date: Aug 9 2012
Compile time: 09:52:49
Architecture: x86_64
====================================================================
[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 46 seconds
Rate: 2.173913
[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 101 seconds
Special Units Testing...
Execution time: 49 seconds
Performance: [32-bit float types]
0.384516 GFLOPS
2127551 special operations (e.g. sin, cos) per second
[1: FPU-64] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 118 seconds
Special Units Testing...
Execution time: 41 seconds
Performance: [64-bit float types]
0.030781 GFLOPS
237804 special operations (e.g. sin, cos) per second
[2: ALU-64] Final test
Calculations... Please wait...
Execution time: 31 seconds
Performance: [64-bit integer types]
9765.870981 MIPS
Maybe start by compiling shor_algorithm_ocl_float
On line 198 of the makefile, add -I/opt/AMDAPP/include after a space
and then as usual
make shor_algorithm_ocl_float
You can also fetch revision 34, in which the AMD compilation problem is hopefully fixed.
v34 :no:
bash-4.2$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
benchmark.cpp:69:21: fatal error: CL/cl.h: No such file or directory
compilation terminated.
for:
make shor_algorithm_ocl_double
make shor_algorithm_ocl_float
...
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../opencl/olib_quantum_ocl.cpp -lOpenCL
../opencl/olib_quantum_ocl.cpp:31:20: fatal error: CL/cl.h: No such file or directory
compilation terminated.
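Both failures are the compiler not finding CL/cl.h because the -I/opt/AMDAPP/include flag is missing from those rules. The usual fix is in the makefile; as an illustration of locating the header automatically (for example in a configure step), here is a C++17 sketch. find_include_dir is a hypothetical helper, and any candidate directories other than /opt/AMDAPP/include, the AMD APP SDK path used in this thread, are assumptions.

```cpp
#include <cassert>
#include <filesystem>   // C++17
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Returns the first directory from `candidates` that contains `header`
// (e.g. "CL/cl.h"), or an empty string if none does. The result is what
// would go after -I on the g++ command line.
std::string find_include_dir(const std::vector<std::string>& candidates,
                             const std::string& header) {
    for (const std::string& dir : candidates) {
        std::error_code ec;   // non-throwing overload: unreadable dirs count as "no"
        if (fs::exists(fs::path(dir) / header, ec)) return dir;
    }
    return std::string();
}
```

For example, find_include_dir({"/usr/include", "/opt/AMDAPP/include"}, "CL/cl.h") returns the first directory that actually has the header.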
Eh. I see now that I broke the makefile while fixing it. Quite a mess. I'll fix it later (in a few hours, when I'm back); I can't do it right now.
The repository already contains a test script 'shor_algorithm_tester.sh', but a small bug crept into it, and the fix isn't in the repo yet.
I ran the tests myself on the Tesla C2050 to get a rough idea of how long they may take on other cards.
I'll write in more detail this evening about what comes out of it.
Does it compile now? (Revision 36)
v36
make shor_algorithm_ocl_double OK
make shor_algorithm_ocl_float OK
make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
benchmark.cpp:69:21: fatal error: CL/cl.h: No such file or directory
compilation terminated.
Rev. 37.
You can check:
benchmark_ocl
burner_ocl (may not run)
v37
make shor_algorithm_ocl_double OK
make shor_algorithm_ocl_float OK
make benchmark_ocl OK
make burner_ocl OK
bash-4.2$ make burner_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include burner.cpp -o burner_core_ocl.o -D USE_OPENCL
burner.cpp:226: note: not vectorized: multiple exits.
burner.cpp:118: note: vectorized 0 loops in function.
g++ -Wall burner_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o burner_ocl -lOpenCL
EDIT: ??
Quote: burner_ocl (may not run)
? - it may fail to start
? - maybe better not to run it, because something ...
bash-4.2$ ./burner_ocl
====================================================================
OLib Burner.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: OpenCL
Compile date: Aug 9 2012
Compile time: 18:50:51
Architecture: x86_64
====================================================================
--- 1 OpenCL platform(s) found: -----------------------------------------
-- 0 --
PROFILE = FULL_PROFILE
VERSION = OpenCL 1.2 AMD-APP (938.1)
NAME = AMD Accelerated Parallel Processing
VENDOR = Advanced Micro Devices, Inc.
EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
--- 1 OpenCL device(s) found on platform: -----------------------------------------
------------------- 0 -------------------
DEVICE_NAME = Capeverde
DEVICE_VENDOR = Advanced Micro Devices, Inc.
DEVICE_VERSION = OpenCL 1.2 AMD-APP (938.1)
DRIVER_VERSION = CAL 1.4.1741 (VM)
CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
DEVICE_MAX_COMPUTE_UNITS = 8
DEVICE_MAX_CLOCK_FREQUENCY = 1000
DEVICE_GLOBAL_MEM_SIZE = 872415232
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 16384
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 64
CL_DEVICE_LOCAL_MEM_SIZE = 32768
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 524288000
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 524288000
Enter the device number:
>>
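Two entries in the listing above matter for the Shor simulation: `cl_khr_fp64` in `CL_DEVICE_EXTENSIONS` (double precision, needed by `shor_algorithm_ocl_double`) and `CL_DEVICE_MAX_MEM_ALLOC_SIZE`, which caps the size of a single buffer. The sketch below assumes the simulator keeps the full state vector of n qubits as 2^n complex amplitudes (8 bytes each in float, 16 in double) in one buffer; that layout is an assumption, not something this thread confirms. `has_extension` and `max_qubits` are hypothetical helpers.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// True if `ext` is one of the space-separated tokens in an OpenCL
// extensions string. Comparing whole tokens avoids the false positives a
// substring search would give (e.g. "cl_khr" matching "cl_khr_fp64").
bool has_extension(const std::string& extensions, const std::string& ext) {
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == ext) return true;
    }
    return false;
}

// Largest n such that a dense state vector of 2^n amplitudes, each
// `bytes_per_amplitude` bytes, fits in one allocation of `max_alloc_bytes`.
// Returns -1 if not even one amplitude fits (or the amplitude size is 0).
int max_qubits(std::uint64_t max_alloc_bytes, std::uint64_t bytes_per_amplitude) {
    if (bytes_per_amplitude == 0 || max_alloc_bytes < bytes_per_amplitude) return -1;
    int n = 0;
    // Shift the limit down instead of the size up, so nothing overflows:
    // (max >> (n + 1)) >= bpa  <=>  max >= bpa * 2^(n + 1) for integers.
    while (n < 63 && (max_alloc_bytes >> (n + 1)) >= bytes_per_amplitude) ++n;
    return n;
}
```

For the Capeverde card above (`CL_DEVICE_MAX_MEM_ALLOC_SIZE = 524288000`), `max_qubits(524288000, 8)` gives 25 and `max_qubits(524288000, 16)` gives 24. The practical limit may be lower if the simulator needs more than one buffer of that size.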
EDIT2:
Should I run ./burner_ocl?
Quote from: RAD-Poland on 09 August 2012, 18:49
? - maybe better not to run it, because something ...
It may not hand control back to the graphics subsystem :attack: There's no point risking stability if you don't have to.
rev 39
fox@APU:~/test/trunk/samples$ make benchmark_sp
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona benchmark.cpp -o benchmark_core_sp.o -D USE_SEQUENTIAL
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_benchmark_sp.cpp -o olib_benchmark_sp.o
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_fp_units_sp(float*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:46:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp:49:9: warning: variable 'b' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_fp_units_sp(double*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:530:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp:533:9: warning: variable 'b' set but not used [-Wunused-but-set-variable]
../sp/olib_benchmark_sp.cpp:38: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:52: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:33: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp:522: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:536: note: not vectorized: unsupported use in stmt.
../sp/olib_benchmark_sp.cpp:517: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp:1007: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1014: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1001: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_special_units_sp(float*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:1017:18: warning: 'tmp' may be used uninitialized in this function [-Wuninitialized]
../sp/olib_benchmark_sp.cpp:1037: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1044: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_benchmark_sp.cpp:1031: note: vectorized 0 loops in function.
../sp/olib_benchmark_sp.cpp: In function 'void benchmark_special_units_sp(double*, long long unsigned int)':
../sp/olib_benchmark_sp.cpp:1047:19: warning: 'tmp' may be used uninitialized in this function [-Wuninitialized]
../sp/olib_benchmark_sp.cpp:1072: note: not vectorized: multiple nested loops.
../sp/olib_benchmark_sp.cpp:1075: note: not vectorized: number of iterations cannot be computed.
../sp/olib_benchmark_sp.cpp:1075: note: not vectorized: Bad inner loop.
../sp/olib_benchmark_sp.cpp:1079: note: not vectorized: number of iterations cannot be computed.
../sp/olib_benchmark_sp.cpp:1061: note: vectorized 0 loops in function.
g++ -Wall benchmark_core_sp.o olib_benchmark_sp.o -o benchmark_sp
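The `set but not used` warnings for `a` and `b` in `benchmark_fp_units_sp` deserve a look in a benchmark: if those variables only hold the results of the timed arithmetic, GCC at -O2 may delete the computation, and the measured time no longer reflects the FP units. Whether that happens here depends on code not shown in the log. A common pattern, sketched with the hypothetical name `fp_units_kernel` and the same parameter types as in the warning, is to store the results through the output pointer so they stay observable.

```cpp
// Hypothetical FP benchmark kernel: the final values are written through
// `out`, so the optimizer has to keep the loop instead of discarding it.
void fp_units_kernel(float* out, unsigned long long iterations) {
    float a = 1.0f;
    float b = 0.5f;
    for (unsigned long long i = 0; i < iterations; ++i) {
        a = a * 0.999f + b;       // dependent chain: each step needs the previous a
        b = b * 0.999f + 0.001f;  // second independent chain
    }
    *out = a + b;                 // makes a and b observable to the caller
}
```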
fox@APU:~/test/trunk/samples$ make benchmark_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include benchmark.cpp -o benchmark_core_ocl.o -D USE_OPENCL
benchmark.cpp: In function 'int main(int, char**)':
benchmark.cpp:215:26: warning: ignoring return value of 'int scanf(const char*, ...)', declared with attribute warn_unused_result [-Wunused-result]
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../primitives/olib_primitives_ocl.cpp
../primitives/olib_primitives_ocl.cpp: In function 'int support_fp64(int)':
../primitives/olib_primitives_ocl.cpp:79:84: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../primitives/olib_primitives_ocl.cpp: In function 'int search_devices(_cl_device_id**, cl_uint*)':
../primitives/olib_primitives_ocl.cpp:102:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
../primitives/olib_primitives_ocl.cpp:131:19: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
../primitives/olib_primitives_ocl.cpp: In function 'char* olib_read_opencl_kernel(char*)':
../primitives/olib_primitives_ocl.cpp:238:21: warning: variable 'readlen' set but not used [-Wunused-but-set-variable]
/usr/include/x86_64-linux-gnu/bits/stdio2.h:105: note: not vectorized: control flow in loop.
/usr/include/x86_64-linux-gnu/bits/stdio2.h:105: note: not vectorized: control flow in loop.
../primitives/olib_primitives_ocl.cpp:94: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_strings_sp.cpp
../sp/olib_strings_sp.cpp:71: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:71: note: not vectorized: Bad inner loop.
../sp/olib_strings_sp.cpp:74: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:48: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:48: note: not vectorized: Bad inner loop.
../sp/olib_strings_sp.cpp:52: note: not vectorized: control flow in loop.
../sp/olib_strings_sp.cpp:35: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../opencl/olib_benchmark_ocl.cpp
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:41:110: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_fp_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:173:111: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:306:115: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_special_units_ocl(double*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:439:116: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_benchmark_ocl.cpp: In function 'int benchmark_mips_units_ocl(float*, long long unsigned int, int, long long unsigned int)':
../opencl/olib_benchmark_ocl.cpp:575:106: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
g++ -Wall benchmark_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o benchmark_ocl -lOpenCL
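Most of the warnings in this build are `deprecated conversion from string constant to 'char*'`, i.e. string literals (probably kernel file names) passed where a non-const `char*` is expected, plus `readlen` being set but never checked in `olib_read_opencl_kernel`. A minimal sketch of a kernel-source reader that takes `const char*` and verifies the number of bytes read; `read_kernel_source` is a hypothetical replacement, not olib's actual function.

```cpp
#include <cstdio>
#include <string>

// Reads a whole OpenCL kernel source file into `out`. Taking `const char*`
// lets callers pass string literals without -Wwrite-strings warnings.
// Returns false (leaving `out` empty) if the file can't be read completely.
bool read_kernel_source(const char* path, std::string& out) {
    out.clear();
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    if (std::fseek(f, 0, SEEK_END) != 0) { std::fclose(f); return false; }
    long size = std::ftell(f);
    if (size < 0 || std::fseek(f, 0, SEEK_SET) != 0) { std::fclose(f); return false; }
    std::string buf(static_cast<std::size_t>(size), '\0');
    // Check how many bytes were actually read instead of ignoring it.
    std::size_t readlen = buf.empty() ? 0 : std::fread(&buf[0], 1, buf.size(), f);
    std::fclose(f);
    if (readlen != buf.size()) return false;
    out.swap(buf);
    return true;
}
```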
fox@APU:~/test/trunk/samples$ make burner_sp
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona burner.cpp -o burner_core_sp.o -D USE_SEQUENTIAL
/usr/include/x86_64-linux-gnu/bits/stdio2.h:105: note: not vectorized: multiple exits.
burner.cpp:118: note: vectorized 0 loops in function.
g++ -Wall burner_core_sp.o olib_benchmark_sp.o -o burner_sp
fox@APU:~/test/trunk/samples$ make burner_ocl
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include burner.cpp -o burner_core_ocl.o -D USE_OPENCL
burner.cpp: In function 'int main(int, char**)':
burner.cpp:171:26: warning: ignoring return value of 'int scanf(const char*, ...)', declared with attribute warn_unused_result [-Wunused-result]
/usr/include/x86_64-linux-gnu/bits/stdio2.h:105: note: not vectorized: multiple exits.
burner.cpp:118: note: vectorized 0 loops in function.
g++ -Wall burner_core_ocl.o olib_primitives_ocl.o olib_strings_sp.o olib_benchmark_ocl.o -o burner_ocl -lOpenCL
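The `-Wunused-result` warnings for `scanf` (benchmark.cpp:215, burner.cpp:171) most likely come from reading the answer to the `Enter the device number:` prompt. If the user types something that isn't a number, `scanf` returns 0 and the variable keeps whatever value it had, possibly an uninitialised one. A minimal sketch of a stricter parser, assuming the line is read first (e.g. with `std::getline`); `parse_device_index` is a hypothetical helper, and `std::uint32_t` stands in for `cl_uint` so the sketch builds without the AMD APP SDK. Keeping both the index and the count unsigned also avoids the `-Wsign-compare` warnings seen in `search_devices`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Parses a device index typed at the prompt. Accepts only a whole
// non-negative decimal number (surrounding whitespace allowed) that is
// below `device_count`; on success stores it in `index` and returns true.
bool parse_device_index(const std::string& line, std::uint32_t device_count,
                        std::uint32_t& index) {
    std::size_t pos = 0;
    while (pos < line.size() && (line[pos] == ' ' || line[pos] == '\t')) ++pos;
    std::uint64_t value = 0;
    std::size_t digits = 0;
    for (; pos < line.size() && line[pos] >= '0' && line[pos] <= '9'; ++pos, ++digits) {
        value = value * 10 + static_cast<std::uint64_t>(line[pos] - '0');
        if (value >= device_count) return false;  // out of range; also prevents overflow
    }
    while (pos < line.size() &&
           (line[pos] == ' ' || line[pos] == '\t' || line[pos] == '\n' || line[pos] == '\r')) ++pos;
    if (digits == 0 || pos != line.size()) return false;  // no number, or trailing junk
    index = static_cast<std::uint32_t>(value);
    return true;
}
```

The caller can then repeat the prompt until the function returns true, instead of continuing with an arbitrary device number.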
fox@APU:~/test/trunk/samples$ make shor_algorithm_ocl_float
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona shor_algorithm.cpp -o shor_algorithm_ocl_core_float.o -D USE_OPENCL -D FLOAT_PRECISION
shor_algorithm.cpp: In function 'int main(int, char**)':
shor_algorithm.cpp:518:53: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:633:164: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1107:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:1222:173: warning: format '%lld' expects argument of type 'long long int', but argument 6 has type 'double' [-Wformat]
shor_algorithm.cpp:1222:173: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1226:162: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:714: note: Detected interleaving *collapse.9_140[i_2].imag and *collapse.9_140[i_2].real
shor_algorithm.cpp:714: note: Vectorizing an unaligned access.
shor_algorithm.cpp:714: note: Vectorizing an unaligned access.
shor_algorithm.cpp:714: note: vect_model_store_cost: unaligned supported by hardware.
shor_algorithm.cpp:714: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
shor_algorithm.cpp:714: note: vect_model_load_cost: unaligned supported by hardware.
shor_algorithm.cpp:714: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
shor_algorithm.cpp:714: note: not vectorized: relevant stmt not supported: cstore.35_400 = [cond_expr] value_242 == D.25145_245 ? 1.0e+0 : 0.0;
shor_algorithm.cpp:679: note: not vectorized: control flow in loop.
shor_algorithm.cpp:518: note: not vectorized: control flow in loop.
shor_algorithm.cpp:505: note: not vectorized: control flow in loop.
shor_algorithm.cpp:452: note: not vectorized: control flow in loop.
shor_algorithm.cpp:384: note: not vectorized: control flow in loop.
shor_algorithm.cpp:214: note: vectorized 0 loops in function.
shor_algorithm.cpp:1107:3: warning: 'factor' may be used uninitialized in this function [-Wuninitialized]
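The `-Wformat` warnings in `shor_algorithm.cpp` are worth fixing before comparing timings: passing a `double` where `%lld` expects a `long long` is undefined behaviour and typically prints garbage, and "too many arguments for format" means a value is silently not printed. A minimal sketch of type-matched conversions; `format_result` and its fields are hypothetical, since the actual calls at lines 633, 1222 and 1226 aren't shown in the log.

```cpp
#include <cstdio>
#include <string>

// Formats one result line; each conversion matches its argument's type:
// %lld <-> long long, %f <-> double (float arguments are promoted to double).
std::string format_result(long long n, long long factor, double seconds) {
    char buf[128];
    int len = std::snprintf(buf, sizeof buf,
                            "N = %lld, factor = %lld, time = %.3f s", n, factor, seconds);
    if (len < 0) return std::string();  // encoding error
    return std::string(buf);            // truncated safely if longer than buf
}
```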
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../primitives/olib_primitives.cpp
../primitives/olib_primitives.cpp:266: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:261: note: vectorized 0 loops in function.
../primitives/olib_primitives.cpp:282: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:285: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:276: note: vectorized 0 loops in function.
../primitives/olib_primitives.cpp:305: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:308: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../primitives/olib_primitives.cpp:299: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_linear_algebra_sp.cpp
../sp/olib_linear_algebra_sp.cpp:63: note: dependence distance = 0.
../sp/olib_linear_algebra_sp.cpp:63: note: dependence distance == 0 between D.25881_10->imag and D.25881_10->imag
../sp/olib_linear_algebra_sp.cpp:63: note: dependence distance = 0.
../sp/olib_linear_algebra_sp.cpp:63: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:63: note: dependence distance modulo vf == 0 between D.25881_10->imag and D.25881_10->imag
../sp/olib_linear_algebra_sp.cpp:63: note: Detected single element interleaving D.25881_10->imag step 8
../sp/olib_linear_algebra_sp.cpp:63: note: not vectorized: complicated access pattern.
../sp/olib_linear_algebra_sp.cpp:56: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:77: note: dependence distance = 0.
../sp/olib_linear_algebra_sp.cpp:77: note: dependence distance == 0 between D.25872_10->imag and D.25872_10->imag
../sp/olib_linear_algebra_sp.cpp:77: note: dependence distance = 0.
../sp/olib_linear_algebra_sp.cpp:77: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:77: note: dependence distance modulo vf == 0 between D.25872_10->imag and D.25872_10->imag
../sp/olib_linear_algebra_sp.cpp:77: note: Detected single element interleaving D.25872_10->imag step 16
../sp/olib_linear_algebra_sp.cpp:77: note: not vectorized: complicated access pattern.
../sp/olib_linear_algebra_sp.cpp:70: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:91: note: not vectorized: no vectype for stmt: D.25864_11 = D.25863_10->imag;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:84: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:129: note: not vectorized: inner-loop count not invariant.
../sp/olib_linear_algebra_sp.cpp:131: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:119: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:152: note: not vectorized: inner-loop count not invariant.
../sp/olib_linear_algebra_sp.cpp:154: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:142: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:175: note: not vectorized: inner-loop count not invariant.
../sp/olib_linear_algebra_sp.cpp:177: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:165: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:225: note: not vectorized: data ref analysis failed D.25790_18->real = 1.0e+0;
../sp/olib_linear_algebra_sp.cpp:216: note: Detected interleaving D.25782_9->real and D.25782_9->imag
../sp/olib_linear_algebra_sp.cpp:216: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:216: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:216: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_linear_algebra_sp.cpp:216: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:216: note: Cost model analysis:
Vector inside of loop cost: 2
Vector outside of loop cost: 9
Scalar iteration cost: 2
Scalar outside cost: 6
prologue iterations: 0
epilogue iterations: 1
Calculated minimum iters for profitability: 3
../sp/olib_linear_algebra_sp.cpp:216: note: Profitability threshold = 2
../sp/olib_linear_algebra_sp.cpp:216: note: Profitability threshold is 2 loop iterations.
../sp/olib_linear_algebra_sp.cpp:216: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:216: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:209: note: vectorized 1 loops in function.
../sp/olib_linear_algebra_sp.cpp:248: note: not vectorized: data ref analysis failed D.25775_18->real = 1.0e+0;
../sp/olib_linear_algebra_sp.cpp:239: note: Detected interleaving D.25767_9->real and D.25767_9->imag
../sp/olib_linear_algebra_sp.cpp:239: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:239: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:239: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_linear_algebra_sp.cpp:239: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:239: note: cost model: the vector iteration cost = 2 divided by the scalar iteration cost = 2 is greater or equal to the vectorization factor = 1.
../sp/olib_linear_algebra_sp.cpp:239: note: not vectorized: vectorization not profitable.
../sp/olib_linear_algebra_sp.cpp:232: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:271: note: not vectorized: data ref analysis failed D.25760_18->real = 1.0e+0;
../sp/olib_linear_algebra_sp.cpp:262: note: not vectorized: no vectype for stmt: D.25752_9->real = 0.0;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:255: note: vectorized 0 loops in function.
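The vectorizer notes for `olib_linear_algebra_sp.cpp` follow from the data layout: amplitudes are structs with `real` and `imag` members (the notes show `->real`/`->imag` accesses), so a loop touching only `imag` reads every second element ("single element interleaving ... step 8" for float, "step 16" for double), and the `long double` variant is not vectorized at all because SSE2 has no vector type for it ("no vectype for stmt ... scalar_type: long double"). One common alternative, sketched here as an assumption rather than olib's design, is a structure-of-arrays layout in which such loops become unit-stride. The loop body (conjugation) is a hypothetical example; the real loops at lines 63 and 77 aren't shown.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structures layout, as the vectorizer notes suggest olib uses.
struct ComplexAoS { float real, imag; };

// Negating only the imaginary parts touches every second float (stride 2).
void conjugate_aos(std::vector<ComplexAoS>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) v[i].imag = -v[i].imag;
}

// Structure-of-arrays alternative: imaginary parts are contiguous, so the
// same operation is a plain unit-stride loop that GCC can typically vectorize.
struct StateSoA {
    std::vector<float> real, imag;
};

void conjugate_soa(StateSoA& s) {
    float* im = s.imag.data();
    const std::size_t n = s.imag.size();
    for (std::size_t i = 0; i < n; ++i) im[i] = -im[i];
}
```

The trade-off is that operations needing both parts of one amplitude now read from two arrays, so the layout choice depends on which loops dominate.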
../sp/olib_linear_algebra_sp.cpp:311: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:304: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:326: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:319: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:341: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:334: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:380: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:373: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:395: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:388: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:410: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:403: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:460: note: not vectorized: multiple nested loops.
../sp/olib_linear_algebra_sp.cpp:462: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:469: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:442: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:517: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:508: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:534: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:525: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:551: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:542: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:597: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:581: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:622: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:606: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:646: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:630: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:687: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:680: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:704: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:697: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:719: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:712: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:770: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:772: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:760: note: Detected interleaving D.25501_9->real and D.25501_9->imag
../sp/olib_linear_algebra_sp.cpp:760: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:760: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:760: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_linear_algebra_sp.cpp:760: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:760: note: Cost model analysis:
Vector inside of loop cost: 2
Vector outside of loop cost: 9
Scalar iteration cost: 2
Scalar outside cost: 6
prologue iterations: 0
epilogue iterations: 1
Calculated minimum iters for profitability: 3
../sp/olib_linear_algebra_sp.cpp:760: note: Profitability threshold = 2
../sp/olib_linear_algebra_sp.cpp:760: note: Profitability threshold is 2 loop iterations.
../sp/olib_linear_algebra_sp.cpp:760: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:760: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:753: note: vectorized 1 loops in function.
../sp/olib_linear_algebra_sp.cpp:802: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:804: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:792: note: Detected interleaving D.25479_9->real and D.25479_9->imag
../sp/olib_linear_algebra_sp.cpp:792: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:792: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:792: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_linear_algebra_sp.cpp:792: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:792: note: cost model: the vector iteration cost = 2 divided by the scalar iteration cost = 2 is greater or equal to the vectorization factor = 1.
../sp/olib_linear_algebra_sp.cpp:792: note: not vectorized: vectorization not profitable.
../sp/olib_linear_algebra_sp.cpp:785: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:833: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:835: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:823: note: not vectorized: no vectype for stmt: D.25457_9->real = 0.0;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:816: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:880: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:882: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:873: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:900: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:902: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:893: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:920: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:922: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:913: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance = 0.
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance == 0 between D.25381_28->real and D.25381_28->real
../sp/olib_linear_algebra_sp.cpp:949: note: Detected interleaving D.25381_28->real and D.25381_28->imag
../sp/olib_linear_algebra_sp.cpp:949: note: Detected interleaving D.25381_28->real and D.25381_28->imag
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance = 0.
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance == 0 between D.25381_28->imag and D.25381_28->imag
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance = 0.
../sp/olib_linear_algebra_sp.cpp:949: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance modulo vf == 0 between D.25381_28->real and D.25381_28->real
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance = 0.
../sp/olib_linear_algebra_sp.cpp:949: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:949: note: dependence distance modulo vf == 0 between D.25381_28->imag and D.25381_28->imag
../sp/olib_linear_algebra_sp.cpp:949: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:949: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:949: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:949: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:949: note: Cost model analysis:
Vector inside of loop cost: 7
Vector outside of loop cost: 16
Scalar iteration cost: 10
Scalar outside cost: 6
prologue iterations: 0
epilogue iterations: 1
Calculated minimum iters for profitability: 2
../sp/olib_linear_algebra_sp.cpp:949: note: Profitability threshold = 1
../sp/olib_linear_algebra_sp.cpp:949: note: Profitability threshold is 1 loop iterations.
../sp/olib_linear_algebra_sp.cpp:949: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:949: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:942: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:935: note: vectorized 1 loops in function.
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance = 0.
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance == 0 between D.25363_26->real and D.25363_26->real
../sp/olib_linear_algebra_sp.cpp:971: note: Detected interleaving D.25363_26->real and D.25363_26->imag
../sp/olib_linear_algebra_sp.cpp:971: note: Detected interleaving D.25363_26->real and D.25363_26->imag
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance = 0.
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance == 0 between D.25363_26->imag and D.25363_26->imag
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance = 0.
../sp/olib_linear_algebra_sp.cpp:971: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance modulo vf == 0 between D.25363_26->real and D.25363_26->real
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance = 0.
../sp/olib_linear_algebra_sp.cpp:971: note: accesses have the same alignment.
../sp/olib_linear_algebra_sp.cpp:971: note: dependence distance modulo vf == 0 between D.25363_26->imag and D.25363_26->imag
../sp/olib_linear_algebra_sp.cpp:971: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:971: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:971: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:971: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:971: note: Cost model analysis:
Vector inside of loop cost: 5
Vector outside of loop cost: 6
Scalar iteration cost: 6
Scalar outside cost: 6
prologue iterations: 0
epilogue iterations: 0
Calculated minimum iters for profitability: 1
../sp/olib_linear_algebra_sp.cpp:971: note: Profitability threshold = 0
../sp/olib_linear_algebra_sp.cpp:971: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:971: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:964: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:957: note: vectorized 1 loops in function.
../sp/olib_linear_algebra_sp.cpp:993: note: not vectorized: no vectype for stmt: D.25341_32 = D.25340_31->real;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:986: note: not vectorized: loop contains function calls or data references that cannot be analyzed
../sp/olib_linear_algebra_sp.cpp:979: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:1008: note: versioning for alias required: can't determine dependence between D.25331_10->real and D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: mark for run-time aliasing test between D.25331_10->real and D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: Detected interleaving D.25331_10->real and D.25331_10->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: versioning for alias required: can't determine dependence between D.25331_10->real and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: mark for run-time aliasing test between D.25331_10->real and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: Detected interleaving D.25330_7->real and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: versioning for alias required: can't determine dependence between D.25331_10->imag and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: mark for run-time aliasing test between D.25331_10->imag and D.25330_7->imag
../sp/olib_linear_algebra_sp.cpp:1008: note: found equal ranges D.25331_10->real, D.25330_7->imag and D.25331_10->real, D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: found equal ranges D.25331_10->imag, D.25330_7->imag and D.25331_10->real, D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1008: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1008: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1008: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1008: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1008: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1008: note: cost model: Adding cost of checks for loop versioning aliasing.
../sp/olib_linear_algebra_sp.cpp:1008: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:1008: note: Cost model analysis:
Vector inside of loop cost: 4
Vector outside of loop cost: 14
Scalar iteration cost: 4
Scalar outside cost: 1
prologue iterations: 0
epilogue iterations: 1
Calculated minimum iters for profitability: 6
../sp/olib_linear_algebra_sp.cpp:1008: note: Profitability threshold = 5
../sp/olib_linear_algebra_sp.cpp:1008: note: Profitability threshold is 5 loop iterations.
../sp/olib_linear_algebra_sp.cpp:1008: note: create runtime check for data references D.25331_10->real and D.25330_7->real
../sp/olib_linear_algebra_sp.cpp:1008: note: created 1 versioning for alias checks.
../sp/olib_linear_algebra_sp.cpp:1008: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:1008: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:1003: note: vectorized 1 loops in function.
../sp/olib_linear_algebra_sp.cpp:1021: note: versioning for alias required: can't determine dependence between D.25321_10->real and D.25320_7->real
../sp/olib_linear_algebra_sp.cpp:1021: note: mark for run-time aliasing test between D.25321_10->real and D.25320_7->real
../sp/olib_linear_algebra_sp.cpp:1021: note: Detected interleaving D.25321_10->real and D.25321_10->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: versioning for alias required: can't determine dependence between D.25321_10->real and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: mark for run-time aliasing test between D.25321_10->real and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: Detected interleaving D.25320_7->real and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: versioning for alias required: can't determine dependence between D.25321_10->imag and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: mark for run-time aliasing test between D.25321_10->imag and D.25320_7->imag
../sp/olib_linear_algebra_sp.cpp:1021: note: found equal ranges D.25321_10->real, D.25320_7->imag and D.25321_10->real, D.25320_7->real
../sp/olib_linear_algebra_sp.cpp:1021: note: found equal ranges D.25321_10->imag, D.25320_7->imag and D.25321_10->real, D.25320_7->real
../sp/olib_linear_algebra_sp.cpp:1021: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1021: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1021: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1021: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1021: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1021: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1021: note: cost model: Adding cost of checks for loop versioning aliasing.
../sp/olib_linear_algebra_sp.cpp:1021: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:1021: note: cost model: the vector iteration cost = 4 divided by the scalar iteration cost = 4 is greater or equal to the vectorization factor = 1.
../sp/olib_linear_algebra_sp.cpp:1021: note: not vectorized: vectorization not profitable.
../sp/olib_linear_algebra_sp.cpp:1016: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:1034: note: not vectorized: no vectype for stmt: D.25312_11 = D.25311_10->real;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1029: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:1047: note: versioning for alias required: can't determine dependence between D.25299_10->real and D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: mark for run-time aliasing test between D.25299_10->real and D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: Detected interleaving D.25299_10->real and D.25299_10->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: versioning for alias required: can't determine dependence between D.25299_10->real and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: mark for run-time aliasing test between D.25299_10->real and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: Detected interleaving D.25297_7->real and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: versioning for alias required: can't determine dependence between D.25299_10->imag and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: mark for run-time aliasing test between D.25299_10->imag and D.25297_7->imag
../sp/olib_linear_algebra_sp.cpp:1047: note: found equal ranges D.25299_10->real, D.25297_7->imag and D.25299_10->real, D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: found equal ranges D.25299_10->imag, D.25297_7->imag and D.25299_10->real, D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1047: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1047: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1047: note: cost model: Adding cost of checks for loop versioning aliasing.
../sp/olib_linear_algebra_sp.cpp:1047: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:1047: note: Cost model analysis:
Vector inside of loop cost: 5
Vector outside of loop cost: 16
Scalar iteration cost: 6
Scalar outside cost: 1
prologue iterations: 0
epilogue iterations: 1
Calculated minimum iters for profitability: 4
../sp/olib_linear_algebra_sp.cpp:1047: note: Profitability threshold = 3
../sp/olib_linear_algebra_sp.cpp:1047: note: Profitability threshold is 3 loop iterations.
../sp/olib_linear_algebra_sp.cpp:1047: note: create runtime check for data references D.25299_10->real and D.25297_7->real
../sp/olib_linear_algebra_sp.cpp:1047: note: created 1 versioning for alias checks.
../sp/olib_linear_algebra_sp.cpp:1047: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:1047: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:1042: note: vectorized 1 loops in function.
../sp/olib_linear_algebra_sp.cpp:1060: note: not vectorized: no vectype for stmt: D.25287_11 = D.25286_10->real;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1055: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:1074: note: versioning for alias required: can't determine dependence between D.25273_10->real and D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: mark for run-time aliasing test between D.25273_10->real and D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: Detected interleaving D.25273_10->real and D.25273_10->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: versioning for alias required: can't determine dependence between D.25273_10->real and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: mark for run-time aliasing test between D.25273_10->real and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: Detected interleaving D.25271_7->real and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: versioning for alias required: can't determine dependence between D.25273_10->imag and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: mark for run-time aliasing test between D.25273_10->imag and D.25271_7->imag
../sp/olib_linear_algebra_sp.cpp:1074: note: found equal ranges D.25273_10->real, D.25271_7->imag and D.25273_10->real, D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: found equal ranges D.25273_10->imag, D.25271_7->imag and D.25273_10->real, D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1074: note: Vectorizing an unaligned access.
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_linear_algebra_sp.cpp:1074: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_linear_algebra_sp.cpp:1074: note: cost model: Adding cost of checks for loop versioning aliasing.
../sp/olib_linear_algebra_sp.cpp:1074: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_linear_algebra_sp.cpp:1074: note: Cost model analysis:
Vector inside of loop cost: 5
Vector outside of loop cost: 16
Scalar iteration cost: 6
Scalar outside cost: 1
prologue iterations: 0
epilogue iterations: 1
Calculated minimum iters for profitability: 4
../sp/olib_linear_algebra_sp.cpp:1074: note: Profitability threshold = 3
../sp/olib_linear_algebra_sp.cpp:1074: note: Profitability threshold is 3 loop iterations.
../sp/olib_linear_algebra_sp.cpp:1074: note: create runtime check for data references D.25273_10->real and D.25271_7->real
../sp/olib_linear_algebra_sp.cpp:1074: note: created 1 versioning for alias checks.
../sp/olib_linear_algebra_sp.cpp:1074: note: vectorizing stmts using SLP.
../sp/olib_linear_algebra_sp.cpp:1074: note: LOOP VECTORIZED.
../sp/olib_linear_algebra_sp.cpp:1069: note: vectorized 1 loops in function.
../sp/olib_linear_algebra_sp.cpp:1088: note: not vectorized: no vectype for stmt: D.25261_11 = D.25260_10->real;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1083: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:1101: note: not vectorized: no vectype for stmt: D.25245_7->real = D.25249_12;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1096: note: vectorized 0 loops in function.
../sp/olib_linear_algebra_sp.cpp:1114: note: not vectorized: no vectype for stmt: D.25232_7->real = D.25236_12;
scalar_type: long double
../sp/olib_linear_algebra_sp.cpp:1109: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_quantum_sp.cpp
../sp/olib_quantum_sp.cpp:231: note: not vectorized: number of iterations cannot be computed.
../sp/olib_quantum_sp.cpp:227: note: vectorized 0 loops in function.
../sp/olib_quantum_sp.cpp:246: note: not vectorized: unexpected loop form.
../sp/olib_quantum_sp.cpp:243: note: vectorized 0 loops in function.
../sp/olib_quantum_sp.cpp:270: note: not vectorized: number of iterations cannot be computed.
../sp/olib_quantum_sp.cpp:257: note: vectorized 0 loops in function.
../sp/olib_quantum_sp.cpp:289: note: not vectorized: unexpected loop form.
../sp/olib_quantum_sp.cpp:289: note: not vectorized: Bad inner loop.
../sp/olib_quantum_sp.cpp:297: note: not vectorized: unexpected loop form.
../sp/olib_quantum_sp.cpp:282: note: vectorized 0 loops in function.
../sp/olib_quantum_sp.cpp:325: note: versioning for alias required: can't determine dependence between D.24987_10->real and D.24986_7->real
../sp/olib_quantum_sp.cpp:325: note: mark for run-time aliasing test between D.24987_10->real and D.24986_7->real
../sp/olib_quantum_sp.cpp:325: note: Detected interleaving D.24987_10->real and D.24987_10->imag
../sp/olib_quantum_sp.cpp:325: note: versioning for alias required: can't determine dependence between D.24987_10->real and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: mark for run-time aliasing test between D.24987_10->real and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: Detected interleaving D.24986_7->real and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: versioning for alias required: can't determine dependence between D.24987_10->imag and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: mark for run-time aliasing test between D.24987_10->imag and D.24986_7->imag
../sp/olib_quantum_sp.cpp:325: note: found equal ranges D.24987_10->real, D.24986_7->imag and D.24987_10->real, D.24986_7->real
../sp/olib_quantum_sp.cpp:325: note: found equal ranges D.24987_10->imag, D.24986_7->imag and D.24987_10->real, D.24986_7->real
../sp/olib_quantum_sp.cpp:325: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:325: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:325: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:325: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_quantum_sp.cpp:325: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:325: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_quantum_sp.cpp:325: note: cost model: Adding cost of checks for loop versioning aliasing.
../sp/olib_quantum_sp.cpp:325: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_quantum_sp.cpp:325: note: cost model: the vector iteration cost = 4 divided by the scalar iteration cost = 4 is greater or equal to the vectorization factor = 1.
../sp/olib_quantum_sp.cpp:325: note: not vectorized: vectorization not profitable.
../sp/olib_quantum_sp.cpp:321: note: vectorized 0 loops in function.
../sp/olib_quantum_sp.cpp:188: note: not vectorized: control flow in loop.
../sp/olib_quantum_sp.cpp:188: note: not vectorized: Bad inner loop.
../sp/olib_quantum_sp.cpp:192: note: not vectorized: control flow in loop.
../sp/olib_quantum_sp.cpp:146: note: Detected interleaving *init.6_23[i_96].real and *init.6_23[i_96].imag
../sp/olib_quantum_sp.cpp:146: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:146: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:146: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_quantum_sp.cpp:146: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_quantum_sp.cpp:146: note: cost model: the vector iteration cost = 2 divided by the scalar iteration cost = 2 is greater or equal to the vectorization factor = 1.
../sp/olib_quantum_sp.cpp:146: note: not vectorized: vectorization not profitable.
../sp/olib_quantum_sp.cpp:133: note: vectorized 0 loops in function.
../sp/olib_quantum_sp.cpp:337: note: versioning for alias required: can't determine dependence between D.24977_10->real and D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: mark for run-time aliasing test between D.24977_10->real and D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: Detected interleaving D.24977_10->real and D.24977_10->imag
../sp/olib_quantum_sp.cpp:337: note: versioning for alias required: can't determine dependence between D.24977_10->real and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: mark for run-time aliasing test between D.24977_10->real and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: Detected interleaving D.24976_7->real and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: versioning for alias required: can't determine dependence between D.24977_10->imag and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: mark for run-time aliasing test between D.24977_10->imag and D.24976_7->imag
../sp/olib_quantum_sp.cpp:337: note: found equal ranges D.24977_10->real, D.24976_7->imag and D.24977_10->real, D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: found equal ranges D.24977_10->imag, D.24976_7->imag and D.24977_10->real, D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:337: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:337: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:337: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_quantum_sp.cpp:337: note: vect_model_load_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:337: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
../sp/olib_quantum_sp.cpp:337: note: cost model: Adding cost of checks for loop versioning aliasing.
../sp/olib_quantum_sp.cpp:337: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_quantum_sp.cpp:337: note: Cost model analysis:
Vector inside of loop cost: 4
Vector outside of loop cost: 14
Scalar iteration cost: 4
Scalar outside cost: 1
prologue iterations: 0
epilogue iterations: 1
Calculated minimum iters for profitability: 6
../sp/olib_quantum_sp.cpp:337: note: Profitability threshold = 5
../sp/olib_quantum_sp.cpp:337: note: Profitability threshold is 5 loop iterations.
../sp/olib_quantum_sp.cpp:337: note: create runtime check for data references D.24977_10->real and D.24976_7->real
../sp/olib_quantum_sp.cpp:337: note: created 1 versioning for alias checks.
../sp/olib_quantum_sp.cpp:337: note: vectorizing stmts using SLP.
../sp/olib_quantum_sp.cpp:337: note: LOOP VECTORIZED.
../sp/olib_quantum_sp.cpp:333: note: vectorized 1 loops in function.
../sp/olib_quantum_sp.cpp:95: note: not vectorized: control flow in loop.
../sp/olib_quantum_sp.cpp:95: note: not vectorized: Bad inner loop.
../sp/olib_quantum_sp.cpp:99: note: not vectorized: control flow in loop.
../sp/olib_quantum_sp.cpp:52: note: Detected interleaving *init.14_23[i_106].real and *init.14_23[i_106].imag
../sp/olib_quantum_sp.cpp:52: note: Vectorizing an unaligned access.
../sp/olib_quantum_sp.cpp:52: note: vect_model_store_cost: unaligned supported by hardware.
../sp/olib_quantum_sp.cpp:52: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../sp/olib_quantum_sp.cpp:52: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../sp/olib_quantum_sp.cpp:52: note: Cost model analysis:
Vector inside of loop cost: 2
Vector outside of loop cost: 9
Scalar iteration cost: 2
Scalar outside cost: 6
prologue iterations: 0
epilogue iterations: 1
Calculated minimum iters for profitability: 3
../sp/olib_quantum_sp.cpp:52: note: Profitability threshold = 2
../sp/olib_quantum_sp.cpp:52: note: Profitability threshold is 2 loop iterations.
../sp/olib_quantum_sp.cpp:52: note: vectorizing stmts using SLP.
../sp/olib_quantum_sp.cpp:52: note: LOOP VECTORIZED.
../sp/olib_quantum_sp.cpp:39: note: vectorized 1 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona -I/opt/AMDAPP/include ../opencl/olib_quantum_ocl.cpp
../opencl/olib_quantum_ocl.cpp: In function 'int dqft_ocl(complex_float*, long long unsigned int, int, int)':
../opencl/olib_quantum_ocl.cpp:62:97: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_quantum_ocl.cpp: In function 'int dqft_ocl(complex_double*, long long unsigned int, int, int)':
../opencl/olib_quantum_ocl.cpp:254:98: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
../opencl/olib_quantum_ocl.cpp:77: note: not vectorized: control flow in loop.
../opencl/olib_quantum_ocl.cpp:55: note: Detected interleaving *init.21_23[i_60].real and *init.21_23[i_60].imag
../opencl/olib_quantum_ocl.cpp:55: note: Vectorizing an unaligned access.
../opencl/olib_quantum_ocl.cpp:55: note: vect_model_store_cost: unaligned supported by hardware.
../opencl/olib_quantum_ocl.cpp:55: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../opencl/olib_quantum_ocl.cpp:55: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../opencl/olib_quantum_ocl.cpp:55: note: Cost model analysis:
Vector inside of loop cost: 2
Vector outside of loop cost: 9
Scalar iteration cost: 2
Scalar outside cost: 6
prologue iterations: 0
epilogue iterations: 1
Calculated minimum iters for profitability: 3
../opencl/olib_quantum_ocl.cpp:55: note: Profitability threshold = 2
../opencl/olib_quantum_ocl.cpp:55: note: Profitability threshold is 2 loop iterations.
../opencl/olib_quantum_ocl.cpp:55: note: vectorizing stmts using SLP.
../opencl/olib_quantum_ocl.cpp:55: note: LOOP VECTORIZED.
../opencl/olib_quantum_ocl.cpp:41: note: vectorized 1 loops in function.
../opencl/olib_quantum_ocl.cpp:269: note: not vectorized: control flow in loop.
../opencl/olib_quantum_ocl.cpp:244: note: Detected interleaving *init.2_23[i_48].real and *init.2_23[i_48].imag
../opencl/olib_quantum_ocl.cpp:244: note: Vectorizing an unaligned access.
../opencl/olib_quantum_ocl.cpp:244: note: vect_model_store_cost: unaligned supported by hardware.
../opencl/olib_quantum_ocl.cpp:244: note: vect_model_store_cost: inside_cost = 2, outside_cost = 1 .
../opencl/olib_quantum_ocl.cpp:244: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
../opencl/olib_quantum_ocl.cpp:244: note: cost model: the vector iteration cost = 2 divided by the scalar iteration cost = 2 is greater or equal to the vectorization factor = 1.
../opencl/olib_quantum_ocl.cpp:244: note: not vectorized: vectorization not profitable.
../opencl/olib_quantum_ocl.cpp:229: note: vectorized 0 loops in function.
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona ../sp/olib_discrete_sp.cpp
../sp/olib_discrete_sp.cpp:64: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:57: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:106: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:101: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:106: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:139: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:181: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:168: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:220: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:217: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:293: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:252: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:319: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:310: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:305: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:359: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:352: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:385: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:378: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:411: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:404: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:436: note: not vectorized: number of iterations cannot be computed.
../sp/olib_discrete_sp.cpp:429: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:483: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:474: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:508: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:499: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:533: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:524: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:558: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:549: note: vectorized 0 loops in function.
../sp/olib_discrete_sp.cpp:587: note: not vectorized: control flow in loop.
../sp/olib_discrete_sp.cpp:578: note: vectorized 0 loops in function.
g++ -Wall shor_algorithm_ocl_core_float.o olib_primitives.o olib_linear_algebra_sp.o olib_quantum_sp.o olib_quantum_ocl.o olib_discrete_sp.o olib_primitives_ocl.o olib_strings_sp.o -o shor_algorithm_ocl_float -lOpenCL
fox@APU:~/test/trunk/samples$ make shor_algorithm_ocl_double
g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona shor_algorithm.cpp -o shor_algorithm_ocl_core_double.o -D USE_OPENCL -D DOUBLE_PRECISION
shor_algorithm.cpp: In function 'int main(int, char**)':
shor_algorithm.cpp:518:53: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:633:164: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1107:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
shor_algorithm.cpp:1222:173: warning: format '%lld' expects argument of type 'long long int', but argument 6 has type 'double' [-Wformat]
shor_algorithm.cpp:1222:173: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:1226:162: warning: too many arguments for format [-Wformat-extra-args]
shor_algorithm.cpp:714: note: Detected interleaving *collapse.9_140[i_2].imag and *collapse.9_140[i_2].real
shor_algorithm.cpp:714: note: Vectorizing an unaligned access.
shor_algorithm.cpp:714: note: Vectorizing an unaligned access.
shor_algorithm.cpp:714: note: vect_model_store_cost: unaligned supported by hardware.
shor_algorithm.cpp:714: note: vect_model_store_cost: inside_cost = 2, outside_cost = 0 .
shor_algorithm.cpp:714: note: vect_model_load_cost: unaligned supported by hardware.
shor_algorithm.cpp:714: note: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
shor_algorithm.cpp:714: note: not vectorized: relevant stmt not supported: cstore.33_397 = [cond_expr] value_242 == D.25144_245 ? 1.0e+0 : 0.0;
shor_algorithm.cpp:679: note: not vectorized: control flow in loop.
shor_algorithm.cpp:518: note: not vectorized: control flow in loop.
shor_algorithm.cpp:505: note: not vectorized: control flow in loop.
shor_algorithm.cpp:452: note: not vectorized: control flow in loop.
shor_algorithm.cpp:384: note: not vectorized: control flow in loop.
shor_algorithm.cpp:214: note: vectorized 0 loops in function.
shor_algorithm.cpp:1107:3: warning: 'factor' may be used uninitialized in this function [-Wuninitialized]
g++ -Wall shor_algorithm_ocl_core_double.o olib_primitives.o olib_linear_algebra_sp.o olib_quantum_sp.o olib_quantum_ocl.o olib_discrete_sp.o olib_primitives_ocl.o olib_strings_sp.o -o shor_algorithm_ocl_double -lOpenCL
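Many of the vectorizer notes above read "versioning for alias required: can't determine dependence between ...->real and ...->imag". That means GCC could not prove that the source and destination arrays of a complex-number loop do not overlap. It therefore either added a run-time overlap check ("created 1 versioning for alias checks") or gave up on the loop as "not profitable". Below is a minimal sketch of the usual remedy, GCC's `__restrict__` extension. It assumes a complex type with `real`/`imag` fields like the one the notes refer to. The struct `cfloat` and both functions are hypothetical and are not OLib code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical complex type using the field names that appear in the GCC notes
// (real/imag); this is not OLib's actual complex_float definition.
struct cfloat {
    float real;
    float imag;
};

// Plain pointers: GCC cannot prove dst and src do not overlap, so it has to
// version the loop with a run-time aliasing check before vectorizing it.
void scale_copy_plain(cfloat* dst, const cfloat* src, float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i].real = k * src[i].real;
        dst[i].imag = k * src[i].imag;
    }
}

// __restrict__ (a GCC extension) is a promise from the caller that dst and src
// do not overlap, so the vectorizer can drop the run-time check.
// Passing overlapping arrays to this function is undefined behaviour.
void scale_copy_restrict(cfloat* __restrict__ dst, const cfloat* __restrict__ src,
                         float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i].real = k * src[i].real;
        dst[i].imag = k * src[i].imag;
    }
}
```

You can compile such a file with the flags from the Makefile (`g++ -Wall -c -O2 -msse2 -ftree-vectorize -ftree-vectorizer-verbose=6 -march=nocona`) and compare the notes for the two functions. The alias-versioning notes should be absent for the restrict version. Newer GCC releases report the same information through `-fopt-info-vec` instead of `-ftree-vectorizer-verbose`.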
fox@APU:~/test/trunk/samples$ ./benchmark_sp
====================================================================
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: Sequential
Compile date: Aug 9 2012
Compile time: 22:35:58
Architecture: x86_64
====================================================================
[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 47 seconds
Rate: 2.127660
[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 99 seconds
Special Units Testing...
Execution time: 49 seconds
Performance: [32-bit float types]
0.383818 GFLOPS
2081632 special operations (e.g. sin, cos) per second
[1: FPU-64] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 115 seconds
Special Units Testing...
Execution time: 42 seconds
Performance: [64-bit float types]
0.031584 GFLOPS
232142 special operations (e.g. sin, cos) per second
[2: ALU-64] Final test
Calculations... Please wait...
Execution time: 30 seconds
Performance: [64-bit integer types]
9873.600013 MIPS
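The "Rate" printed after the initial test matches 100 divided by the initial-test time in both runs posted in this thread: 100/47 ≈ 2.127660 on the CPU here, and 100/23 ≈ 4.347826 in the HD 7770 run further down. That relationship is inferred from those two data points only, not taken from the OLib source. The sketch below just reproduces the arithmetic with the same `%f` formatting. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical reconstruction: rate = 100 / initial-test time in seconds.
// Inferred from the two benchmark outputs in this thread, not from OLib's code.
double initial_rate(double seconds) {
    assert(seconds > 0.0 && "initial test time must be positive");
    return 100.0 / seconds;
}

// Format like printf("%f"): six digits after the decimal point.
std::string format_rate(double rate) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%f", rate);
    return std::string(buf);
}
```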
fox@APU:~/test/trunk/samples$ ./benchmark_ocl
====================================================================
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: OpenCL
Compile date: Aug 9 2012
Compile time: 22:36:28
Architecture: x86_64
====================================================================
--- 1 OpenCL platform(s) found: -----------------------------------------
-- 0 --
PROFILE = FULL_PROFILE
VERSION = OpenCL 1.2 AMD-APP (923.1)
NAME = AMD Accelerated Parallel Processing
VENDOR = Advanced Micro Devices, Inc.
EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
--- 1 OpenCL device(s) found on platform: -----------------------------------------
------------------- 0 -------------------
DEVICE_NAME = BeaverCreek
DEVICE_VENDOR = Advanced Micro Devices, Inc.
DEVICE_VERSION = OpenCL 1.2 AMD-APP (923.1)
DRIVER_VERSION = CAL 1.4.1720
CL_DEVICE_EXTENSIONS = cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
DEVICE_MAX_COMPUTE_UNITS = 5
DEVICE_MAX_CLOCK_FREQUENCY = 900
DEVICE_GLOBAL_MEM_SIZE = 268435456
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 0
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 0
CL_DEVICE_LOCAL_MEM_SIZE = 32768
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 134217728
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 134217728
Enter the device number:
>>
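The CL_DEVICE_EXTENSIONS list printed for BeaverCreek above does not contain `cl_khr_fp64`, the extension through which an OpenCL 1.x device advertises double-precision support. A kernel that uses `double`, as `shor_algorithm_ocl_double` presumably does, would therefore likely fail to build on this device. The HD 7770 (Capeverde) dump later in the thread does list it. Below is a minimal sketch of such a check. It assumes the space-separated string has already been read with `clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...)`. The helpers `has_extension` and `supports_double` are hypothetical and are kept free of OpenCL headers so they compile on their own.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// True if `name` appears as a whole token in the space-separated list
// reported by CL_DEVICE_EXTENSIONS. Token matching avoids false positives
// from a plain substring search (e.g. a longer name containing "cl_khr_fp64").
bool has_extension(const std::string& extensions, const std::string& name) {
    assert(!name.empty());
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}

// Also accepting AMD's own cl_amd_fp64 is a design choice of this sketch.
bool supports_double(const std::string& extensions) {
    return has_extension(extensions, "cl_khr_fp64") ||
           has_extension(extensions, "cl_amd_fp64");
}
```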
Results: 1000 of 5250 iterations, time ~2 h 10 min.
./shor_algorithm_tester.sh shor_algorithm_ocl_float Radeon7770
Attached: temporary.zip
general_log.shor_algorithm_ocl_float.Radeon7770
result.shor_algorithm_ocl_float.Radeon7770.45
result.shor_algorithm_ocl_float.Radeon7770.111
result.shor_algorithm_ocl_float.Radeon7770.213
result.shor_algorithm_ocl_float.Radeon7770.315
The HD 7770 is a bit too weak to complete the full task :(
During the next test a problem showed up :cold: (and the night was cold :P).
Judging from the temporary files that were created, the X server was not the problem.
The problem is probably in the drivers or the script; I don't rule out the (aging) hardware either.
The script completed 240 of 250 iterations for the next number, 411, in ~26 min, which is fairly fast, and then stopped :(
During the test I did not see the GPU overheat (~55 °C), and the fans only sped up from 20% to 30% (the XFX HD 7770 Double Dissipation has plenty of cooling), plus there was cool air coming in from outside.
Processes running in the background kept working without problems for another 8 h, including BOINC Rosetta@home running on one core.
Maybe the problem is only on my machine; let's wait for the other testers' results.
That said, so far I have generally had no GPU problems with the projects I've tested; it only stuttered when crunching PrimeGrid.
~ v46
bash-4.2$ ./benchmark_ocl
====================================================================
OLib Benchmark.
Written by Lukasz Swierczewski <luk.swierczewski@gmail.com>.
Version: OpenCL
Compile date: Aug 10 2012
Compile time: 16:39:11
Architecture: x86_64
====================================================================
--- 1 OpenCL platform(s) found: -----------------------------------------
-- 0 --
PROFILE = FULL_PROFILE
VERSION = OpenCL 1.2 AMD-APP (938.1)
NAME = AMD Accelerated Parallel Processing
VENDOR = Advanced Micro Devices, Inc.
EXTENSIONS = cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
--- 1 OpenCL device(s) found on platform: -----------------------------------------
------------------- 0 -------------------
DEVICE_NAME = Capeverde
DEVICE_VENDOR = Advanced Micro Devices, Inc.
DEVICE_VERSION = OpenCL 1.2 AMD-APP (938.1)
DRIVER_VERSION = CAL 1.4.1741 (VM)
CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
DEVICE_MAX_COMPUTE_UNITS = 8
DEVICE_MAX_CLOCK_FREQUENCY = 1000
DEVICE_GLOBAL_MEM_SIZE = 874512384
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 16384
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 64
CL_DEVICE_LOCAL_MEM_SIZE = 32768
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 524288000
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 524288000
Enter the device number:
>> 0
[1: FPU-32] Initial test
Calculations... Please wait...
Execution time: 23 seconds
Rate: 4.347826
[1: FPU-32] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 96 seconds
Special Units Testing...
Execution time: 141 seconds
Performance: [32-bit float types]
414.639362 GFLOPS
757813829 special operations (e.g. sin, cos) per second
[1: FPU-64] Final test
Calculations... Please wait...
FP Units Testing...
Execution time: 405 seconds
Special Units Testing...
Execution time: 109 seconds
Performance: [64-bit float types]
4.913934 GFLOPS
49011467 special operations (e.g. sin, cos) per second
[2: ALU-64] Final test
Calculations... Please wait...
Execution time: 143 seconds
Performance: [64-bit integer types]
144660.833758 MIPS
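The device values reported above allow a rough sanity check of these results. This sketch makes two assumptions. First, each GCN compute unit has 64 FP32 lanes, and each lane can issue one fused multiply-add (counted as 2 FLOPs) per clock. Second, the peak is computed from the CU count and clock that the driver reported above, not from the card's spec sheet. Because it is not known exactly how the benchmark counts FLOPs, treat the efficiency figure as approximate.

```python
# Values printed by benchmark_ocl for device 0 ("Capeverde").
compute_units = 8         # DEVICE_MAX_COMPUTE_UNITS
clock_mhz = 1000          # DEVICE_MAX_CLOCK_FREQUENCY
fp32_gflops = 414.639362  # measured, 32-bit float
fp64_gflops = 4.913934    # measured, 64-bit float
special32 = 757813829     # special operations per second, 32-bit
special64 = 49011467      # special operations per second, 64-bit

# Assumption: 64 FP32 lanes per GCN compute unit, 1 FMA (= 2 FLOPs) per lane per clock.
LANES_PER_CU = 64
FLOPS_PER_LANE_PER_CLOCK = 2
peak_fp32_gflops = compute_units * LANES_PER_CU * FLOPS_PER_LANE_PER_CLOCK * clock_mhz / 1000

print(f"theoretical FP32 peak from reported CUs/clock: {peak_fp32_gflops:.0f} GFLOPS")
print(f"measured / peak: {fp32_gflops / peak_fp32_gflops:.1%}")
print(f"FP32 : FP64 throughput ratio: {fp32_gflops / fp64_gflops:.1f}x")
print(f"special ops FP32 : FP64 ratio: {special32 / special64:.1f}x")
```

By this estimate the 32-bit test reaches roughly 40% of the computed peak, while the 64-bit test runs about 84 times slower than the 32-bit one. That gap is worth keeping in mind when choosing between the float and double builds on this card.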
A different approach: who has an AMD processor under Linux in some machine running 24/365?
I could, but not 24/365. More like 24 h a day for a few days... an X6 1055T overclocked to 3.7 GHz.
If needed, SSH access can be arranged... and while you're at it, you can set up my GPU, the integrated Radeon 3200 ;).