The goal for the first day of the 4th week was to understand why the program wasn't compiling.
One of the problems was that I was not allocating memory correctly for the host arrays and the device arrays.
I also was not transferring the data from the host arrays to the device arrays, so the kernel was operating on arrays that had no data in them. Fixing both of these resolved it.
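As a minimal sketch (not my actual program; the array names and kernel here are made up for illustration), the pattern that was missing looks like this: allocate on both sides, copy host to device before the launch, and copy the result back afterward.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel standing in for the real one: doubles each element.
__global__ void scale(int *d_a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_a[i] *= 2;
}

int main() {
    const int n = 8;
    int h_a[n];                                   // host array
    for (int i = 0; i < n; ++i) h_a[i] = i;

    int *d_a;
    cudaMalloc(&d_a, n * sizeof(int));            // device array gets real memory
    // This host-to-device copy was the missing step: without it the
    // kernel reads uninitialized device memory.
    cudaMemcpy(d_a, h_a, n * sizeof(int), cudaMemcpyHostToDevice);

    scale<<<1, n>>>(d_a, n);

    // Copy the results back so the host can print them.
    cudaMemcpy(h_a, d_a, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%d ", h_a[i]);
    printf("\n");
    cudaFree(d_a);
    return 0;
}
```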
I was able to produce the matrix for the tool’s program.
I was able to produce values from my own program.
I was able to print a matrix with values (they seemed somewhat correct).
Challenge:
There was a segmentation fault because the program was storing data where it was not supposed to. (The problem was fixed. This usually happens when we launch more threads than necessary, so some threads go out of bounds, i.e., past the arrays' size.)
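A sketch of the usual fix for that out-of-bounds problem (hypothetical kernel, not my program): since the grid is often rounded up to more threads than there are elements, each thread checks its index before touching the array.

```cuda
// Each thread computes its global index and only writes if it is in range.
__global__ void fill(int *d_a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)            // without this guard, extra threads write past d_a[n-1]
        d_a[i] = 1;
}

// Typical launch: enough blocks to cover n, rounding up, which is exactly
// why the guard above is needed.
// int threads = 256;
// int blocks  = (n + threads - 1) / threads;
// fill<<<blocks, threads>>>(d_a, n);
```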
New Challenge:
My own program is not outputting any values.
This happened when I changed the byte size when allocating memory.
I need to find out what is happening with these values.
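One likely pitfall here, sketched with hypothetical names: cudaMalloc and cudaMemcpy take sizes in bytes, not element counts, so changing that number incorrectly under-allocates and the kernel silently works on bad memory.

```cuda
// Correct: size in bytes = element count times element size.
int *d_a;
size_t bytes = n * sizeof(int);
cudaMalloc(&d_a, bytes);
cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);

// A common mistake is passing just the element count:
//   cudaMalloc(&d_a, n);   // allocates n BYTES, not n ints
// The kernel then reads and writes past the allocation, and the program
// can produce no output or garbage values without an obvious error.
```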
For some reason, when I connected to Stampede to continue the parallelization, code had been moved around in my program. It no longer printed the
Challenge:
threadIdx's values are not being stored in the array? (Could that be the cause?)
I changed the size of the arrays when allocating memory, and it caused the program to neither print the matrix nor give me results.
I need to understand where the thread IDs are going, why they aren't being stored in the array, and why the function circuit_value does not return any 1's.
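A small debugging sketch for that question (toy program, not mine): have each thread store its global ID into the array, copy the array back, and print it. If the printed values are not 0..n-1, the thread IDs are not reaching the array, which points at a wrong allocation size, a missing memcpy, or a failed launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread writes its own global index into the array.
__global__ void store_ids(int *d_ids, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_ids[i] = i;
}

int main() {
    const int n = 8;
    int h_ids[n];
    int *d_ids;
    cudaMalloc(&d_ids, n * sizeof(int));
    store_ids<<<1, n>>>(d_ids, n);
    // Bring the IDs back to the host; if this copy or the launch failed,
    // the printout will not be 0 1 2 ... n-1.
    cudaMemcpy(h_ids, d_ids, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%d ", h_ids[i]);
    printf("\n");
    cudaFree(d_ids);
    return 0;
}
```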
After fixing those problems, I realized that there was an array that did not need to be passed through the kernel to the device; those values were only going to be used for a simple multiplication on the host. After fixing that, the program displayed the correct matrix.
I learned that CUDA can make programs run up to 5000 times faster than a normal program, based of course on the time it takes a program to run through the whole computation process.