color directly: the GeForce GTX 580 is out of reach. This will
disappoint some, because many expected AMD to pull ahead of Nvidia in
GPU performance. AMD believed so too, though it probably set its
sights lower. The cause may be a new architecture that has proved more
complex than expected to get the best out of, but also the excellent
GeForce GTX 500 series, which thwarted the original plans.
With Cayman, AMD took a risk by deciding to revise one aspect of the
architecture of its compute units that had not changed since the
Radeon HD 2900 XT. Put simply, AMD's compute units can be described
as vec5, which means they are capable of executing up to five
instructions in parallel. With such an architecture, however, if the
code being run does not offer that many parallelizable instructions,
the units are not fully exploited, in contrast to Nvidia's scalar
architecture, which maintains high efficiency in the widest range of
situations. Both approaches are equally valid.
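To make the utilization argument concrete, here is a minimal sketch of
a toy model, not a description of how the real hardware or compiler
schedules work. It assumes each step of a program exposes a given
number of independent instructions, and that a unit of a given width
issues at most that many per cycle. The function name and the sample
workload are hypothetical.

```python
import math

def lane_utilization(ops_per_step, width):
    """Fraction of lanes kept busy by a unit of the given width.

    ops_per_step: independent instructions available at each step
    (illustrative numbers, not measured data).
    """
    total_ops = sum(ops_per_step)
    # A step with more independent ops than lanes needs extra cycles.
    cycles = sum(math.ceil(n / width) for n in ops_per_step)
    return total_ops / (cycles * width)

# Hypothetical workload: parallelism varies from step to step.
workload = [5, 3, 1, 4, 2]

vec5_util = lane_utilization(workload, 5)    # wide unit, often half empty
scalar_util = lane_utilization(workload, 1)  # one op per cycle, always full

print(f"vec5 utilization:   {vec5_util:.0%}")
print(f"scalar utilization: {scalar_util:.0%}")
```

In this model the scalar unit is always fully used but needs more
cycles, while the vec5 unit finishes sooner with idle lanes, which is
why neither approach is better in absolute terms.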
Do not confuse a compute unit with a "core", a marketing concept that
Nvidia uses to invite comparison with CPUs, and that AMD adopted in
turn since it lets it claim 5 cores per vec5 compute unit. Overall you
can see things from two angles: one AMD vec5 unit performs better than
one Nvidia scalar unit, or one AMD core performs worse than one Nvidia
core. With the GF104 of the GeForce GTX 460 and its derivatives,
Nvidia moved closer to vector operation to increase efficiency, and
AMD intends to do the same with Cayman, but in the other direction,
going from vec5 to vec4. Cayman's compute units are thus less powerful
than those of previous AMD GPUs, but statistically more efficient.
More efficient, not more powerful: the distinction is important. On
the other hand, being simpler, these units take up less space and
consume less power, which allows their number to be increased, all
other things being equal.
In more detail, the previous Radeon GPUs were based on compute units
of the 4 + 1 type, in which one execution lane can handle complex
instructions. It is this lane that AMD has decided to get rid of. In
return, these complex instructions have to be processed on the other
lanes through a succession of simpler operations. They then monopolize
3 of the 4 execution lanes, which makes them much more costly, since
only a single other instruction can be executed in parallel, against 4
before. Without this lane, which was a bit special and in some cases
difficult to feed properly, the compiler's task is greatly simplified,
which in some cases may even make these vec4 units more efficient than
the previous vec5 units. Overall, though, AMD now needs more vec4
compute units to maintain the same level of performance.
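The cost of complex instructions on the two designs can be sketched
with an idealized cycle count. This is a simplified model under stated
assumptions, not AMD's actual scheduling: all instructions are taken
to be independent, a 4 + 1 unit issues up to five instructions per
cycle of which at most one is complex, and a vec4 unit spends 3 of its
4 lanes on a complex instruction. The function names are hypothetical.

```python
import math

def cycles_vec5(simple, complex_ops):
    """Idealized cycles on a 4 + 1 unit: 5 ops per cycle, at most 1 complex."""
    return max(complex_ops, math.ceil((simple + complex_ops) / 5))

def cycles_vec4(simple, complex_ops):
    """Idealized cycles on a vec4 unit: a complex op takes 3 of 4 lanes,
    leaving one lane free for a simple op in the same cycle."""
    leftover_simple = max(0, simple - complex_ops)
    return complex_ops + math.ceil(leftover_simple / 4)

# Hypothetical instruction mixes, as (simple, complex) counts.
for simple, complex_ops in [(8, 2), (16, 0)]:
    print(simple, complex_ops,
          cycles_vec5(simple, complex_ops),
          cycles_vec4(simple, complex_ops))
```

With a mix that contains complex instructions the vec4 unit falls
behind, while on purely simple code the two can tie in this model even
though the vec4 unit has one lane fewer.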
While Cypress, the GPU of the Radeon HD 5800, had 20 blocks of 16 vec5
compute units, Cayman has 24 blocks of 16 vec4 compute units. We are
thus dealing with 320 vec5 units against 384 vec4 units, which is less
flattering when expressed in cores, because it gives only 1536 cores
for Cayman against 1600 for Cypress. An important detail, however,
concerns the texturing units, whose number is fixed at 4 per block.
Cayman thus sees its texturing power increase by 20% at equal
frequency. Note that AMD reports an increase in double-precision
throughput, but that is a twisted way of interpreting the fact that
the "+1" lane did not support double precision. A Cayman unit is
identical to a Cypress unit in this respect.
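The figures above follow from simple multiplication, which the short
script below reproduces as a check. Only the block counts, units per
block, lane widths and texturing units per block come from the text;
the variable and function names are illustrative.

```python
def gpu_totals(blocks, units_per_block, lanes_per_unit, tex_per_block=4):
    """Return (compute units, marketing 'cores', texturing units)."""
    units = blocks * units_per_block
    cores = units * lanes_per_unit
    tex_units = blocks * tex_per_block
    return units, cores, tex_units

cypress = gpu_totals(blocks=20, units_per_block=16, lanes_per_unit=5)
cayman = gpu_totals(blocks=24, units_per_block=16, lanes_per_unit=4)

# Texturing gain at equal frequency, as an integer percentage.
tex_gain_pct = (cayman[2] - cypress[2]) * 100 // cypress[2]

print("Cypress:", cypress)
print("Cayman: ", cayman)
print("Texturing gain:", tex_gain_pct, "%")
```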