That's why I wonder whether there have been any attempts made to …

This is Google you're talking about! So the answer is obviously "No".

From the original paper, the hardware used for initialising and training:

"Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks. AlphaZero and the previous AlphaGo Zero used a single machine with 4 TPUs. Stockfish and Elmo played at their strongest skill level using 64 …"

So, AlphaZero used special hardware developed by Google: specialized Tensor Processing Units (TPUs) rather than the general-purpose Central Processing Units (CPUs) that are available commercially. This is how Wikipedia describes the second-generation TPUs they used:

"The second generation TPU was announced in May 2017. Google stated the first generation TPU design was memory bandwidth limited, and using 16 GB of High Bandwidth Memory in the second generation design increased bandwidth to 600 GB/s and performance to 45 TFLOPS."

They used 4 TPUs for the games, so a processing power of 4 × 45 = 180 TFLOPS (1 TFLOPS = 1,000 billion floating-point operations per second). For comparison, Intel's latest and most powerful chip, the Core i9 Extreme Edition processor, clocks in at 1 TFLOPS. A top-of-the-line i7 that you would find in a gaming machine typically manages about 100 GFLOPS. I think it's fair to say that AlphaZero was using an 800-pound gorilla of a hardware configuration compared to Stockfish's mouse.

I think it's best if I elaborate on your second point with an example move from game 1 between AlphaZero and Stockfish, which also served to satisfy my curiosity today.

Apple M1 CPU speed is very disappointing

I ran the benchmarks with the suggested settings (stockfish bench 128 NUMCORES 24 default depth) and got the following results (power usage checked using Apple's powermetrics):

Apple M1:
~12000 knodes/s for 8 threads using 15W of power
~2500 knodes/s for 1 thread using 5W of power

Intel 8-core mobile CPU:
~13000 knodes/s for 16 threads using 65W of power
~1700 knodes/s for 1 thread using 35W of power

In a nutshell, the M1 with its 4 performance cores is about 10% slower than a top-shelf Intel mobile CPU with 8 cores, while consuming about 80% less power. Looking at the linked results, the M1 is faster than Tiger Lake, while 8-core desktop CPUs are considerably faster. Ah, and the M1 is about 50% faster in the single-threaded variant than a 4.8 GHz Intel Skylake refresh.

My conclusion: no, the M1's performance is not disappointing at all. Its performance is comparable to much larger (and 4-5x hotter!) CPUs.

A few additional observations: AMD seems to do really well in Stockfish (for unclear reasons), and this benchmark really loves SMT. Stockfish seems to support Neon (ARM SIMD), but it's not really clear how it is utilized. The fact that it scales so well with SMT suggests to me that the SIMD code is suboptimal and could probably be improved to achieve better ILP.

To answer some of the questions, take a look here:

To measure and compare engine speed, you can use a free chess program. Click the analysis button at the beginning of a game (the starting position). After 1 minute, click the button again to stop the analysis and note the kn/s. Then run the test again with other engines on different hardware. You only need to decide at the beginning whether you want to compare tests with one core, with more cores, or maybe something like 2 vs 4 cores.

You will never see exactly the same kn/s, but that is not important for the comparison:

kn/s: 1234 (only the first digit, here 1xxx, is important to compare; on other hardware it could be 2000, i.e. 2xxx)
kn/s: 12345 (only the first two digits, here 12xxx, are important to compare)
kn/s: 123456 (only the first three digits, here 123xxx, are important to compare)

The last three digits are always different, even on exactly the same hardware.
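The AlphaZero post compares the TPU machine with consumer CPUs. As a back-of-the-envelope check on that comparison, the sketch below multiplies out the quoted peak figures. It assumes perfect scaling across the 4 TPUs and takes the quoted 45 TFLOPS, 1 TFLOPS and 100 GFLOPS at face value. Peak figures for different chips are not necessarily quoted at the same numeric precision, so the ratios are only a rough indication. The constant and function names are illustrative.

```python
# Peak-throughput figures as quoted in the post (not measured here).
TPU_V2_TFLOPS = 45.0   # one second-generation TPU, per the Wikipedia quote
NUM_TPUS = 4           # TPUs AlphaZero used during the games
I9_EE_TFLOPS = 1.0     # Core i9 Extreme Edition, as quoted
I7_GFLOPS = 100.0      # typical top-of-the-line gaming i7, as quoted


def aggregate_tflops(per_device_tflops: float, devices: int) -> float:
    """Total peak throughput, assuming (optimistically) perfect scaling."""
    return per_device_tflops * devices


alphazero_tflops = aggregate_tflops(TPU_V2_TFLOPS, NUM_TPUS)
ratio_i9 = alphazero_tflops / I9_EE_TFLOPS
# Convert TFLOPS to GFLOPS (1 TFLOPS = 1000 GFLOPS) before dividing.
ratio_i7 = alphazero_tflops * 1000 / I7_GFLOPS

print(f"AlphaZero game hardware: {alphazero_tflops:.0f} TFLOPS")
print(f"vs Core i9 Extreme Edition: {ratio_i9:.0f}x")
print(f"vs gaming i7: {ratio_i7:.0f}x")
```

Under these assumptions, the 4-TPU machine has roughly 180 times the peak throughput of the i9 and roughly 1,800 times that of a gaming i7.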
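The percentages in the M1 benchmark post can be recomputed from its four result lines. The sketch below does that and also derives an energy-efficiency figure that the post does not state explicitly: knodes per joule, i.e. kn/s divided by watts. It assumes, as the post's summary implies, that the 8-thread/15W and 1-thread/5W runs are the M1 and the other two are the Intel machine. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchRun:
    """One `stockfish bench` result as reported in the post."""
    label: str
    knps: float   # thousands of nodes searched per second
    watts: float  # approximate power draw during the run

    @property
    def knodes_per_joule(self) -> float:
        # kn/s divided by W (= J/s) gives thousands of nodes per joule
        return self.knps / self.watts


# Figures from the post; the M1/Intel grouping is an assumption.
m1_multi = BenchRun("M1, 8 threads", 12000, 15)
m1_single = BenchRun("M1, 1 thread", 2500, 5)
intel_multi = BenchRun("Intel 8-core mobile, 16 threads", 13000, 65)
intel_single = BenchRun("Intel, 1 thread", 1700, 35)


def percent_slower(a: BenchRun, b: BenchRun) -> float:
    """How much slower run a is than run b, in percent of b's speed."""
    return (1 - a.knps / b.knps) * 100


def percent_less_power(a: BenchRun, b: BenchRun) -> float:
    """How much less power run a draws than run b, in percent."""
    return (1 - a.watts / b.watts) * 100


def percent_faster(a: BenchRun, b: BenchRun) -> float:
    """How much faster run a is than run b, in percent of b's speed."""
    return (a.knps / b.knps - 1) * 100


mt_slower = percent_slower(m1_multi, intel_multi)
mt_power_saving = percent_less_power(m1_multi, intel_multi)
st_faster = percent_faster(m1_single, intel_single)
efficiency_ratio = m1_multi.knodes_per_joule / intel_multi.knodes_per_joule

print(f"multi-thread: M1 {mt_slower:.1f}% slower, {mt_power_saving:.1f}% less power")
print(f"single-thread: M1 {st_faster:.1f}% faster")
print(f"multi-thread efficiency: M1 {efficiency_ratio:.1f}x more knodes per joule")
```

With these inputs, the sketch reports the following for the M1:

- Multi-threaded: about 7.7% slower while drawing about 76.9% less power.
- Single-threaded: about 47% faster.
- Efficiency: about 4x the knodes per joule (800 vs 200).

These numbers are in line with the post's rounded "10%", "80%" and "50%".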
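The how-to's comparison rule says only the leading digits of a kn/s reading count, because the last three digits always vary. That amounts to integer division by 1,000. Here is a minimal sketch, with hypothetical function names and the three noisy digits taken from the post:

```python
import math


def significant_knps(knps: int, noisy_digits: int = 3) -> int:
    """Drop the trailing digits of a kn/s reading that vary from run to run.

    With the default of 3 noisy digits: 1234 -> 1, 12345 -> 12, 123456 -> 123.
    """
    if knps < 0:
        raise ValueError("kn/s cannot be negative")
    return knps // 10 ** noisy_digits


def same_speed(a: int, b: int, noisy_digits: int = 3) -> bool:
    """Treat two readings as equal when their significant prefixes match."""
    return significant_knps(a, noisy_digits) == significant_knps(b, noisy_digits)


def roughly_same_speed(a: float, b: float, rel_tol: float = 0.05) -> bool:
    """Alternative: compare with a relative tolerance instead of truncation."""
    return math.isclose(a, b, rel_tol=rel_tol)


print(significant_knps(1234), significant_knps(12345), significant_knps(123456))
print(same_speed(1234, 1987), same_speed(1234, 2000))
print(same_speed(1999, 2001), roughly_same_speed(1999, 2001))
```

Truncation has an edge case: 1999 and 2001 are nearly identical but land in different buckets. That is why the sketch also shows a relative-tolerance comparison. The 5% tolerance is an arbitrary illustrative choice, not something the post specifies.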