What Does deepseek Mean?
Pretraining on 14.8T tokens of the multilingual corpus, generally English and Chinese. It contained a better ratio of math and programming as opposed to pretraining dataset of V2.This significantly improves our teaching performance and minimizes the education prices, enabling us to additional scale up the design dimensions without further overhead.