DeepSeek apparently treats topics like Tiananmen as features.
2025. 2. 8. 21:10
Paper link: https://arxiv.org/abs/2501.12948
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities.
From the paper:
Recently, post-training has emerged as an important component of the full training pipeline. It has been shown to enhance accuracy on reasoning tasks, align with social values, and adapt to user preferences, all while requiring relatively minimal computational resources against pre-training. In the context of reasoning capabilities,
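To make the idea concrete, here is a minimal sketch of how an RL-based post-training reward could fold in a "social values" term. This is purely my own illustration under assumed names (`SENSITIVE_TERMS`, `alignment_penalty`, `reward`), not DeepSeek's actual code or term list:

```python
# Conceptual sketch: during RL post-training, the reward can combine task
# accuracy with an alignment term that penalizes outputs on sensitive topics.
# The term list and weighting here are hypothetical.

SENSITIVE_TERMS = {"tiananmen", "example-sensitive-term"}  # assumed list

def alignment_penalty(text: str) -> float:
    """Return 1.0 if the output mentions a sensitive term, else 0.0."""
    lowered = text.lower()
    return 1.0 if any(term in lowered for term in SENSITIVE_TERMS) else 0.0

def reward(output: str, task_score: float, penalty_weight: float = 5.0) -> float:
    """Combined reward: task accuracy minus a weighted alignment penalty."""
    return task_score - penalty_weight * alignment_penalty(output)

# A policy optimized against this reward learns to avoid the penalized topics.
print(reward("The answer is 42.", task_score=1.0))    # 1.0
print(reward("About Tiananmen ...", task_score=1.0))  # -4.0
```

A real pipeline would use a learned reward model rather than a keyword list, but the effect is the same: outputs touching penalized topics score lower, so the policy steers away from them.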
Conclusion

In short, the post-training step is where responses to socially sensitive topics, such as Tiananmen or Xi Jinping, get handled.