Paper: https://arxiv.org/abs/2501.12948
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities.
From the paper:
Recently, post-training has emerged as an important component of the full training pipeline. It has been shown to enhance accuracy on reasoning tasks, align with social values, and adapt to user preferences, all while requiring relatively minimal computational resources against pre-training. In the context of reasoning capabilities,
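In the RL post-training the paper describes, DeepSeek-R1-Zero is trained against simple rule-based rewards (an accuracy reward plus a format reward for `<think>...</think>` reasoning traces) rather than a learned reward model. The sketch below illustrates that idea only; the function names, the exact-match answer check, and the weights are assumptions for illustration, not the paper's implementation.

```python
import re


def format_reward(completion: str) -> float:
    """Assumed format reward: 1.0 if the completion puts its reasoning
    inside <think>...</think> and then emits an answer, else 0.0."""
    pattern = r"^<think>.*?</think>\s*\S"
    return 1.0 if re.match(pattern, completion, re.DOTALL) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """Assumed rule-based accuracy check: take the text after the
    reasoning trace as the final answer and compare it exactly."""
    answer = completion.split("</think>")[-1].strip()
    return 1.0 if answer == reference.strip() else 0.0


def total_reward(completion: str, reference: str,
                 w_fmt: float = 0.5, w_acc: float = 1.0) -> float:
    """Weighted sum of the two rule-based signals (weights are hypothetical)."""
    return (w_fmt * format_reward(completion)
            + w_acc * accuracy_reward(completion, reference))
```

A well-formed, correct completion such as `"<think>2+2 is 4</think>4"` against reference `"4"` scores both rewards, while a bare wrong answer scores neither; this cheap, verifiable signal is what lets the RL stage run at scale without a reward model.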
Conclusion
The model applies special handling to answers on socially sensitive topics such as Tiananmen and Xi Jinping.