Improving mathematical reasoning with process supervision

Last updated: 2023/11/13 at 6:05 PM

AI News Nest 10 months ago

We have trained a model to achieve a new state-of-the-art in mathematical problem solving by rewarding each correct step of reasoning (“process supervision”) instead of simply rewarding the correct final answer (“outcome supervision”). In addition to boosting performance relative to outcome supervision, process supervision also has an important alignment benefit: it directly trains the model to produce a chain-of-thought that is endorsed by humans.
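
To make the distinction concrete, here is a minimal toy sketch of the two feedback signals. It is not the training setup described in the article: the function names, the 0/1 reward values, and the hand-written step labels are all assumptions made for illustration.

```python
from typing import List


def outcome_reward(final_answer: str, reference_answer: str) -> float:
    """Outcome supervision (toy version): a single reward for the whole
    solution, based only on whether the final answer matches."""
    return 1.0 if final_answer.strip() == reference_answer.strip() else 0.0


def process_rewards(step_labels: List[bool]) -> List[float]:
    """Process supervision (toy version): one reward per reasoning step,
    taken from step-level correctness labels (hypothetical human labels)."""
    return [1.0 if is_correct else 0.0 for is_correct in step_labels]


# Hypothetical solution to "compute (3 + 5) * 2" that reaches the right
# answer through two arithmetic mistakes that happen to cancel out.
steps = [
    "Rewrite the problem as (3 + 5) * 2",  # correct
    "3 + 5 = 9",                           # incorrect
    "9 * 2 = 16",                          # incorrect
]
step_labels = [True, False, False]  # assumed labels, written by hand

outcome = outcome_reward("16", "16")
per_step = process_rewards(step_labels)

print("outcome reward:", outcome)
print("process rewards:", per_step)
```

In this example the outcome signal rewards the solution in full because the final answer is right, while the per-step signal flags the two faulty steps. That difference is the intuition behind rewarding each correct step of reasoning.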