The outlook: Researchers are concerned about the core methods and objectives of natural-language processing (NLP), the branch of AI focused on creating systems that analyze or generate human language. Are these methods enough to achieve the field’s ultimate goals? What even are those goals?
What needs fixing: The way NLP is evaluated could be part of the problem. Researchers publish new data sets of ever trickier questions, only to see ever bigger neural networks quickly post impressive scores. But many people in the field are growing weary of such leaderboard-chasing. Do recent “advances” really translate into helping people solve problems? Such doubts are more than abstract fretting; whether systems are truly proficient at language comprehension has real stakes for society.
Where we go from here: To bring evaluations more in line with the field’s goals, it helps to consider what holds today’s systems back. A human reading a passage builds a detailed representation of entities, locations, events, and their relationships: a “mental model” of the world described in the text. To construct more meaningful evaluations, NLP researchers should test whether an AI system can build this sort of model. Read the full story.
Written by Jesse Dunietz, a researcher at Elemental Cognition, where he works on developing rigorous evaluations for reading comprehension systems. He is also an educational designer for MIT’s Communication Lab and a science writer.
 