NLP in CALL

Innovative Technologies and Their Didactic Application

4th Eurocall pre-conference workshop organised by the SIG in Language Processing

(1) Aim of the Workshop
(2) Schedule
(3) Papers
    (a) How Much Intelligence Do We Need?
    (b) ICALL in the Primary School Environment
    (c) Creating User-Friendly, Highly Adaptable and Flexible Language Learning Environments via Flash, XML, Perl and PHP
    (d) (I)CALL and Linguistics
    (e) Twenty Five Years of NLP in CALL - Reaching Maturity?
(4) Cost to Participants
 

Aim of the Workshop

It is no coincidence that we have chosen one of the main conference's sub-themes - Innovative Technologies and Their Didactic Application - as the topic for the fourth workshop organised by the Special Interest Group in Language Processing. We would like to draw attention to the innovations in the field of Natural Language Processing (NLP) and discuss their application in CALL.

The workshop is open to all Eurocall members interested in human language technology. Even if you have little or no expertise in NLP, you will find the papers accessible and the discussions useful. In five paper presentations, participants will be introduced to examples of Natural Language Processing (NLP) approaches in CALL and will have the chance to familiarise themselves with the application of NLP techniques in CALL. The way in which such technology can be integrated in computer-assisted language learning will be discussed.

The Special Interest Group in Language Processing is Eurocall's newest SIG. The group organised successful pre-conference workshops for EUROCALL2000 in Dundee, EUROCALL2001 in Nijmegen, and EUROCALL2002 in Jyväskylä. This year's (fourth) workshop places emphasis on two areas of language processing that are highly relevant to CALL: morpho-syntactic parsing for error diagnosis and the use of corpora for language learning and teaching. It brings together presenters from Switzerland, Ireland and Canada.


Schedule

09:00 - 09:30 Registration / Coffee
09:30 - 09:45 Opening (Mathias Schulze, Chair of SIGLP)
09:45 - 10:30 How Much Intelligence Do We Need? (Trude Heift)
10:30 - 11:15 ICALL in the Primary School Environment (Katrina Keogh and Monica Ward)
11:15 - 12:00 Creating user-friendly, highly adaptable and flexible language learning environments via Flash, XML, Perl and PHP (Thomas Koller)
12:00 - 13:30 Lunch
13:30 - 14:15 (I)CALL and Linguistics (Cornelia Tschichold)
14:15 - 15:00 Twenty Five Years of NLP in CALL - Reaching Maturity? (Mathias Schulze)
15:00 - 16:00 Round Table Discussion with the panel of contributors
16:00 - 16:30 Coffee Break

 

Abstracts

Heift, Trude: How Much Intelligence Do We Need?

 

Natural Language Processing (NLP) refers to the automatic analysis of human languages. In Computer-Assisted Language Learning (CALL), NLP is commonly used to create a more interactive and intelligent environment for language learners, one in which students receive error-specific, or intelligent, feedback. However, certain error types can also be addressed with less intelligence on the part of the software than a sophisticated NLP system generally requires, and the feedback might be just as effective. For example, a morphological lexicon can be used to generate an inflectional paradigm that is not only context-sensitive but also individualized in that it addresses the student's specific error. Ideally, the inflectional paradigm is dynamically generated, although the information could also be pre-encoded and stored along with the program. In either case, no sophisticated parsing is needed.
In this presentation, I will discuss a study in which we examined students' use of inflectional paradigms that are part of the error-specific feedback of an online learning environment for German. The inflectional paradigms in the E-Tutor are dynamically generated by extracting the required information from a morphological lexicon. Our study investigates whether students who accessed an error-specific inflectional paradigm were successful in correcting their mistakes without receiving any further error explanation.
In the Spring semester 2003, 98 beginner and intermediate students of German participated in the study. Results indicate that our study participants accessed inflectional paradigms a total of 1726 times. In identifying and signaling the error, we generated three types of feedback placed on a continuum from least to most specific. Study results show that students accessed inflectional paradigms more often if the error explanation was very sparse. Furthermore, we found that if no error explanation was provided, only 27% of the students were unable to correct their mistake on the first try.
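The idea of generating an error-specific inflectional paradigm from a morphological lexicon, without any parsing, might be sketched roughly as follows. This is only an illustration under stated assumptions: the toy lexicon, the regular present-tense endings and the function names are hypothetical and are not taken from the E-Tutor.

```python
# Toy morphological lexicon: regular German verbs with their stems.
# All entries and names are illustrative assumptions, not E-Tutor data.
LEXICON = {
    "spielen": {"stem": "spiel"},
    "lernen": {"stem": "lern"},
}

# Present-tense endings for regular verbs, one per person/number slot.
PRESENT_ENDINGS = [
    ("ich", "e"),
    ("du", "st"),
    ("er/sie/es", "t"),
    ("wir", "en"),
    ("ihr", "t"),
    ("sie/Sie", "en"),
]


def generate_paradigm(infinitive):
    """Build the present-tense paradigm from the lexicon entry."""
    stem = LEXICON[infinitive]["stem"]
    return [(pronoun, stem + ending) for pronoun, ending in PRESENT_ENDINGS]


def paradigm_feedback(infinitive, pronoun):
    """Render the paradigm as text, marking the row the student got wrong."""
    lines = []
    for row_pronoun, form in generate_paradigm(infinitive):
        marker = "->" if row_pronoun == pronoun else "  "
        lines.append(f"{marker} {row_pronoun} {form}")
    return "\n".join(lines)


# A student wrote "du spielt"; show the paradigm with the "du" row marked.
print(paradigm_feedback("spielen", "du"))
```

The paradigm is built on request from the lexicon entry, so the same data could just as well be pre-encoded and stored; only the row that matches the student's error is marked.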


Keogh, Katrina and Ward, Monica: ICALL in the Primary School Environment

 

This presentation looks at the suitability of Intelligent CALL (ICALL) materials for use in the primary school environment. It examines what Computational Linguistics (CL)/Natural Language Processing (NLP) tools are available, how they have been put to work in CALL and which ones, if any, can be suitably applied to meet the needs of primary school students (and teachers).
To date, CL/NLP tools such as morphological analysers, POS taggers and parsers have been used in CALL materials. These ICALL materials have generally been deployed with adult learners. Glosser (Dokter & Nerbonne 1998) and FreeText (2001), for example, are two relatively recent and successful ICALL resources aimed at intermediate speakers of the target language. Given the often complex nature of these materials, it is sometimes hard to envisage how ICALL could be used with ab-initio, non-linguistically trained or younger primary school learners.
Our research context is that of the primary school in Ireland.  The learners are young beginners, with limited linguistic knowledge of their L1 (English, for the majority of Irish students). While Irish primary school students have experience with learning an L2 (Irish or Gaelic) from an early age, they are not exposed to the linguistic properties or explanations of the language (this also holds true for their L1). Any relevant CL/NLP tool that could be incorporated into their language-learning environment must take this into account.
ICALL systems do not necessarily have to use hi-tech CL/NLP techniques to be useful. Potentially the most useful and relevant systems are based on ‘low-level’ applications like part-of-speech taggers and morphological analysers. However, it is imperative that we look at the needs of the user before we consider a solution. The aim, therefore, is not to develop a "complete" ICALL tutor, but rather to utilise relevant CL resources that can actually provide useful learning tools for the students.
By focusing on learner needs, we hope to avoid the potential pitfall outlined by Mishan and Strunz (2003), who remind us that it can sometimes be a case of "technology too often being 'a solution in search of a problem' instead of technology being pedagogy-driven". With this in mind, we have approached the integration of CL into CALL from the deployment perspective (i.e. the actual needs of the language student/teacher on the ground). Thus, we have used relatively simple but effective CL techniques for the teaching of German and Irish in primary schools in Ireland. Examples will be provided and evaluated throughout the presentation.

References:

Dokter, D. & Nerbonne, J. (1998) A Session with Glosser-Rug. In: Jager, S., Nerbonne, J. & van Essen, A. (eds.) Language Teaching and Language Technology, pp. 88-94. Lisse: Swets & Zeitlinger.
FreeText (2001) FreeText Homepage. Available at: http://www.latl.unige.ch/freetext/ [Accessed 30 January 2004]
Mishan, F. & Strunz, B. (2003) An Application of XML to the Creation of an Interactive Resource for Authentic Language Learning Tasks. In: ReCALL 15 (2), pp. 237-250.


 

Koller, Thomas: Creating user-friendly, highly adaptable and flexible language learning environments via Flash, XML, Perl and PHP

 

The development of modern ICALL software requires the integration of graphical components, flexible database technologies and NLP tools. The incorporation of these integrated systems into a CALL environment certainly fosters the acceptance and applicability of ICALL software in the real language learning lab. It is not sufficient to develop sophisticated language processing tools; one also has to create intuitive and adaptable user interfaces. In addition, it is useful to deploy flexible database technologies which lend themselves readily to diverse application programming interfaces (APIs).
In my presentation I will describe a software architecture which combines the graphical software Macromedia Flash (http://www.macromedia.com/software/flash/), the scripting languages Perl and PHP and the data description and exchange format XML (Extensible Markup Language, http://www.w3.org/XML/). This architecture represents a powerful framework for building sophisticated web-based language learning environments which exceed the graphical and animation capabilities of traditional web-based learning environments.
After a brief outline of the general properties of these technologies, I will demonstrate how Flash can be intertwined with XML, Perl and PHP in order to exchange and process language data. Animated grammars and language games will serve as concrete programming examples. I will also give information about available literature on these combinations of techniques.
Macromedia Flash was originally an animation program. It has since been enhanced with a full-fledged scripting language (ActionScript) supporting the creation of graphical software with a high degree of flexibility and interactivity at runtime. In contrast to established browser cookies, Flash provides the opportunity to save highly structured data (for instance XML data) on the learner’s computer. The amount of stored data can be (theoretically) unlimited, depending on the settings of the user. Therefore, less data has to be saved on the server and the learner has to spend less time online.
The Flash Player plug-in allows a uniform representation regardless of platform, browser type, and screen resolution. The plug-in software is freely available and can be downloaded easily with a modem from home (size of latest version around 650KB).
XML has generally become the most important data description and exchange format. Therefore, document type definitions are available for a wide range of topics. Compared to the handling of relational databases, the basic principles of creating and modifying XML files can be much more easily learned.
XML data can be created, modified and saved by Flash, Perl and PHP. With XML as the data description and exchange format, a strict separation of language data content and processing algorithms can be achieved. In this way, language data can easily be reused in different scenarios. Apart from processing language data itself, Flash can also interchange data with Perl and PHP scripts, supporting a wide range of language processing capabilities within a single architecture (this includes regular expressions, which are available in Flash, Perl and PHP).
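
The separation of language data (XML) from processing code, together with regular-expression matching, could look roughly like the sketch below. It uses Python's standard library as a stand-in for the Perl and PHP scripts named above, and the element names, attribute names and sample entries are assumptions made purely for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical language data; element and attribute names are assumptions.
VOCAB_XML = """<vocabulary lang="de">
  <entry id="1"><word>Haus</word><gloss>house</gloss></entry>
  <entry id="2"><word>Hund</word><gloss>dog</gloss></entry>
  <entry id="3"><word>Katze</word><gloss>cat</gloss></entry>
</vocabulary>"""


def load_entries(xml_text):
    """Parse the XML data into a list of (word, gloss) pairs."""
    root = ET.fromstring(xml_text)
    return [(e.findtext("word"), e.findtext("gloss")) for e in root.iter("entry")]


def find_words(entries, pattern):
    """Select the words that match a regular expression."""
    regex = re.compile(pattern)
    return [word for word, _ in entries if regex.search(word)]


def add_entry(xml_text, word, gloss):
    """Return new XML text with one more entry appended to the data."""
    root = ET.fromstring(xml_text)
    next_id = str(len(root.findall("entry")) + 1)
    entry = ET.SubElement(root, "entry", id=next_id)
    ET.SubElement(entry, "word").text = word
    ET.SubElement(entry, "gloss").text = gloss
    return ET.tostring(root, encoding="unicode")


entries = load_entries(VOCAB_XML)
print(find_words(entries, r"^H"))            # words beginning with "H"
updated_xml = add_entry(VOCAB_XML, "Maus", "mouse")
print(len(load_entries(updated_xml)))        # number of entries after the update
```

Because the processing functions only rely on the XML structure, the same data file could be reused by a different exercise type or by a different scripting language.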


 

Schulze, Mathias: Twenty Five Years of NLP in CALL - Reaching Maturity?

 

Alan Turing suggested in 1948 that the new computers could demonstrate their ‘intelligence’ in “(i) Various games, e.g., chess, noughts and crosses, bridge, poker; (ii) The learning of languages; (iii) Translation of languages; (iv) Cryptography; (v) Mathematics.” (Turing 1948, cited in Hutchins, 1986, pp. 26f.). Weischedel, Vogel, & Jarvis (1978) are usually credited with the first project in parser-based CALL. Nerbonne (2003), in his chapter on Natural Language Processing in Computer-Assisted Language Learning (NLP in CALL) in the Oxford Handbook of Computational Linguistics, argues that recent advances in NLP have much to contribute to CALL. However, in more than twenty five years, very few projects have reached a level of maturity that led to widespread adoption of this software technology in the language classroom. What are the hurdles in the development and implementation process which appear to have prevented a successful employment of this technology?
In this presentation, I will discuss and compare selected projects of the more than twenty five years since then in an attempt to find some answers to the following questions:
- What are parsers good for? (see also Holland, Maisano, Alderks, & Martin, 1993)
- What features determine the success or failure of such projects?
- What features facilitate integration of parser-based programs in the learning process?
Data with regard to these questions will help us identify possible avenues for future development and research. Gaps, strengths and weaknesses in the application of natural language processing will be shown. Recurring problems such as error recognition and ambiguity, overgeneration of parses, overflagging of errors, lack of rigidity in the analysis results etc. will be highlighted and discussed briefly. The discussion of these projects will look at the artificial intelligence techniques employed (e.g. student profiles and models) and pay attention to the application of parsing algorithms and grammatical formalisms (see e.g. Matthews, 1993). We will look at what problems the developers attempted to address. For what language(s) was the software written? What proficiency levels of students are covered? It is notoriously difficult to ascertain from the research literature which of these projects ever left the stage of a research prototype and were tested in authentic learning situations. We will investigate selected examples of documented use.

Holland, V. M., Maisano, R., Alderks, C., & Martin, J. (1993). Parsers in Tutors: What Are They Good For? CALICO Journal, 11(1), 28-46.
Hutchins, J. (1986). Machine Translation - Past, Present and Future. New York: Ellis Horwood Limited.
Matthews, C. (1993). Grammar Frameworks in Intelligent CALL. CALICO Journal, 11(1), 5-27.
Nerbonne, J. (2003). Computer-Assisted Language Learning and Natural Language Processing. In R. Mitkov (Ed.), Handbook of Computational Linguistics (pp. 670-698). Oxford University Press.
Weischedel, R. M., Vogel, W., & Jarvis, M. (1978). An Artificial Intelligence Approach to Language Instruction. Artificial Intelligence, 10, 225-241.


 

Tschichold, Cornelia: (I)CALL and Linguistics

 

CALL has more links to developments in (applied) linguistics than is obvious at first sight. During the early period of CALL, in structuralist linguistics, the principle of learning sentence structures was the main aspect of learning a foreign language, and typical CALL exercises of that period do just this, drilling the grammar structures. Later, CALL took advantage of the increasing technical capacities of computers, making considerably more exposure to linguistic data possible, and thus shifting the emphasis from a structuralist, production-based instruction to an input-based instruction, influenced by the new emphasis in applied linguistics on language exposure.
This move from drills to exposure coincides with a move away from dealing with learners’ language production. Today the technical possibilities for delivery of authentic practice material leave little to be desired, but dealing with the productive skill of language learners remains a challenge for CALL, and this lack of true interactivity is problematic. Learners’ language production has to involve more than choosing an answer to a multiple choice question and therefore common string-handling technology is not sufficient any more. CALL needs NLP methods for true progress, but so far, no truly NLP-supported CALL programs have made it into the commercial market. The reasons for this apparent failure are relatively easy to find. They can be grouped into two sections, the first one to do with the type of input a CALL program has to deal with, the second one covering the output expected of a CALL program.
NLP techniques have been developed for applications other than learner language, applications that deal with mostly correct, native-speaker language (preferably English), and they often do not function well enough when used on a different kind of input, i.e. learner language that contains substantially more and different errors than other texts. The needs a CALL program should respond to are fundamentally different from those of users of NLP applications such as MT, information retrieval, etc. Whereas such typical NLP applications are aimed at a native speaker user, often one with very specialized needs, and typically have to deal with texts from a specific domain only, CALL programs normally have to cover general language with its much higher degree of ambiguity. On the output side, the biggest problem is the fact that the tolerance rate for errors or even for superfluous messages produced by the program is close to zero. What we have today in terms of reliable technology are large-scale lexicons and morphological analysers for a number of major languages. These could be used to improve the feedback given to learners on the level of single words and short phrases. Given that most learners see vocabulary as the biggest obstacle on their way to success, it would make sense for CALL designers to put more emphasis on vocabulary and intelligent methods of giving feedback on this level, thus linking up with the recent focus in applied linguistics on the lexicon.
CALL activities should focus on the progression from controlled learning to automatic processing of linguistic forms, a step that is generally assumed to be achieved through practice and routinization. Learners’ mastery of morphology and syntax seems to resist improvement in an interactionist teaching approach, a fact which would favour a role for CALL that is centred around vocabulary learning and a lexical approach to language teaching. We have the technology (large-scale lexicons and morphological analysers), the linguistic knowledge (research findings on vocabulary acquisition), and the language data (various corpora) necessary to design language learning tasks that put these principles into practice.
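
The difference between common string handling and lexicon-supported feedback at the level of single words might be sketched as follows. The small full-form lexicon, the feedback messages and the function names are hypothetical and serve only to illustrate the contrast; a real system would draw on a large-scale lexicon or a morphological analyser.

```python
# Toy full-form lexicon mapping inflected forms to lemmas (illustrative only).
FORM_TO_LEMMA = {
    "go": "go", "goes": "go", "went": "go", "gone": "go", "going": "go",
    "see": "see", "sees": "see", "saw": "see", "seen": "see", "seeing": "see",
}


def string_match_feedback(answer, expected):
    """Plain string handling: the answer is either identical or wrong."""
    return "correct" if answer == expected else "wrong"


def lexicon_feedback(answer, expected):
    """Use the lexicon to tell wrong-form errors apart from wrong-word errors."""
    answer = answer.strip().lower()
    expected = expected.strip().lower()
    if answer == expected:
        return "correct"
    answer_lemma = FORM_TO_LEMMA.get(answer)
    if answer_lemma is None:
        # The form is not in the lexicon at all.
        return "unknown word: check the spelling"
    if answer_lemma == FORM_TO_LEMMA.get(expected):
        # Same lemma, so the learner chose the right word in the wrong form.
        return "right word, wrong form"
    return "different word"


# Expected answer "went"; three different learner responses.
for response in ["goes", "saw", "goed"]:
    print(response, "|", string_match_feedback(response, "went"),
          "|", lexicon_feedback(response, "went"))
```

String matching reports all three responses simply as wrong, whereas the lexicon lookup can distinguish a wrong form of the right word, a different word, and a form that is not in the lexicon.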
