While much of the discourse on generative AI in higher education has focused on students using it to circumvent learning goals in writing assignments like essays and reading responses, disciplines where learning is demonstrated through other kinds of artifacts have received less attention. These include computer science, math, science, and language instruction. This post is the first in a series to fill that gap. We begin with language instruction, covering some of the large-scale implications of the technology; providing historical context on machine translation in education; and sharing insights from a UChicago instructor who has been exploring the opportunities and issues AI presents in language instruction.
What Isn’t New
Some language pedagogy experts have noted that while exercises in translation, or in expressing an original thought in another language, are of course vulnerable to unwelcome assistance from generative AI tools, the field has already faced similar challenges. Those challenges raised the same question: how much knowledge of the target language does a user still need, even with advanced translation tools?
As Simon Zuberek, a former educational technology and language specialist at Columbia University, explained in a recent podcast interview, a major upgrade to Google Translate in 2016 created concerns much like today’s: that learning goals would be circumvented by a technology able to produce an equivalent product without the student doing the work that builds understanding of the subject matter. As Zuberek notes, however, these tools have not delivered on that specific threat:
“After a couple years, we have seen that that is not what has happened, and it is largely due to the fact that for all their worth the machine translation technology in many ways works like a dictionary in a sense that in order to use it effectively and successfully you have to know the language in the first place…Of course, you can as a user input a sentence in your first language or the dominant language and ask it to translate to target language but you still as a user, you need to know, you need to be able to ascertain whether the translation is correct…whether idiomatically it makes sense, whether it grammatically holds water. So, sure, the basic need of going from A to B is perhaps addressed but you know that in turn sheds light on even more areas of foreign language that maybe without machine translation we wouldn’t even be as aware of.”
As our interview with Romeena Kureishy, instructor of Urdu at UChicago, bears out, this limitation of the technology persists, creating both a risk of misapplication and an opportunity for guided instruction in using these tools as a learning aid. Particularly with Less Commonly Taught Languages, the risk and the opportunity alike stem from how poor the outputs of current AI tools can be.
What Is New
As A.G. Elrod has noted in a recent EDUCAUSE article, much of the online text in Less Commonly Taught Languages (LCTLs) is actually very low-quality content generated by machine translation, often for purposes like advertising. As Elrod describes it, the danger of relying on these tools for LCTLs includes “creating a homogenized version of the languages, one devoid of the subtleties and richness essential for true linguistic and cultural understanding…a disservice to students who seek to learn languages in their authentic form and to those for whom the target language is an essential element of their culture and identity.”
Elrod bases these concerns on recent research posted on arXiv, in which the researchers analyzed a corpus of 6.4 billion unique sentences in 90 languages for quality; topic; how many other languages each sentence appeared in within the corpus; and whether sentences showed errors suggesting they originated in that language or another. Based on this analysis, Thompson et al. suggested that 57.1% of sentences in low-resource languages were machine-translated (likely from English). The low-quality sentences were generally 5-10 words long and concerned topics “requiring little or no expertise or advance effort to create, on topics like being taken more seriously at work, being careful about your choices, six tips for new boat owners, deciding to be happy, etc.” (Thompson et al.).
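To make that multi-way parallelism signal concrete, here is a minimal, hypothetical sketch of the idea: a sentence that appears, aligned, in many languages of a crawled corpus is more likely to have been machine-translated than written natively in a low-resource language. This is an illustration of the heuristic only, not the researchers’ actual pipeline; the input structure, threshold, and example data below are assumptions.

```python
from collections import defaultdict

def flag_likely_machine_translated(aligned_pairs, min_languages=8):
    """Return English pivot sentences that appear aligned in many languages."""
    languages_per_sentence = defaultdict(set)
    for english_text, language_code in aligned_pairs:
        languages_per_sentence[english_text].add(language_code)
    # Sentences present in many languages at once are flagged as likely
    # machine-translated rather than natively written.
    return {
        sentence: sorted(langs)
        for sentence, langs in languages_per_sentence.items()
        if len(langs) >= min_languages
    }

# Made-up example data: one sentence crawled in nine languages, one in a single language.
pairs = [("Six tips for new boat owners", code)
         for code in ["ur", "sw", "am", "my", "ha", "km", "lo", "si", "yo"]]
pairs += [("A sentence written once, in one language", "ur")]
print(flag_likely_machine_translated(pairs, min_languages=8))
```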
These findings seem to align with the experience of UChicago instructor of Urdu Romeena Kureishy, who decided to explore the learning opportunities presented by these tools in her own class.
Teaching Urdu Students with AI
Throughout the quarter, Kureishy and her first- and second-year Urdu students have been on a “virtual trip” to Pakistan. As part of that trip, she incorporated ChatGPT as a conversation partner for her students, playing the role of an immigration officer at the Karachi airport.
Why Use AI?
When asked why she decided to experiment with AI in her course, Kureishy explains that as both a teacher of Urdu herself and someone who trains Urdu teachers in twenty-first century pedagogy, she needed to be up to date on the implications and applications of these tools in language instruction.
Going into the experiment, Kureishy designed the activity to support students’ interpersonal communication skills, particularly in writing. While she gives her students practice speaking with each other as well as with native speakers through videoconferences, it is difficult to find conversation partners who can model correct writing in Urdu script.
How Did They Use AI?
She chose to have the activity take place (or at least begin) in the classroom so that she could monitor the experience and help troubleshoot. Because it takes time for beginning students to write in Urdu, they finished the assignment as homework, submitted their conversation transcripts for Kureishy to review, and then discussed them in the next class session.
Her specific directions to students in using AI reflect many of the best practices recommended by writers on generative AI in teaching, which you can find more information about in our other generative AI blog posts.
Students prompted ChatGPT to do the following (a sketch of how the same setup might be scripted appears after the list):
- Play the role of an immigration officer talking to a visitor to Karachi.
- Ask five questions in Urdu.
- Ask only one question at a time.
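For those curious how this role-play might be reproduced outside the ChatGPT web interface, below is a minimal sketch using the OpenAI Python client. It is illustrative only: Kureishy’s class used the regular ChatGPT interface, and the model name, prompt wording, and helper function here are assumptions rather than her actual setup.

```python
# Minimal sketch, not the class's actual setup: the same role-play instructions
# sent through the OpenAI Python client instead of the ChatGPT web interface.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

system_prompt = (
    "Play the role of an immigration officer talking to a visitor to Karachi. "
    "Ask five questions in Urdu. Ask only one question at a time, and wait for "
    "the visitor's reply before asking the next question."
)

# The running message list is what lets the "officer" react to each reply.
messages = [{"role": "system", "content": system_prompt}]

def officer_turn(student_reply):
    """Send the student's reply and return the officer's next question."""
    messages.append({"role": "user", "content": student_reply})
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=messages,
    )
    question = response.choices[0].message.content
    messages.append({"role": "assistant", "content": question})
    return question

# Example: the student greets the officer to start the exchange.
print(officer_turn("السلام علیکم"))
```

Keeping the full conversation history in `messages` is what allows the “officer” to ask one question per turn and respond to what the student actually wrote.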
The Results
In addition to making her own observations and reading the submitted conversations, Kureishy collected exit surveys from her students to get feedback on their experience using the tool.
So, how did it go? Recapping the experience, Kureishy notes that, at a high level, the activity engaged students through its novelty and their curiosity about how well the tool would perform. As a conversation partner, however, the tool did not tend to model correct Urdu writing. Assessing the text outputs as an expert reader of Urdu, she first noticed that “AI made…repeated grammatical errors,” that “…it does not use correct syntax everywhere,” and that ChatGPT’s outputs frequently showed issues with gender agreement.
Instead, ChatGPT’s output served as useful material for guided reading and correction. With instructor guidance, the class went through the responses one by one, and at some points the students were able to find the mistakes in the AI outputs themselves, an exercise in error identification and correction that Kureishy found beneficial. This is an excellent example of critical AI use that other instructors may find useful, especially those looking to encourage scrutiny of generative AI outputs and to highlight the value of the intellectual skills humans still need to build for themselves. Overall, Kureishy says:
“It was a very engaging class. The students were really engaged because they wanted…to see if they were right…they wanted to see where the AI was wrong. So it ended up being a very, very interesting class…I believe that teacher supervision is necessary, not during the activity, but definitely post-activity where you’re going over it.”
Student Reception
Summarizing her review of the exit surveys and the class debrief for this activity, Kureishy draws attention to several student reactions in particular:
- Students found it useful to practice reading Urdu script and to read and write in real time.
- The activity allowed students to practice reading and writing Urdu at their own pace and explore different vocabulary.
- However, students did not trust the grammatical correctness of the outputs due to some of the obvious mistakes.
She also draws attention to one student’s desire to explore multiple options for expressing a single thought in Urdu. (For those interested in trying this out, Academic Technology Solutions’ February 2024 prompt book contains a prompt to do something similar.)
“In the past days have you achieved a mention of traveling!!”: Nonsensical and Weird Sentences in Urdu from ChatGPT
Kureishy and her students’ experience bore out the limitations of machine translation described by both Zuberek and Elrod. She describes the follow-up conversation with her class this way: “We were talking about it, the students and I afterwards, and we all thought that ‘You know, Google Translate gives you some very weird translations. And AI was doing the same thing. So it was generating some weird sentences that wouldn’t normally be written or spoken in everyday use.’” Below are a few examples of nonsensical sentences, translated by Kureishy herself.
- آپ کو پاکستان میں رہنے کے لیے کس شہر کا منتخب کیا گیا ہے؟
  - “To you what city has been chosen to live in Pakistan.”
- کیا آپ کو پچھلے دنوں میں کوئی مسافرت کا تذکرہ حاصل ہوا ہے
  - “In the past days have you achieved a mention of traveling!!”
- کراچی میں آپ کیسی محسوس ہو رہی ہے؟
  - “Which are you feeling in Karachi.” (This should be: “کراچی میں آپ کو کیسا محسوس ہو رہا ہے۔”)
Additionally, Kureishy provides details on grammatical errors, particularly with the gender of certain words. She notes that some of these are simple errors of a kind that, as anyone who has used AI tools will notice, you would not expect when prompting for responses in English.
- “وہاں آپ مختلف قسم کی تجارتی مالوں کو دیکھ سکتے ہیں”
  - “This sentence uses the possessive marker ‘ki,’ which is used for feminine nouns, but the noun it is used for is actually masculine. The noun ‘maal’ is an unmarked masculine and does not need to be in the oblique ‘maaloN,’ because the ‘ko’ is not needed in this sentence.”
- “تھیک ہے، آپ کی سفر کی مدت صرف پانچ دن ہے”
  - “This sentence spells ‘theek’ with a ت instead of a ٹ. It again assigns the noun ‘safar’ (travel) a feminine gender, when it is masculine.”
- “شکاگو کی موسم کیسا تھا؟”
  - It assigns “mausam” (weather) a feminine gender, when it is actually masculine, and “ka” should have been used instead of “ki.”
A Detailed Example of AI Shortcomings with Less Commonly Taught Languages
ChatGPT’s responses also demonstrated a lack of cultural background knowledge, in line with the idiomatic and cultural nuance deficits mentioned above.
One student tested a realistic scenario: a customs official at the airport might ask for “chai pani.” While the term literally means tea and water, any speaker of Urdu, as Kureishy explained, would know that it is a cultural euphemism for a bribe.
“It’s a cultural thing; people will ask for chai pani, which is tea and water, but it means that they want something to let you go or to let some stuff go,” she explains. Having learned this in class, the student decided to test whether the AI tool could respond appropriately. Kureishy describes a conversation the student had after answering some standard questions about their travel:
“My student said, ‘Oh, can I take care of your chai pani,’ like my student is offering AI a bribe. And if anyone else were talking to my student, if it were another student, and they all knew about this. And the AI says, ‘Oh, sure you can ask me about tea and water. Tea and water as a topic is very common. If you have a specific question, please ask me.’ It does not know what the student is talking about. And then my student goes, really playing with the AI: ‘Oh, okay, is a thousand rupees enough?’…insinuating ‘Okay, I’m gonna give you a thousand rupees. I have a lot of prohibited items, is a thousand rupees enough,’ and the AI says…‘I don’t know what you’re talking about…For some reasons a thousand rupees can be enough, but for some other reason they might not be enough,’ and then it goes off on a tangent. Maybe, for example, a thousand rupees is little for someone who lives in a big city, and they’re facing an economic crisis. And then another for another person. It might be a lot of money.”
In Conclusion
This example suggests that rather than serving as a substitute for a human conversation partner, AI chatbots used for language learning (at least for low-resource or Less Commonly Taught Languages) are currently more useful as tools for critically engaging with dynamically generated samples of written language. With instructor support, it was good practice for students to form their own sentences in written Urdu and see what errors they could spot in the outputs they got back. Kureishy expects that the outputs in Urdu will eventually improve, but that for now the tool is best used for proficiency building, as long as students have the support and guidance of someone who already knows the language:
“If you don’t know what a sentence should look like, you don’t know what grammatical errors the AI is making…They don’t know if it’s right or wrong. So, you really need to know the language. That is why it helped as an error correction activity rather than a conversational activity which also had its merits, because the students had to type in the language.”
Kureishy’s observations offer an important counterexample to the impressive outputs generative AI tools have delivered in English, drawing attention to flaws in current training data. They also serve as a reminder, however, to consider critical uses of AI that keep a “human in the loop,” even (or especially) in languages where the outputs are more fluent.
Please keep watching our blog for more examples of thoughtful AI use in instruction. If you’re a UChicago instructor trying something new with AI, we’re interested in hearing about it! Email michaelhernandez@uchicago.edu.
Further Resources
For more ideas on this topic, please see our previous blog posts about generative AI. For individual assistance, you can visit our office hours, book a consultation with an instructional designer, or email academictech@uchicago.edu. For a list of upcoming ATS workshops, please visit our workshop schedule.