Machine translation and human translation in competition

Author: name hidden by user, 5 March 2013, 20:52, report

Brief description

Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one natural language to another.
On a basic level, MT performs simple substitution of words in one natural language for words in another, but that alone usually cannot produce a good translation of a text, because recognition of whole phrases and their closest counterparts in the target language is needed. Solving this problem with corpus and statistical techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies.

Introduction

On a basic level, MT performs simple substitution of words in one natural language for words in another, but that alone usually cannot produce a good translation of a text, because recognition of whole phrases and their closest counterparts in the target language is needed. Solving this problem with corpus and statistical techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies.
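The gap between word substitution and phrase recognition can be shown with a minimal sketch. The tiny English-to-French tables and both function names below are illustrative assumptions, not part of any real MT system.

```python
# Toy lexicons; every entry is an assumption made for illustration only.
WORD_TABLE = {"i": "je", "am": "suis", "cold": "froid", "it": "il", "rains": "pleut"}
PHRASE_TABLE = {("i", "am", "cold"): "j'ai froid"}


def word_for_word(sentence):
    """Replace each word independently; unknown words pass through unchanged."""
    return " ".join(WORD_TABLE.get(w, w) for w in sentence.lower().split())


def phrase_first(sentence):
    """Prefer the longest known phrase at each position, else fall back to words."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        for n in range(len(words) - i, 1, -1):  # longest phrase first
            key = tuple(words[i:i + n])
            if key in PHRASE_TABLE:
                out.append(PHRASE_TABLE[key])
                i += n
                break
        else:  # no phrase matched here, so translate the single word
            out.append(WORD_TABLE.get(words[i], words[i]))
            i += 1
    return " ".join(out)


print(word_for_word("I am cold"))  # je suis froid (literal, not idiomatic French)
print(phrase_first("I am cold"))   # j'ai froid
```

The word-for-word output is grammatical but not what a French speaker would say, which is the kind of failure that phrase-level matching is meant to avoid.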

Current machine translation software often allows for customization by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text.
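As a rough sketch of how limiting the domain narrows the allowable substitutions, the lookup below picks a French equivalent according to a declared domain. The lexicon entries, the domain labels and the function name are assumptions for illustration.

```python
# Hypothetical domain-tagged lexicon (English to French).
LEXICON = {
    "shower": {"general": "douche", "weather": "averse"},
    "sentence": {"general": "phrase", "legal": "peine"},
}


def translate_word(word, domain="general"):
    """Return the domain-specific equivalent, else the general one, else the word."""
    senses = LEXICON.get(word, {})
    return senses.get(domain, senses.get("general", word))


print(translate_word("shower", "weather"))  # averse
print(translate_word("shower"))             # douche
print(translate_word("sentence", "legal"))  # peine
```

Declaring the domain removes the competing sense before translation starts, which is one reason formulaic text such as weather reports suits MT.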

Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators and, in a very limited number of cases, can even produce output that can be used as is (e.g., weather reports).

The progress and potential of machine translation have been much debated throughout its history. Since the 1950s, a number of scholars have questioned the possibility of achieving fully automatic machine translation of high quality. Some critics claim that there are in-principle obstacles to automating the translation process.

Main part

MT and the Human Translator

For the present and the immediate future, the general public’s use of MT is restricted to ‘gist’ translation, or fast translation for intelligent users, when human translation is out of the question because of time and other factors. For example, this is an option that the European Commission translation services offer to people in a hurry. The on-line MT engines are aimed at helping tolerant users deal with ephemeral texts and, generally speaking, they help communication in many situations.

However, at another level we can talk of human-aided MT, in which the human editor/translator often pre-edits the text, or applies the criteria of controlled language, and works with special language domains. After the MT process, the human editor/translator will post-edit the text before publication. The professional translator today has to learn to make the best of the technology available, and the only way to avoid being a slave to these systems is to understand how they work and use them to advantage.

It is quite understandable that human translators should react negatively to the idea of MT. This is partly because their more traditional training has made them expect a high standard of either functionally adapted or creatively translated literary texts, and they find the MT results unacceptable. The type of exercise described here is by no means intended to replace this training, which is very valuable for the literary and more culturally orientated translation that MT producers have never seriously aspired to produce. However, most professional translators earn their living by translating more mundane, technical texts and, as MT and other forms of translation technology improve, it is also understandable that they should feel threatened by the possibilities of these technologies.

The positive side of increased communication through MT, for the human translator, is that it encourages curiosity about texts in unknown languages in people who would previously have simply ignored their existence. In the long run, this curiosity can only lead to a demand for more good human translation. In fact, it is probably true to say that English is a bigger threat to multilingualism and the translator than MT.

Evaluation of Machine Translation

The evaluation of human translation has always been a subject for lively discussion, whether the critic is evaluating student translation, editing professional translation or complaining about perceived mistakes in published translations, and the level of the objections will range from totally justifiable to highly subjective. Research into the translation process tries to analyse the psychological reactions of translators as they translate, using methods including Kussmaul’s (1995) ‘think-aloud protocols’ and Jakobsen’s (2003) Translog software for tracking translators’ work patterns on the computer. The quantity of analysis of the finished result of translation is enormous, but not much of it is conducted in a systematic manner, despite efforts by such people as House (1977 & 1997) to introduce functional analysis of translation, by Baker (1998) and Laviosa (1998) to observe tendencies in translation using translation corpora, and attempts to establish ‘universals’ of translation.

It is therefore only to be expected that the evaluation of MT should also be a complex issue, and cover both the MT systems themselves and the resulting translations. The types of evaluation of MT used are described in FEMTI - A Framework for the Evaluation of Machine Translation in ISLE. Since MT systems are usually constructed by computational linguists, or people with training in both linguistics and computer programming, it is only natural that people with a similar training should evaluate these systems for reasons pertaining to the efficiency of the technology from an internal point of view. There are various obvious reasons for carrying out this kind of evaluation which requires looking into the ‘glass box’ of MT, or being able to see into the system and examine, correct or criticise it. This type of analysis goes beyond the pedagogical methodology discussed here, although we hope it may prove a possibility for future research.

External evaluation, in which the system is evaluated by outsiders dealing with the ‘black box’ of MT, or with access only to the results, is carried out by MT providers in order to test their systems with potential users. Although external evaluation can be carried out using (semi-) automatic techniques, as demonstrated by Rajman & Hartley (2002), a more traditional method is to ask potential users to test a system that has been prepared for a specific purpose and to evaluate the results on a gradient of excellent to unintelligible. The people chosen to do the evaluation are rarely experts in translation, who might be hyper-critical, and the emphasis is on evaluating the system on the macro-level of overall competence, rather than on the micro-level of syntactic or lexical detail. At a more ad hoc level, there must be plenty of people who apply their own tests to on-line systems in order to decide which commercial system to buy. It was within the context of looking at on-line ‘black boxes’ that our own experiment was carried out.

The Early Machine Translation systems

Early Systems (GAT). Georgetown Automatic Translation was one of the earliest MT projects. Development began in 1952 and the system was in use from 1964 to 1979. GAT translated physics texts from Russian to English and relied on the replacement of words.

Early Systems (CETA). The Centre d’Etudes pour la Traduction Automatique was launched in 1961 in Grenoble and its system was in use from 1967 to 1971. CETA translated approximately 400,000 words from Russian to French.

Early Systems (SYSTRAN). One of the first systems to be marketed, SYSTRAN was installed in 1970 at the US Air Force Foreign Technology Division and was also used at NASA and EURATOM. GM of Canada claimed that the system sped up the work of human translators three to four times (to 3000-4000 words a day, approximately what a human translator now achieves with the help of translation workbenches).

Early Systems (TAUM-METEO). TAUM-METEO was the first truly automatic MT system. Developed in the 1960s and used by the Canadian Meteorological Center, it corrected its own errors without post-editors and forwarded offending content to human translators.

Problems. 1) Translation is not straightforward.

it is not a matter of replacing words with words
word order differs between languages
it means rewriting the text in another language
it means choosing the right words
e.g. the imperative mood in English corresponds to the infinitive in French

2) Automation of translation is not easy.

quality is poor
homographs

“fan”: a ventilator or an enthusiast
different word classes

e.g. “love” is both a verb and a noun
“you” can be both singular and plural

idioms

e.g. “country music”, meaning a type of music

personal pronouns

second person pronouns may vary between familiar and formal situations

also, post-editing can take more time than translating from scratch

Morphological analysis.

e.g. Chinese and Japanese do not put spaces between words

word boundaries are not marked by anything

Syntactic analysis

modifiers are a problem

“The boy saw a girl with a telescope”

the girl had a telescope vs. the boy used a telescope to see a girl

Analysis of context

20-40 words in a sentence

100 million possible translations

There are always going to be problem cases
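The figure of 100 million possible translations becomes plausible once the choices for each word are multiplied together. The sketch below is only an illustration under an assumed number of alternatives per word; that number does not come from the source.

```python
import math


def candidate_count(alternatives_per_word):
    """Number of combinations when each word has that many possible renderings."""
    return math.prod(alternatives_per_word)


def words_needed(target, alternatives):
    """Smallest sentence length whose combinations reach the target count."""
    length, total = 0, 1
    while total < target:
        total *= alternatives
        length += 1
    return length


print(candidate_count([2] * 27))     # 134217728
print(words_needed(100_000_000, 2))  # 27
print(words_needed(100_000_000, 3))  # 17
```

With as few as two alternatives per word, a 27-word sentence already exceeds 100 million combinations, which falls within the 20-40 word range noted above.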

Proposed Solutions to the Problems

AI-Based Approach

Raman & Alwar 1990
Conversations carried out across enquiry counters on railway stations in India
System should understand a text before translating it
analysis of text to understand the meaning and storing it in a language-free semantic map
semantic maps used to generate translations

Analyzer analyses one sentence at a time

unnecessary adjectives not taken into account

morphological analysis first

building of semantic map second
stages work concurrently
large dictionary needed
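The idea of analysing a sentence into a language-free semantic map and generating translations from that map can be sketched as follows. This is a toy illustration, not the system of Raman & Alwar: the rules, the map keys and the templates are assumptions, and the morphological analysis stage is left out.

```python
import re


def analyse(sentence):
    """Build a language-free semantic map from one English enquiry (toy rules)."""
    text = sentence.lower()
    semantic_map = {}
    if "when" in text and "leave" in text:
        semantic_map["act"] = "ASK_DEPARTURE_TIME"
    match = re.search(r"\bto (\w+)", text)
    if match:
        semantic_map["destination"] = match.group(1).capitalize()
    return semantic_map  # adjectives such as "fast" never enter the map


# Hypothetical generation templates, one per target language.
TEMPLATES = {
    "en": {"ASK_DEPARTURE_TIME": "When does the train to {destination} leave?"},
    "fr": {"ASK_DEPARTURE_TIME": "À quelle heure part le train pour {destination} ?"},
}


def generate(semantic_map, language):
    """Produce a sentence in the target language from the semantic map alone."""
    template = TEMPLATES[language][semantic_map["act"]]
    return template.format(**semantic_map)


meaning = analyse("When does the fast train to Delhi leave?")
print(meaning)                  # {'act': 'ASK_DEPARTURE_TIME', 'destination': 'Delhi'}
print(generate(meaning, "fr"))  # À quelle heure part le train pour Delhi ?
```

Because generation reads only the map, adding a new target language means adding templates rather than a new source-to-target rule set.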

Interactive Approach

Sen, Zhaoxiong and Heyan 1997
Knowledge of MT systems incomplete -> incorrect translations
Possibility for an MT system to learn

quality should improve

Interaction starts when a sentence is found that the system cannot analyse properly

message to the user
user responds with a coded message

updates the system’s knowledge base

interaction limited to three stages

lexical analysis
uncertain modifiers
multiple translations
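A minimal sketch of this learning loop, restricted to the lexical stage, is given below. The class name, the callback and the plain-text reply are assumptions; the source describes a coded reply, which is not reproduced here.

```python
class InteractiveTranslator:
    """Word-level translator that asks the user about words it does not know."""

    def __init__(self, lexicon, ask_user):
        self.lexicon = dict(lexicon)  # the system's knowledge base
        self.ask_user = ask_user      # callback: unknown word -> translation

    def translate(self, sentence):
        output = []
        for word in sentence.lower().split():
            if word not in self.lexicon:
                # Interaction: send a message to the user, then store the reply.
                self.lexicon[word] = self.ask_user(word)
            output.append(self.lexicon[word])
        return " ".join(output)


# Usage with a scripted "user" so the example runs without keyboard input.
questions = []


def scripted_user(word):
    questions.append(word)
    return {"station": "gare"}[word]


translator = InteractiveTranslator({"the": "la"}, scripted_user)
print(translator.translate("the station"))  # la gare
print(translator.translate("the station"))  # la gare
print(questions)                            # ['station']
```

The second call asks nothing because the answer was stored after the first interaction, which is how quality is expected to improve over time.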

Multiple Translation Engines & Sentence Partitioning

Ren, Shi and Kuroiwa 2000
Multiple MT systems running in parallel

all use different MT techniques
controller coordinates translating
each engine translates a sentence independently
controller chooses the best translation

if there is no proper translation, the sentence is partitioned
the process then starts again from the beginning
in the end the partitioned sentence is put back together
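The controller logic described above can be sketched roughly as follows. This is not the implementation of Ren, Shi and Kuroiwa: the function name, the scoring function and the partitioning function are hypothetical parameters, and threads stand in for whatever parallelism the original system used.

```python
from concurrent.futures import ThreadPoolExecutor


def translate(sentence, engines, score, partition):
    """Controller: run every engine on the sentence and keep the best output.

    engines:   callables returning a translation string, or None on failure
    score:     callable rating a candidate translation (higher is better)
    partition: callable splitting a sentence into smaller parts
    """
    # Each engine translates the sentence independently, in parallel threads.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda engine: engine(sentence), engines))
    candidates = [r for r in results if r is not None]
    if candidates:
        return max(candidates, key=score)

    # No engine succeeded: partition the sentence and start again on each part.
    parts = partition(sentence)
    if len(parts) < 2:
        return None  # the sentence cannot be split any further
    translated = [translate(p, engines, score, partition) for p in parts]
    if any(t is None for t in translated):
        return None
    # Put the partitioned sentence back together.
    return " ".join(translated)
```

A call might pass two dictionary-backed engines, `len` as a stand-in scoring function and a comma-based splitter. A real controller would need a far better quality measure than output length.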

Multiple Translation Engines & Sentence Partitioning (2)

Parallel processing should improve success rate

correct translation preserved through procedures

combining the best translations should improve quality

Morphological analysis

analysis gives results that are used as inputs for the engines

engines are then run in parallel
if there is more than one result, the number of engines is increased
if there are no results, the sentence is partitioned

problem of partitioning a sentence e.g. Chinese & Japanese
In a test situation with four engines the results improved dramatically

consumed time doubled
1 MT system translated 45.6 % of sentences correctly

with multiple engines the result was 74.2 % (Japanese to Chinese).

Conclusion

Translation for interpersonal communication covers the role of translation in face-to-face communication (dialogue, conversation) and in correspondence, whether in traditional mail or in the newer electronic, more immediate, form. Translators have been employed occasionally by their organisations in these areas, e.g. as interpreters for foreign visitors and as mediators in company correspondence, and they will continue to be so employed. But for the real-time translation of electronic messages it is not possible to envisage any role for the translator; for this, the only possibility is the use of fully automatic systems.

However, the presence of automatic translation facilities on the Internet will undoubtedly alert a much wider public to the importance of translation as a major and crucial feature of global communication, probably to a degree never before experienced. Inevitably, translation will itself receive a much higher profile than in the past. People using the crude output of MT systems will come to realise the added value (that is to say, the higher quality) of professionally produced translations. As a consequence, the demand for human produced translation will certainly rise, and the translation profession will be busier than ever. Fortunately, professional translators will have the support of a wide range of computer-based translation tools, enabling them to increase productivity and to improve consistency and quality. In brief, automation and MT will not be a threat to the livelihood of the translator, but will be the source of even greater business and will be the means of achieving considerably improved working conditions.
