Machine translation and human translation in competition

Author: name hidden by the user, 5 March 2013, 20:52, report

  1. Introduction

 

  Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one natural language to another.

  On a basic level, MT performs simple substitution of words in one natural language for words in another, but that alone usually cannot produce a good translation of a text, because recognition of whole phrases and their closest counterparts in the target language is needed. Solving this problem with corpus and statistical techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies.
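  As a minimal sketch of the word-for-word substitution described above, the following Python snippet translates with a toy English-to-French dictionary. The dictionary entries and the function name `substitute_words` are illustrative assumptions and do not come from any real MT system.

```python
# Toy English-to-French lexicon; the entries are illustrative assumptions.
LEXICON = {
    "the": "le",
    "cat": "chat",
    "sleeps": "dort",
    "it": "il",
    "rains": "pleut",
    "cats": "chats",
    "and": "et",
    "dogs": "chiens",
}


def substitute_words(sentence: str) -> str:
    """Replace each word independently; unknown words are kept unchanged."""
    words = sentence.lower().split()
    return " ".join(LEXICON.get(word, word) for word in words)


if __name__ == "__main__":
    # A literal sentence survives substitution reasonably well.
    print(substitute_words("The cat sleeps"))
    # An idiom does not: each word is translated on its own.
    print(substitute_words("It rains cats and dogs"))
```

  The second call gives a word-by-word rendering rather than an idiomatic French expression such as "il pleut des cordes", which illustrates why whole phrases have to be recognised.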

  Current machine translation software often allows for customization by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than machine translation of conversation or less standardised text.
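  One way to picture how a domain limits the allowable substitutions is a domain-specific lexicon that overrides a general one. The sketch below is hypothetical: the lexicon contents, the domain names and the function `candidate_translations` are assumptions made for illustration only.

```python
from typing import Dict, List, Optional

# General lexicon: an ambiguous word keeps every candidate translation.
GENERAL_LEXICON: Dict[str, List[str]] = {
    "fan": ["ventilateur", "admirateur"],
}

# Hypothetical domain lexicons that keep only the sense used in that domain.
DOMAIN_LEXICONS: Dict[str, Dict[str, List[str]]] = {
    "engineering": {"fan": ["ventilateur"]},
    "music": {"fan": ["admirateur"]},
}


def candidate_translations(word: str, domain: Optional[str] = None) -> List[str]:
    """Return the candidate translations, narrowed by domain when possible."""
    key = word.lower()
    if domain is not None:
        domain_entry = DOMAIN_LEXICONS.get(domain, {}).get(key)
        if domain_entry is not None:
            return domain_entry
    # Fall back to the general lexicon; unknown words are returned as-is.
    return GENERAL_LEXICON.get(key, [key])


if __name__ == "__main__":
    print(candidate_translations("fan"))
    print(candidate_translations("fan", domain="engineering"))
```

  Without a domain the word stays ambiguous; with a domain the choice is already made, which is the effect the paragraph above describes.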

  Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators and, in a very limited number of cases, can even produce output that can be used as is (e.g., weather reports).

  The progress and potential of machine translation have been much debated throughout its history. Since the 1950s, a number of scholars have questioned the possibility of achieving fully automatic machine translation of high quality. Some critics claim that there are in-principle obstacles to automating the translation process.

  2. Main part
    1. MT and the Human Translator

  For the present and the immediate future, the general public's use of MT is restricted to ‘gist’ translation, that is, fast translation for intelligent users when human translation is out of the question because of time and other factors. For example, this is an option that the European Commission translation services offer to people in a hurry. The on-line MT engines are aimed at helping tolerant users deal with ephemeral texts and, generally speaking, they help communication in many situations.

  However, at another level we can talk of human-aided MT, in which the human editor/translator often pre-edits the text, or applies the criteria of controlled language, and works with special language domains. After the MT process, the human editor/translator post-edits the text before publication. The professional translator today has to learn to make the best of the technology available, and the only way to avoid being a slave to these systems is to understand how they work and to use them to advantage.

  It is quite understandable that human translators should react negatively to the idea of MT. This is partly because their more traditional training has made them expect a high standard of either functionally adapted or creatively translated literary texts, and they find the MT results unacceptable. The type of exercise described here is by no means intended to replace this training, which is very valuable for the literary and more culturally orientated translation that MT producers have never seriously aspired to produce. However, most professional translators earn their living by translating more mundane, technical texts and, as MT and other forms of translation technology improve, it is also understandable that they should feel threatened by the possibilities of this technology.

  The positive side of increased communication through MT, for the human translator, is that it encourages curiosity about texts in unknown languages in people who would previously have simply ignored their existence. In the long run, this curiosity can only lead to a demand for more good human translation. In fact, it is probably true to say that English is a bigger threat to multilingualism and the translator than MT.

    2. Evaluation of Machine Translation

  The evaluation of human translation has always been a subject for lively discussion, whether the critic is evaluating student translation, editing professional translation or complaining about perceived mistakes in published translations, and the objections range from totally justifiable to highly subjective. Research into the translation process tries to analyse the psychological reactions of translators as they translate, using methods including Kussmaul’s (1995) ‘think-aloud protocols’ and Jakobsen’s (2003) Translog software for tracking translators’ work patterns on the computer. The quantity of analysis of the finished result of translation is enormous, but not much of it is conducted in a systematic manner, despite efforts by such people as House (1977 & 1997) to introduce functional analysis of translation, by Baker (1998) and Laviosa (1998) to observe tendencies in translation using translation corpora, and despite attempts to establish ‘universals’ of translation.

  It is therefore only to be expected that the evaluation of MT should also be a complex issue, and cover both the MT systems themselves and the resulting translations. The types of evaluation of MT used are described in FEMTI - A Framework for the Evaluation of Machine Translation in ISLE. Since MT systems are usually constructed by computational linguists, or people with training in both linguistics and computer programming, it is only natural that people with similar training should evaluate these systems for reasons pertaining to the efficiency of the technology from an internal point of view. There are various obvious reasons for carrying out this kind of evaluation, which requires looking into the ‘glass box’ of MT, that is, being able to see into the system and examine, correct or criticise it. This type of analysis goes beyond the pedagogical methodology discussed here, although we hope it may prove a possibility for future research.

  External evaluation, in which the system is evaluated by outsiders dealing with the ‘black box’ of MT, that is, with access only to the results, is carried out by MT providers in order to test their systems with potential users. Although external evaluation can be carried out using (semi-)automatic techniques, as demonstrated by Rajman & Hartley (2002), a more traditional method is to ask potential users to test a system that has been prepared for a specific purpose and to evaluate the results on a scale from excellent to unintelligible. The people chosen to do the evaluation are rarely experts in translation, who might be hyper-critical, and the emphasis is on evaluating the system at the macro-level of overall competence, rather than at the micro-level of syntactic or lexical detail. At a more ad hoc level, there must be plenty of people who apply their own tests to on-line systems in order to decide which commercial system to buy. It was within the context of looking at on-line ‘black boxes’ that our own experiment was carried out.

Early Machine Translation Systems

  Early Systems (GAT). Georgetown Automatic Translation was one of the earliest MT projects. Development began in 1952, and the system was in use from 1964 to 1979. GAT translated physics texts from Russian into English, essentially by replacing words.

  Early Systems (CETA). The Centre d’Etudes pour la Traduction Automatique was launched in 1961 in Grenoble, and its system was in use from 1967 to 1971. CETA translated approximately 400,000 words from Russian into French.

  Early Systems (SYSTRAN). SYSTRAN was one of the first systems to be marketed. It was installed in 1970 at the US Air Force Foreign Technology Division and was also used at NASA and EURATOM. GM of Canada claimed that the system speeded up the work of human translators three to four times (3000-4000 words a day, approximately what a human translator now produces with the help of translation workbenches).

  Early Systems (TAUM-METEO). TAUM-METEO was the first truly automatic MT system. Developed in the 1960s and used by the Canadian Meteorological Center, it corrected its own errors without post-editors and forwarded offending content to human translators.

Problems

  1. Translation is not straightforward.
    • it is not a matter of replacing words with words
    • word order differs between languages
    • it is the rewriting of a text in another language
    • the right words have to be chosen
    • e.g. the imperative mood in English corresponds to the infinitive in French
  2. Automation of translation is not easy.
    • quality is poor
    • homographs
      • “fan”: a ventilator or an enthusiast
      • different word classes
        • e.g. “love” is both a verb and a noun
        • “you” can be both singular and plural
    • idioms
      • e.g. “country music”, meaning a type of music
    • personal pronouns
      • second-person pronouns may differ between familiar and formal situations
    • post-editing can also take more time than translating from scratch
  3. Morphological analysis
      • e.g. Chinese and Japanese text is written without spaces between words
      • the words of a sentence are not separated by anything
  4. Syntactic analysis
      • modifiers are a problem
      • “The boy saw a girl with a telescope”
      • the girl had a telescope vs. the boy used a telescope to see a girl
  5. Analysis of context
      • a sentence may contain 20-40 words
      • 100 million possible translations
  6. There are always going to be problem cases
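
The source does not say how the figure of 100 million possible translations was obtained. One way to see how a number of that size can arise is to multiply the number of candidate translations of each word in a sentence. The sketch below does only that arithmetic; the choice of 8 ambiguous words with 10 candidates each in a 20-word sentence is an assumption made purely for illustration.

```python
import math
from typing import List


def count_combinations(candidates_per_word: List[int]) -> int:
    """Number of word-by-word translation combinations for one sentence."""
    return math.prod(candidates_per_word)


if __name__ == "__main__":
    # Assumed 20-word sentence: 8 ambiguous words with 10 candidates each,
    # and 12 words with a single possible translation.
    sentence_profile = [10] * 8 + [1] * 12
    print(count_combinations(sentence_profile))  # 100000000
```

Even a modest amount of ambiguity per word therefore multiplies into a search space that cannot be listed exhaustively, which is why context has to be used to prune the candidates.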

Proposed Solutions to the Problems

AI-Based Approach

  • Raman & Alwar 1990
  • Conversations carried out across enquiry counters at railway stations in India
  • The system should understand a text before translating it
  • The text is analysed to understand its meaning, which is stored in a language-free semantic map
  • Semantic maps are used to generate translations
  • The analyser processes one sentence at a time
    • unnecessary adjectives are not taken into account
  • Morphological analysis comes first
  • Building of the semantic map comes second
  • The stages work concurrently
  • A large dictionary is needed

Interactive Approach

  • Sen, Zhaoxiong and Heyan 1997
  • Knowledge of MT systems is incomplete, which leads to incorrect translations
  • Possibility for an MT system to learn
    • quality should improve
  • Interaction starts when a sentence is found that the system cannot analyse properly
    • message to the user
    • user responds with a coded message
      • updates the system's knowledge base
    • interaction limited to three stages
      • lexical analysis
      • uncertain modifiers
      • multiple translations
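
The interaction loop outlined above can be sketched roughly as follows. This is a simplified illustration, not the system of Sen, Zhaoxiong and Heyan: it covers only the lexical stage, the user answers with a plain translation instead of a coded message, and the class `InteractiveTranslator` and its methods are hypothetical names.

```python
from typing import Callable, Dict, Optional


class InteractiveTranslator:
    """Word-level translator that asks the user about unknown words."""

    def __init__(
        self,
        knowledge_base: Dict[str, str],
        ask_user: Callable[[str], Optional[str]],
    ) -> None:
        self.knowledge_base = dict(knowledge_base)
        self.ask_user = ask_user

    def translate_word(self, word: str) -> str:
        key = word.lower()
        if key not in self.knowledge_base:
            # The system cannot analyse this word: send a message to the user.
            answer = self.ask_user(key)
            if answer is None:
                return word  # no help available, leave the word untranslated
            # The user's response updates the knowledge base for later use.
            self.knowledge_base[key] = answer
        return self.knowledge_base[key]

    def translate(self, sentence: str) -> str:
        return " ".join(self.translate_word(word) for word in sentence.split())


if __name__ == "__main__":
    def scripted_user(word: str) -> Optional[str]:
        # Stand-in for a real user; the answers are illustrative assumptions.
        return {"station": "gare"}.get(word)

    translator = InteractiveTranslator({"the": "la", "opens": "ouvre"}, scripted_user)
    print(translator.translate("The station opens"))
    # The second request no longer needs the user: the word has been learned.
    print(translator.translate("the station"))
```

The point of the design is that each question is asked only once, so the quality of later translations should improve as the knowledge base grows.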

Multiple Translation Engines & Sentence Partitioning

  • Ren, Shi and Kuroiwa 2000
  • Multiple MT systems running in parallel
    • all use different MT techniques
    • a controller coordinates the translation
    • each engine translates a sentence independently
    • the controller chooses the best translation
      • if there is no proper translation, the sentence is partitioned
      • the process starts again from the beginning
      • in the end the partitioned sentence is put back together

Multiple Translation Engines & Sentence Partitioning (2)

    • Parallel processing should improve the success rate
    • the correct translation is preserved through the procedures
    • combining the best translations should improve quality
    • Morphological analysis
    • the analysis gives results that are used as inputs for the engines
    • the engines are then run in parallel
    • if there is more than one result, the number of engines is increased
    • if there are no results, the sentence is partitioned
      • partitioning a sentence is itself a problem, e.g. in Chinese and Japanese
    • In a test situation with four engines the results improved dramatically
    • the time consumed doubled
    • a single MT system translated 45.6 % of sentences correctly; with multiple engines the result was 74.2 % (Japanese to Chinese)
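
The controller logic described in the two lists above can be sketched as follows. This is a minimal illustration under several assumptions, not the implementation of Ren, Shi and Kuroiwa: the engines are called one after another instead of in parallel, each engine reports a confidence score, the sentence is partitioned at its midpoint, and all names (`Engine`, `make_table_engine`, `translate_with_engines`) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# An engine returns (translation, confidence) or None when it cannot translate.
Engine = Callable[[str], Optional[Tuple[str, float]]]


def make_table_engine(table: Dict[str, str], confidence: float) -> Engine:
    """Build a toy engine that only knows the entries of a lookup table."""
    def engine(text: str) -> Optional[Tuple[str, float]]:
        translation = table.get(text.lower())
        return (translation, confidence) if translation is not None else None
    return engine


def translate_with_engines(sentence: str, engines: List[Engine]) -> Optional[str]:
    """Controller: pick the best engine result, else partition and retry."""
    results = [r for r in (engine(sentence) for engine in engines) if r is not None]
    if results:
        best_translation, _ = max(results, key=lambda result: result[1])
        return best_translation
    words = sentence.split()
    if len(words) < 2:
        return None  # nothing left to partition
    middle = len(words) // 2
    left = translate_with_engines(" ".join(words[:middle]), engines)
    right = translate_with_engines(" ".join(words[middle:]), engines)
    if left is None or right is None:
        return None
    # Put the partitioned sentence back together.
    return f"{left} {right}"


if __name__ == "__main__":
    phrase_engine = make_table_engine({"good morning": "bonjour", "thank you": "merci"}, 0.9)
    word_engine = make_table_engine({"morning": "matin"}, 0.5)
    engines = [phrase_engine, word_engine]
    print(translate_with_engines("Good morning thank you", engines))
```

Splitting at the midpoint is the crudest possible partitioning rule; the source notes that finding good partition points is itself a problem for languages such as Chinese and Japanese.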

  3. Conclusion

  Translation for interpersonal communication covers the role of translation in face-to-face communication (dialogue, conversation) and in correspondence, whether in traditional mail or in the newer, more immediate electronic form. Translators have occasionally been employed by their organisations in these areas, e.g. as interpreters for foreign visitors and as mediators in company correspondence, and they will continue to be. But for the real-time translation of electronic messages it is not possible to envisage any role for the translator; for this, the only possibility is the use of fully automatic systems.

  However, the presence of automatic translation facilities on the Internet will undoubtedly alert a much wider public to the importance of translation as a major and crucial feature of global communication, probably to a degree never before experienced. Inevitably, translation will itself receive a much higher profile than in the past. People using the crude output of MT systems will come to realise the added value (that is to say, the higher quality) of professionally produced translations. As a consequence, the demand for human produced translation will certainly rise, and the translation profession will be busier than ever. Fortunately, professional translators will have the support of a wide range of computer-based translation tools, enabling them to increase productivity and to improve consistency and quality. In brief, automation and MT will not be a threat to the livelihood of the translator, but will be the source of even greater business and will be the means of achieving considerably improved working conditions.

 

 

 

 

 

 

 

 

