Google Translate

There are dozens of machine translators on the web, but probably none of them is as widely used as Google Translate. This does not necessarily mean that it is the best machine translator ever created, but it is a good choice for someone who is looking for a free one. Of course, we should always bear in mind that a machine is never as precise as a human, so we can never rely completely on the translation it gives. There are always quite a lot of mistakes, especially in long sentences and texts.

Before going any further, I think it is worth knowing a little more about machine translation, so before you continue reading you should take a look at the article I wrote about it some time ago. Assuming that you already know a little about machine translation, I will now turn to Google Translate.


The British National Corpus

This article is about the British National Corpus, but as I am sure many people do not know what a corpus is, I think it is important to explain that first. That is why I will start with a few lines on corpora in general, and then focus on the British National Corpus and try to explain how it works.

CORPUS

What is a corpus?

According to the Oxford Dictionary, a corpus is “a collection of written or spoken material in machine-readable form, assembled for the purpose of linguistic research”.

The plural of “corpus” is usually “corpora”.

What are they used for?

Corpora are used to store words, whose features can be analyzed by means of tagging and concordancing programs, and they help in the study of linguistic competence. They are also used for statistical analysis and hypothesis testing, for checking occurrences, and for validating linguistic rules within a specific universe of texts.
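
To give an idea of what a concordancing program does, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function name `concordance` and the `width` parameter are invented for this example and have nothing to do with the tools actually used with the British National Corpus. It lists every occurrence of a keyword together with a few characters of context on each side (the classic "keyword in context" view).

```python
import re

def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`."""
    lines = []
    # \b stops "art" from matching inside a longer word such as "party".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

sample = (
    "Shall I compare thee to a summer's day? "
    "But thy eternal summer shall not fade."
)
hits = concordance(sample, "summer")
for line in hits:
    print(line)
```

A real corpus tool would work on tagged text and millions of words, but the principle of lining up every occurrence with its context is the same.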


Dictionary: WordReference

WordReference is a free online dictionary used by thousands of people all around the world, as it covers some of the most widely spoken languages: English, Italian, Spanish, French and Portuguese. The dictionaries are organized into the pairs English-French, English-Italian, English-Spanish, Spanish-Portuguese and English-Portuguese.

Although this might not seem like many languages, French, Italian, Spanish and Portuguese in fact account for around 93% of the Romance language speakers in the world, which, as far as I am concerned, is quite a lot.

In 2009, more language pairs were added: English-German, English-Russian, English-Romanian, English-Polish, English-Czech, English-Greek, English-Turkish, English-Chinese, English-Japanese, English-Korean and English-Arabic, but they are still being completed.


REVIEW: Google Translate

GOOGLE TRANSLATOR

In this review, I will give a detailed description of one of the most famous online translators on the Internet, namely Google Translate. Then I will compare it with two other translators: Yahoo Babel Fish and Reverso.

Google Translate is a language resource that can translate texts, web pages and documents into different languages. This online linguistic tool, created by Google Inc., appeared in 2006 and is one of the most used translators around the world. There are two mobile versions of Google Translate. The first one was designed for iPhone users in 2008 and covers about 23 different languages. The second one was developed as an Android version, which is divided mainly into two options: “SMS translation” and “History”.

Languages were added to the translator progressively, in 23 stages. The first combinations were English-French, English-German and English-Spanish. Nowadays it is also possible to see alternative translations for a word.

METHODOLOGY

Google Translate uses statistical machine translation. The statistical model was built from a bilingual text corpus containing about a million words. In addition, a second corpus of a billion words is incorporated into the statistical method, and the technology which supports this translation tool is SYSTRAN.

USING THE TRANSLATOR

Google Translate is very easy to use. To start with, we choose the languages we want to work with; there are 52. Then we paste the text we want to translate into the box. Once we have done that, we press the “translate” button and wait until the translated version appears in a new window. If we want to translate a document, we simply choose the option “translate a document”. A box appears with the option “examinar” (“browse”). We then choose the document and the translator automatically does its job.

TRANSLATING DIFFERENT TYPES OF TEXTS

When we work with an online translator, we have to take into account that, as an automatic tool, it has advantages such as speed and ease of use. However, as it is not a human translation, the output is usually far from precise and accurate and, above all, it lacks naturalness and coherence.

In order to check the competence of the translator, I decided to translate two texts of a different nature. The first one is a literary text, specifically a sonnet by William Shakespeare published in 1609, whose first line is “Shall I compare thee to a summer’s day?”. Below we can see the original text and the translated version.

LITERARY TEXT

ORIGINAL TEXT

SONNET 18

Shall I compare thee to a summer’s day?
Thou art more lovely and more temperate:              
Rough winds do shake the darling buds of May,
And summer’s lease hath all too short a date:
Sometime too hot the eye of heaven shines,
And often is his gold complexion dimm’d;
And every fair from fair sometime declines,
By chance or nature’s changing course untrimm’d;
But thy eternal summer shall not fade
Nor lose possession of that fair thou owest;
Nor shall Death brag thou wander’st in his shade,
When in eternal lines to time thou growest:
So long as men can breathe or eyes can see,
So long lives this and this gives life to thee.

TRANSLATED VERSION

Voy a compararte con un día de verano?
Tú eres más hermoso y más templado de arte:
Vientos ásperos agitan los capullos favorito de mayo,
Y el arrendamiento de verano tiene todo demasiado corta una fecha:
En algún momento muy caliente el ojo del cielo brilla,
Y a menudo es su tez de oro dimm’d;
Y todas las ferias de la disminución razonable en algún momento,
Por casualidad o cambiando el rumbo de la naturaleza untrimm’d;
Pero tu eterno verano no se marchitará
Tampoco pierde la posesión de ese owest eres justo;
Tampoco se aplicará la Muerte jactarse wander’st tú en su sombra,
Cuando en las líneas eterna a growest tú tiempo:
Mientras los hombres pueden respirar o los ojos pueden ver,
En tanto esta vida y esto le da vida a ti.

FAILURES AND PROBLEMS WITH THE TRANSLATOR

Due to the automatic nature of the translator, a number of errors are repeated throughout the poem. When we work with a translation tool, we have to keep in mind that it will never be the same as a human translation. In addition, translating a short text with simple grammatical structures and phrases is not the same as translating a long text full of rhetorical figures. In this particular case, I have chosen a poem from the early 17th century to show the limitations of translators like Google Translate. We have to consider that languages evolve, and the translator we are discussing appeared in 2006, so the translation is bound to be difficult. The main mistakes made by the translator were:

Untranslated words. The main reason for these mistakes was probably the apostrophe. Apostrophes do not exist in Spanish, so it is likely that the MT system cannot recognize these words and therefore cannot translate them. Examples are “dimm’d”, “untrimm’d” and “wander’st”.

Word order. Long sentences tend to be translated wrongly. Examples are “And summer’s lease hath all too short a date” and “And every fair from fair sometime declines”, whose order has been altered because Spanish sentences tend to be longer than their English equivalents. In addition, we have to bear in mind that the text is poetry, which makes the translator’s work even more difficult.

Punctuation. The translator has not made many punctuation errors, but one of them is significant: the question mark, which appears only once in English (at the end) but twice in Spanish (at the beginning and at the end). For example, “Shall I compare thee to a summer’s day?” appears as “Voy a compararte con un día de verano?”, without the opening “¿”.

Lack of natural language. Although the text is literary, the tone and the lexicon used in the translation are not appropriate, and the word order is especially unnatural.
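
The first problem in the list, words that pass through the translator unchanged, is easy to spot automatically. The following Python sketch is not the method I used for this review, only an illustration; the function name `untranslated_tokens` is made up, and the approach (flagging any source word that reappears verbatim in the output) is a deliberately simple assumption.

```python
import re

def untranslated_tokens(source, translation):
    """Return source words that reappear unchanged in the translation."""
    # Letters, optionally joined by straight or curly apostrophes,
    # so that a form like "dimm’d" stays a single token.
    token = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")
    source_tokens = [t.lower() for t in token.findall(source)]
    target_tokens = {t.lower() for t in token.findall(translation)}
    leftover = []
    for t in source_tokens:
        if t in target_tokens and t not in leftover:
            leftover.append(t)
    return leftover

source = "And often is his gold complexion dimm’d;"
translation = "Y a menudo es su tez de oro dimm’d;"
leftover = untranslated_tokens(source, translation)
print(leftover)
```

Note that a check like this would also flag proper nouns and words that are spelled the same in both languages, so its output still has to be read by a human.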

The other text I have chosen is a journalistic one, selected in order to compare how Google Translate handles texts of a different nature. It is an article from the magazine HELLO!, written in English and entitled “Pippa Middleton unwinds with former flame in Madrid”. I have also provided the translated text and my own version:

JOURNALISTIC TEXT

ORIGINAL TEXT

Pippa Middleton unwinds with former flame in Madrid

16 MAY 2011

After her starring role in the wedding of the century, it was a well-deserved break.

While her newly-married sister jetted off the to the Seychelles for her honeymoon, Pippa Middleton flew to Madrid for some downtime with a group of friends that included former flame George Percy, the heir to the Duke of Northumberland.

Escaping the media frenzy surrounding her since the big day, the brunette beauty relaxed in the capital’s biggest park – the Retiro.

And she took to the water for a boat trip with the eligible aristocrat, whom she dated whilst at Edinburgh University.

But it was not all rest for party girl Pippa, who was keen to sample some of the Spanish city’s nightlife.
She partied in one of the most exclusive nightclubs, Fortuny, where she was spotted with a bottle of champagne in her bag.

And there was no let-up for the party planner the following evening, as she enjoyed an evening at swanky bar The Penthouse, dressed to the nines in a beige peplum skirt and fitted black blazer.

When the 26-year-old emerged the next day, she sported sunglasses as she made her way around the tourist sights with the group.

Pippa and George were flatmates in Edinburgh whilst they were in a relationship, and the pair have been friends for nearly ten years.

Her current boyfriend, broker Alex Loudon, stayed in Britain during her trip.

TRANSLATED VERSION

 Pippa Middleton se desenvuelve con la llama antigua en Madrid
16 MAY 2011
 
Después de su papel protagónico en la boda del siglo, fue un bien merecido descanso.

Mientras que su hermana recién casados ​​de hidromasaje de la de las Seychelles para su luna de miel, Pippa Middleton viajó a Madrid para un tiempo de inactividad con un grupo de amigos que incluyó al ex la llama George Percy, el heredero del duque de Northumberland.
Escapar de la histeria mediática que rodea a su puesto el gran día, la belleza morena relajado en el mayor parque de la capital – el Retiro.

Y se llevó al agua para un viaje en barco con el aristócrata elegibles, a quien de fecha, mientras que la Universidad de Edimburgo.
Pero no todo fue descanso para chica fiestera Pippa, que estaba dispuesto a probar algunos de la vida nocturna de la ciudad española.

Ella fiesta en una de las discotecas más exclusivas, Fortuny, donde fue visto con una botella de champán en su bolso.

Y no había tregua para el planificador del partido la noche siguiente, mientras disfrutaba de una tarde en el bar chic El Penthouse, vestido de punta en blanco con una falda peplum beige y blazer negro puesto.

Cuando el jugador de 26 años de edad, salió el día siguiente, ella lucía gafas de sol como hizo su camino alrededor de los lugares de interés turístico con el grupo.

Pippa y George fueron compañeros de piso en Edimburgo, mientras se encontraban en una relación, y los dos han sido amigos durante casi diez años.
Su actual novio, el corredor Alex Loudon, una estancia en Gran Bretaña durante su viaje.

MY OWN VERSION

 Pippa Middleton se relaja en Madrid con su nuevo novio en Madrid

Mientras que su hermana recién casada volaba rumbo a las Seychelles para disfrutar de su luna de miel, Pippa Middleton volaba hacia Madrid por unos días de descanso con un grupo de amigos entre los que se incluía su antiguo novio, el heredero del ducado de Northumberland.

Tratando de escapar de la hysteria de los medios de comunicación que la perseguian desde el gran día, la belleza morena se relajó en el parque más grande de la capital, El Retiro.

Y se metió en el agua durante un viaje en barca con el aristócrata casadero con el que estuvo saliendo mientras estaba en la Universidad de Edimburgo.

Sin embargo, no todo fue descanso para la fiestera Pippa, quién estaba deseosa de disfrutar de la fiesta nocturna de España.

Disfrutó de la fiesta en uno de los clubs más exclusivas, llamado “Fortuny”, donde se la pudo ver con una botella de champán en su bolso.

Y no hubo ninguna interrupción para la planificadora de la fiesta a la tarde siguiente, ya que se divirtió en el bar pijo “The Penthouse” vestida de punta en blanco con una falda plisada beige y una americana negra que le quedaba como a un guante.

Al día siguiente, cuando la chica de 26 de años apareció llevaba puestas unas gafas de sol para disfrutar de las vistas con el resto del grupo.

Pippa y George fueron compañeros de piso en Edimburgo mientras que fueron novios y la pareja ha seguido siendo amiga desde hace diez años.

Su actual novio, el bróker Alex Loudon, se quedó en Gran Bretaña durante el viaje.

FAILURES AND PROBLEMS WITH THE TRANSLATOR

Lack of agreement. The translated text has problems with agreement, perhaps because English has no visible marks of gender and number agreement except in the third person singular of the present tense, and even there the “s” does not indicate whether we are talking about a woman or a man. There are many examples, like “While her newly-married sister jetted off the to the Seychelles” translated as “Mientras que su hermana recién casados” or “the brunette beauty relaxed in the capital’s biggest park” translated as “la belleza morena relajado en el mayor parque de la capital”.

Confusion between categories. The translator does not recognize words which can belong to two categories, for instance verbs and nouns. That is the case of the word “party” in “She partied in one of the most exclusive nightclubs”, translated as “Ella fiesta en una de las discotecas más exclusivas”.

False friends. The translator has not translated some words well, such as the adjective “eligible”. In English, “eligible” here means “somebody wealthy”, “a good person to marry”. However, “elegible” in Spanish means “somebody who can be chosen”.

Problems with the translation of compounds and adjectives. The translator does not do a good job when it has to translate compounds. For instance, “party planner”, which means “somebody who organizes parties”, is translated as “el planificador del partido”, and “the 26-year-old” is translated as “el jugador de 26 años de edad”.

Punctuation. In Spanish, dashes are only used to reproduce direct speech or written dialogue. In English, a dash can be used in place of a comma, so the translator may simply have copied it. For instance, “the capital’s biggest park – the Retiro” is translated as “el mayor parque de la capital – el Retiro”.

Untranslated words. The translator has not been able to translate some words like “peplum”. In addition, proper nouns like “The Penthouse” or “Fortuny” are not translated.

REVERSO

Reverso is a free online translator which can only be used to translate short texts. The same web page offers a dictionary and a conjugation tool. The translator uses Reverso Intranet, which has been developed by PROjectMT and Softissimo. Reverso is a very useful tool, for instance, for looking up words in an e-mail instead of using dictionaries, or for overcoming language barriers when we are abroad.

The Reverso online translator offers several languages to work with. The most used combinations are the following: English-Spanish, Spanish-English, French-Spanish, Spanish-French, Spanish-German, German-Spanish, Portuguese-Spanish, English-Japanese, Russian-Spanish, etc.

How to translate a short text?

To translate a text we follow three steps. First, we paste the text into the translation box. Then we choose the language into which we want to translate it. Finally, we press the “TRANSLATE” button and wait until a new window opens with the translation.

TRANSLATED VERSION

SONETO 18

¿Compararé thee hasta el día de un verano?

Thou arte más encantador y más templado:              

Vientos ásperos realmente sacuden los brotes queridos de mayo,

Y el arriendo del verano hath todo una fecha demasiado corta:

Algún día demasiado caliente el ojo de brillos de cielo,

Y a menudo es su tez de oro dimm’d;

Y cada feria de la feria algún día disminuye,

Por casualidad o el curso de cambio de la naturaleza untrimm’d;

Pero el verano thy eterno no se descolorará

Ni pierda la posesión de aquella feria thou owest;

Tampoco la Muerte se jactará thou wander’st en su sombra,

Cuando en líneas eternas a tiempo thou growest:

Mientras que los hombres pueden respirar o los ojos pueden ver,

Tan vidas largas esto y esto dan la vida a thee.

FAILURES AND PROBLEMS WITH THE TRANSLATOR

Personal pronouns. Unlike Google Translate, which was able to translate pronouns from an earlier century, Reverso has not translated personal pronouns such as “thee” (“you” in modern English) and “thou”. For instance, “Shall I compare thee to a summer’s day?” and “Thou art more lovely and more temperate” are translated as “¿Compararé thee hasta el día de un verano?” and “Thou arte más encantador y más templado”.

Failure to use the subjunctive. This translator is not able to use the subjunctive in Spanish. The poem contains several examples of this problem, for instance “So long as men can breathe or eyes can see”, translated as “Mientras que los hombres pueden respirar o los ojos pueden ver”, where the subjunctive forms “puedan respirar” and “puedan ver” would be expected.

Lack of subject-verb agreement. Reverso is not able to translate structures involving agreement coherently. For instance, “So long lives this and this gives life to thee” is translated as “Tan vidas largas esto y esto dan la vida a thee”. “Esto” is a third person singular pronoun in Spanish, so it should be followed by a verb in the third person singular, not plural.

Problems with words which can belong to different categories. Again we find problems with words which can act as several categories, such as “shines”, a form of the verb “to shine” (“to glow”). In this case “the eye of heaven shines” is translated as “el ojo de brillos de cielo”, where the verb has been treated as the plural noun “brillos”.

 TRANSLATED VERSION

Pippa Middleton desenrolla con la antigua llama en Madrid 16 MAYO 2011

Después de su papel estrellado en la boda del siglo, esto era una rotura bien merecida.

Mientras su hermana recién casada jetted del a las Seychelles para su luna de miel, Pippa Middleton voló a Madrid durante algún tiempo de inactividad con un grupo de los amigos que incluyeron la antigua llama Jorge Percy, el heredero del Duque de Northumberland.

Evitando el frenesí de medios de comunicación que la rodea desde el día grande, la belleza de morena relajada en el parque más grande de la capital – el Retiro.

Y ella tomó al agua para un viaje del barco con el aristócrata eligible, quien ella dató mientras en Edinburgo la Universidad.

Pero esto no era todo el resto para la muchacha de partido(parte) Pippa, que era penetrante para probar un poco de la vida nocturna de la ciudad española.

Ella celebró una fiesta en uno de los clubs de noche más exclusivos, Fortuny, donde ella fue manchada(descubierta) con una botella de champán en su bolsa.

Y no había ninguna calma para el planificador de partido(parte) la tarde siguiente, como ella disfrutó de una tarde en la barra de lujo

FAILURES AND PROBLEMS WITH THE TRANSLATOR

Literal translation. The translator has translated some words literally, such as “date”, which here means “to go out with somebody as a couple” and is rendered as “dató”. In Spanish, “datar” is accepted but not commonly used, so it sounds a bit awkward in natural speech.

Use of brackets. This translator uses brackets to offer an alternative meaning for words which are not very clear. For instance, “she was spotted” is translated as “ella fue manchada(descubierta)” and “there was no let-up for the party planner” as “no había ninguna calma para el planificador de partido(parte)”.

Context of translation. The translator is not able to select the appropriate meaning of a word, so a human needs to take part in the translation process. In this text, the word “flame” is used not with the meaning of “llama” but with that of “boyfriend” or “lover”. The mistake appears in “Pippa Middleton unwinds with former flame in Madrid”, translated as “Pippa Middleton desenrolla con la antigua llama en Madrid”.

Confusing translations and change of meaning. This translator tends to change the meaning of sentences and the tone of the text. For instance, in “After her starring role”, “starring” is something positive, said of “somebody who has been seen as an important figure”, but it is translated as “estrellado”, which means quite the opposite, that is to say, “somebody who is not lucky”.

Misinterpretation of common expressions. In all languages there are expressions which are constructed in a particular way depending on their origin. For example, in this text the expression “And she took to the water for a boat trip”, which means that somebody went out in a boat, has been translated as “Y ella tomó al agua”, which in Spanish suggests “to drink water”.

Yahoo Babel Fish

This translator is an online language resource which translates short texts (150 characters) and web pages. SYSTRAN is the technology upon which Yahoo Babel Fish is built. The languages available include French, German, Italian, English and Japanese, among others.

To use it, we select the languages we want to work with, paste the text into the box and press the “translate” button.
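
Because of the length limit, a longer text has to be pasted in several pieces. As a rough sketch (the helper name `split_for_translation` is my own, and the figure of 150 characters is simply the limit mentioned above), the pieces can be prepared in Python with the standard `textwrap` module, which breaks the text at spaces so that words are not cut in half:

```python
import textwrap

MAX_CHARS = 150  # limit quoted in the review; treated here as an assumption

def split_for_translation(text, limit=MAX_CHARS):
    """Split text into pieces of at most `limit` characters.

    Breaks happen at spaces; only a single word longer than `limit`
    would be cut (textwrap's default behaviour).
    """
    return textwrap.wrap(text, width=limit, break_on_hyphens=False)

text = (
    "After her starring role in the wedding of the century, it was a "
    "well-deserved break. Escaping the media frenzy surrounding her since "
    "the big day, the brunette beauty relaxed in the capital's biggest park."
)
chunks = split_for_translation(text)
for piece in chunks:
    print(len(piece), piece)
```

Splitting at spaces rather than at sentence boundaries is the simplest choice, but it can separate words that belong together, which may make the translation of each piece worse.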

TRANSLATED VERSION

Pippa Middleton desenrolla con la llama anterior en Madrid 16 de mayo de 2011 Después de su papel starring en la boda del siglo, era una rotura merecida. Mientras que su hermana nuevo-casada echó en chorro de a las Seychelles para su luna de miel, Pippa Middleton voló a Madrid para un cierto tiempo muerto con un grupo de amigos que incluyeron la llama anterior George Percy, el heredero al duque de Northumberland. Escapando el frenesí de los medios que la rodeaba desde el día grande, la belleza triguena se relajó en el capital’ el parque más grande de s – el Retiro. Y ella llevó el agua para un viaje del barco con el aristócrata elegible, que ella fechó mientras que en la universidad de Edimburgo. Pero no era todo el resto para la muchacha de partido Pippa, que era afilada muestrear algo del city’ español; vida nocturna de s. Ella partied en uno de los clubs nocturnos más exclusivos, Fortuny, donde la mancharon con una botella de champán en su bolso. Y no había descanso para el planificador del partido la tarde siguiente, pues ella disfrutó de una tarde en la barra ostentosa el ático, vestido a los nines en una falda amarillenta del peplum y una chaqueta negra cabida. Cuando emergieron los 26 años el next day, ella se divirtió las gafas de sol mientras que ella hizo su manera alrededor de las vistas turísticas con el grupo. Pippa y George eran flatmates en Edimburgo mientras que estaban en una relación, y los pares han sido amigos por casi diez años. Su novio actual, corredor Alex Loudon, permanecía en Gran Bretaña durante su viaje.

FAILURES AND PROBLEMS WITH THE TRANSLATOR

Lack of division into paragraphs. The translator returns its output as a single block. The original text was divided into lines or short paragraphs, but the translated version runs them all together.

Problems with the Saxon genitive. This translator is unable to translate the Saxon genitive, which is typical of English, into Spanish. As a consequence, it has not translated expressions such as “the capital’s biggest park – the Retiro” correctly. The result is “el capital’ el parque más grande de s – el Retiro”.

Inappropriate translation of verbs. The translator has rendered “she was spotted with a bottle of champagne in her bag”, which means that she was observed or seen with a bottle in her bag, as “donde la mancharon con una botella de champán en su bolso”, where “manchar” means “to make dirty”.

False friends. Like Google Translate and Reverso, this translator has not translated some words well, such as the adjective “eligible”. In English, “eligible” here means “somebody wealthy”, “a good person to marry”. However, “elegible” in Spanish means “somebody who can be chosen”.

Untranslated words. The translator leaves certain words in English, such as “the next day”, which should be translated as “el próximo día”, or “flatmates”, which should be “compañeros de piso”.

TRANSLATED VERSION

SONETO 18 Compararé thee a un summer’ ¿día de s? Arte de mil más encantador y más templado: Los vientos ásperos sacuden los brotes queridos de mayo, Y summer’ hath del arriendo de s toda la fecha demasiado corta: Alguna vez demasiado caliente el ojo del cielo brilla, Y está a menudo su tez dimm’ del oro; d; Y cada feria de la feria de declinaciones alguna vez, Por casualidad o nature’ curso cambiante untrimm’ de s; d; Pero thy verano eterno no se descolorará Ni pierda la posesión de ese mil justo más owest; Ni mil wander’ del brag de la muerte; st en su cortina, cuando en las líneas eternas para medir el tiempo de mil más growest: Siempre y cuando los hombres pueden respirar o los ojos pueden ver, Tan las largas vidas esto y éste da vida al thee.

FAILURES AND PROBLEMS WITH THE TRANSLATOR

Problems with the Saxon genitive. As in the previous example, “Shall I compare thee to a summer’s day?” is translated as “Compararé thee a un summer’ ¿día de s?”.

Personal pronouns. Unlike Google Translate, which was able to translate pronouns from an earlier century, Yahoo Babel Fish has not translated personal pronouns such as “thee” (“you” in modern English), “thou” and “thy”.

Confusion of word categories. This translator has a problem with archaic forms like the verb “art”, which in the 16th century was the second person singular of “to be”. In “Thou art more lovely and more temperate” it has been translated as the noun “arte”: “Arte de mil más encantador y más templado”.

I have made a SlideShare presentation with images which explains my analysis better.

IN CONCLUSION

Although I am not an expert in the field of translation, I think the three translators I have discussed have clear advantages and disadvantages. Each of them is an automatic linguistic tool, so we can assume that it will be imperfect. As we have seen in the examples in this review, all of them make a lot of mistakes which can only be corrected by a human. The main object of our analysis, Google Translate, makes a lot of mistakes because it does not apply grammatical rules and it swaps words which seem to be equivalent but have different meanings. However, if I had to choose one of the translators we have talked about, I think Google Translate is the best, basically because its translations can be understood: the mistakes do not prevent the reader from catching the general meaning. We have analysed both texts, and the majority of the problematic errors appeared in Yahoo Babel Fish and Reverso. These errors make it almost impossible for a person without a sound knowledge of English to understand the translated version of the original text. The examples in this review show why I have come to this conclusion.


Google Translate

In the following lines I will analyze one of the most successful translators of this century: Google Translate. It is a free online statistical machine translation service owned by Google Inc. that instantly translates between many languages (57), such as Polish, German, Dutch and Spanish. However, some languages are translated better than others: some are fully supported by Google Translate, while others are what the company calls “alpha languages”, which means that their translations are of lower quality.

It is possible to translate long texts, but the system limits the number of paragraphs. Nevertheless, if the user wants to translate a whole website, Google Translate offers the option of using Google Chrome, a fast, free browser that translates websites automatically into many languages. Google Translate also offers other tools, such as Google translated search (the information you are looking for will probably not be in your own language, so the system searches for the best results and translates them into your language) and the iPhone version, which allows voice input.

The aim of this enterprise is “to make information universally accessible, regardless of the language in which it is written”. That is why it has been improving since it started. Nowadays, many things can be done that could not be done at the beginning. For example, in the first version text could only be translated from English into some other languages; now it can also be done the other way round. Moreover, romanization is available for languages such as Chinese or Greek and, since the latest version launched in January 2011, it is also possible to see different possible translations for a specific word. The translator also improves with the help of its users, who can raise the quality of translations by suggesting improvements or by uploading their translation memories into Google Translate’s Translator Toolkit. Furthermore, the service itself sometimes asks the user for alternative translations of technical terms.

But how does this translator work? As has been said, Google Translate is a statistical machine translation (SMT) system, which translates texts in a completely different way from traditional rule-based systems. Rule-based machine translation was used some years ago and applied the rules and grammar of the languages being translated. However, linguists knew that not all languages follow the same rules (e.g. the word order of some languages is subject-verb-object, but in others it is verb-subject-object), which is why the translations were not very good.

Then statistical machine translation appeared, in which the computer looks for patterns in millions of documents. These documents have already been translated by human beings, and thanks to them the computer can work out more or less what the translation should be. However, the translations are not always perfect, and their quality depends mainly on the number of documents the computer can analyze to find patterns. That is why Google Translate translates German, for example, better than Basque: it has more German documents than Basque ones. Franz Josef Och, who heads machine translation at Google, is in favour of statistical machine translation. The documents available to the system are taken from United Nations documents.
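
The idea of "looking for patterns" in already translated documents can be illustrated with a toy example. The Python sketch below is not how Google Translate works internally; it is a much simpler stand-in that I am assuming purely for illustration. It counts how often each English word appears in the same sentence pair as each Spanish word and scores the pairs with the Dice coefficient. The function names and the five sentence pairs are invented.

```python
from collections import Counter
from itertools import product

def train(pairs):
    """Count words and word co-occurrences over aligned sentence pairs."""
    src_count, tgt_count, both = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        both.update(product(s_words, t_words))
    return src_count, tgt_count, both

def best_translation(word, model):
    """Pick the target word with the highest Dice score for `word`."""
    src_count, tgt_count, both = model
    scores = {
        t: 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in both.items()
        if s == word
    }
    return max(scores, key=scores.get) if scores else None

# Tiny invented "parallel corpus" of human translations.
pairs = [
    ("the house", "la casa"),
    ("the green house", "la casa verde"),
    ("the book", "el libro"),
    ("a green book", "un libro verde"),
    ("the door", "la puerta"),
]
model = train(pairs)
for word in ("house", "book", "green"):
    print(word, "->", best_translation(word, model))
```

With only five sentence pairs the counts are barely enough to separate “casa” from “la”, which shows in miniature why the number of available documents matters so much for quality.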

Finally, this way of translating texts has advantages. For instance, the quality is better than in rule-based translation, the translations sound more natural, and resources are used better. But there are also disadvantages, including problems with sentence alignment, different word orders, compound words, idioms and morphology.

Do not hesitate to watch the video that explains how SMT works. If you are interested in knowing more about the problems Google Translate has, you can see the portfolio in which I commented on the main ones here: http://wiki.littera.deusto.es/en/index.php/User:1adcaden/trans0910/Portfolio



Machine Translation: TRADUKKA

Tradukka.com is an online translator that translates text in real time. The concept is the same as that of other translation services, but with TRADUKKA the translation into the chosen language appears on screen as soon as we type, word by word.

 

THE PROJECT

Tradukka is a project developed by Andres Santos (founder of Egobits). Three months after launch it had reached 2 million users a month, most of them from Brazil, Portugal and Israel. Its current source of revenue is advertising for paid translators, but a paid version with additional features for users is being prepared.

Tradukka uses the Google Translate API and, compared with Google's own service, offers the following advantages, which in its creator's words are the key to its success: “it is a direct .com, unlike Google Translate or Yahoo! Babelfish, which often have to be searched for by name to reach the tool; it makes it easy to choose a base language for the interface on the spot, and that setting is saved on the user's computer; a broad, clean graphical interface; fluid usability that makes the mouse practically unnecessary; sending translations by email or to social networks; and the option to share translations through bit.ly links”

USAGE

When this web page is opened, translation into English always appears by default, but this can be changed by clicking the arrow in the boxes at the top. This online tool is free to use, there is no registration form to fill in and, moreover, carrying out translations is simple and easy.

This translator lets us work with many languages, not only from Spanish into English or from Spanish into a few other languages. We can specify the language of the text we are going to write and likewise the language it will be translated into; unfortunately, translating into Basque is not offered. As mentioned earlier, what puts this service above many other programs is that it delivers the translation in real time.

Besides the method just described, that is, typing the text ourselves, we can copy and paste text from other pages, and the system will automatically detect the language and translate it into the chosen one. The free service also adds other functions: we can swap the positions of the two text boxes (the one we write in and the one where the translation appears), that is, move the translated text to the input side and translate it back into the original language.

Other functions let us share the tool with friends by email, Facebook, Twitter, Google Reader or Delicious.

 

Tradukka's most significant advantages are undoubtedly the speed at which it processes text and the number of languages available. The translator also has a help section, and comments or suggestions can be sent through the contact form. News of updates is available on its official Twitter profile.

TESTS

1.) Translation

2.) Swap

As can be seen, the translations are of high quality in both grammar and vocabulary. However, when the swap function the program offers is used, the structure of the sentences changes, although they remain understandable. In some cases an improvement can even be noticed, with the sentence shifting to a more polished register.

LINKS

Tradukka. Machine Translation. Retrieved: May 6, 2011. Page: http://tradukka.com/
Machine Translation (2008). Wikipedia, The Free Encyclopedia. Retrieved: May 6, 2011. Page: http://en.wikipedia.org/wiki/Machine_translation

COCA-Corpus of Contemporary American English

Nowadays, students of foreign languages, teachers and linguists have many tools available for learning new languages or improving their knowledge of the language they are studying. However, many people do not know these tools exist and so cannot take advantage of them. Students can use translators, dictionaries, grammars and more. One tool that can be very useful for studying a language at an advanced level, and for seeing how it is structured, is corpus linguistics. The following lines describe what corpus linguistics is and present one specific corpus that has become very popular: the Corpus of Contemporary American English (COCA), created by Mark Davies, professor of corpus linguistics at Brigham Young University.

First of all, what do we understand by corpus linguistics? Wikipedia defines it as follows:

Corpus Linguistics is the study of language as expressed in samples (corpora) of “real world” text. This method represents a digestive approach to deriving a set of abstract rules by which a natural language is governed or else relates to another language. Originally done by hand, corpora are now largely derived by an automated process.

At first sight, corpus linguistics may seem a better way to study a language than grammars, because corpus samples show how the language is really used by native speakers. However, this approach also has some disadvantages. For example, as Noam Chomsky argued, real language is riddled with performance-related errors, which is why careful analysis of small speech samples is needed; corpus linguistics does not provide this, because corpus linguists work with large samples. Nevertheless, the field has been improving and nowadays we have very good corpora that include many well-structured samples. One corpus that has to be mentioned is COCA.

The Corpus of Contemporary American English is a free online corpus of 425 million words in 160,000 different texts drawn from a variety of sources and genres. It is the largest corpus of American English currently available. Moreover, it includes 20 million words for each year since 1990. More than 40,000 users visit it each month. The genres are: spoken (85 million words), from 150 TV and radio programmes; fiction (81 million words), from short stories and plays; popular magazines (86 million words); newspapers (81 million words); and academic journals (81 million words). Furthermore, users can look up the frequency of a word in each genre, which helps us to know, for example, whether a word is used in academic writing. It is also possible to compare how the use of certain words has changed from 1990 to the present, and to exclude a specific genre when we think it will not be useful.
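Because the genres are not all the same size, raw counts are easier to compare once they are converted to hits per million words. The following is a small sketch of that arithmetic in Python, assuming the genre sizes quoted above; the raw counts are hypothetical numbers made up for the example (they are not real COCA figures), and `per_million` is just an illustrative name.

```python
# Genre sizes in millions of words, as quoted in this post.
GENRE_SIZE_MILLIONS = {
    "spoken": 85,
    "fiction": 81,
    "magazines": 86,
    "newspapers": 81,
    "academic": 81,
}

def per_million(raw_counts):
    """Convert raw hit counts per genre into hits per million words.
    Genres missing from `raw_counts` are treated as zero hits."""
    return {
        genre: raw_counts.get(genre, 0) / size
        for genre, size in GENRE_SIZE_MILLIONS.items()
    }

# Hypothetical raw counts for some word; NOT real COCA data.
hypothetical_counts = {"spoken": 170, "fiction": 81, "academic": 810}

rates = per_million(hypothetical_counts)
for genre, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{genre:<11}{rate:6.2f} per million")
```

In this made-up case the word would be ten times more frequent in academic journals than in fiction, which is the kind of contrast that tells us whether a word belongs to academic writing.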

But why is this corpus so good? There are many reasons. For instance, the researchers behind it have worked for many years to improve it, and their work is also connected to other important corpora such as the British National Corpus, the TIME Corpus and the Corpus of Historical American English (COHA). The corpus is also updated with new words from time to time; the latest update was in 2011. Users can search for many things within the interface: exact words (e.g. mysterious); parts of speech; lemmas, which cover all the forms of a word (e.g. the base form sing covers sings, sang, sung and singing); and wildcards, an option the system offers when you do not know exactly how a word is written (e.g. for un*ly the system's answers would include unlikely, unusually…). It is also possible to search for collocates within a ten-word window (e.g. all nouns somewhere near faint, all adjectives near woman, or all verbs near feelings).
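Two of these search types, wildcards and collocates within a window, are easy to sketch in a few lines of Python. This is only an illustration of the idea on a tiny invented word list and sentence, not the COCA interface itself; `wildcard_matches` and `collocates` are hypothetical helper names, and the window here counts tokens on either side of the search word.

```python
from collections import Counter
from fnmatch import fnmatchcase

def wildcard_matches(pattern, vocabulary):
    """Return the words that match a pattern in which * stands for any letters."""
    return [word for word in vocabulary if fnmatchcase(word, pattern)]

def collocates(tokens, node, window):
    """Count the words found within `window` tokens either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            counts.update(
                t for j, t in enumerate(tokens[start:end], start) if j != i
            )
    return counts

# Invented word list and sentence, for illustration only.
words = ["unlikely", "unusually", "under", "lovely", "unruly"]
print(wildcard_matches("un*ly", words))

text = "she heard a faint sound and saw a faint light in the dark".split()
print(collocates(text, "faint", 2).most_common(3))
```

A real corpus query would also filter the collocates by part of speech (all nouns near faint, for example), which needs tagged text and is left out of this sketch.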

Other good points are the possibility of comparing the collocates of two related words (e.g. banana and apple, or little and small; this shows the difference in meaning between the words and how each is used), of finding the frequency and distribution of synonyms for nearly 60,000 words, and of creating our own lists of related words.

Take the following example, which illustrates how the interface works. Here we analyze the collocates that precede the nouns apple and banana. The first chart shows the results for apple: it is very often preceded by an article such as the or an.

WORD 1 (W1): APPLE (3.95)

    WORD   W1     W2    W1/W2     SCORE
1   THE    1648   445   3.7       0.9
2   AN     1325   0     2,650.0   671.6

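However, banana has fewer cases: it follows the 445 times, against 1,648 times for apple. It could be said that apple takes determiners more often than banana does, although the split between an and a simply reflects the initial sound of each noun.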

WORD 2 (W2): BANANA(0.25)

    WORD   W2     W1     W2/W1   SCORE
1   A      602    8      75.3    296.9
2   THE    445    1648   0.3     1.1
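The ratio and score columns can be approximately reproduced with a little arithmetic. The sketch below is my own reading of the numbers in the two charts, not a formula taken from the COCA documentation: it assumes that a count of 0 is replaced by 0.5 before dividing (which is what the AN row suggests, since 1325 / 0.5 = 2,650) and that the score is the collocate ratio divided by the overall ratio shown next to each word (3.95 for apple).

```python
def ratio(freq_a, freq_b, floor=0.5):
    """Frequency ratio of a collocate with word A versus word B.
    A zero denominator is replaced by `floor`; this is an assumption
    inferred from the AN row, where 1325 / 0 is displayed as 2,650.0."""
    return freq_a / (freq_b if freq_b > 0 else floor)

def score(freq_a, freq_b, overall_ratio):
    """Collocate ratio relative to the overall frequency ratio of the two words."""
    return ratio(freq_a, freq_b) / overall_ratio

APPLE_OVERALL = 3.95  # rounded overall ratio shown in the APPLE header

rows = [("THE", 1648, 445), ("AN", 1325, 0)]
for word, w1, w2 in rows:
    print(word, round(ratio(w1, w2), 1), round(score(w1, w2, APPLE_OVERALL), 1))
```

For THE this gives the 3.7 and 0.9 shown in the chart. For AN the score comes out slightly below the 671.6 in the chart, presumably because 3.95 is itself a rounded figure. A score near 1 means the collocate is no more typical of one word than of the other, while a very high score marks a collocate that belongs almost exclusively to that word.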

Finally, note that if you use this interface frequently you will have to log in. Do not hesitate to try this corpus; a video made by the Emerald Cultural Institute shows very well how to use COCA.


Machine Translation: Google Translator

“Machine translation, sometimes referred to by the abbreviation MT, also called computer-aided translation, machine-aided human translation MAHT and interactive translation, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another.”

At its most basic level, MT performs simple substitution of words in one natural language for words in another. But that alone is not enough for a good translation of a text, because whole phrases and their closest counterparts in the target language need to be recognized. To solve this problem, corpus and statistical techniques are now starting to be used, and with them translation will improve. Current machine translation software often allows for customisation by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text. Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators and, in a very limited number of cases, can even produce output that can be used as is (e.g., weather reports). The progress and potential of machine translation has been debated much through its history. Since the 1950s, a number of scholars have questioned the possibility of achieving fully automatic machine translation of high quality. Some critics claim that there are in-principle obstacles to automatizing the translation process.

We have been using Google Translate in the translation class, and that is why I decided to write this post on it. Google says on its blog:

“At Google, we consider translation a key part of making information universally accessible to everyone around the world. While we think Google Translate, our automatic translation system, is pretty neat, sometimes machine translation could use a human touch. Yesterday, we launched Google Translator Toolkit, a powerful but easy-to-use editor that enables translators to bring that human touch to machine translation. For example, if an Arabic-speaking reader wants to translate a Wikipedia™ article into Arabic, she loads the article into Translator Toolkit, corrects the automatic translation, and clicks publish. By using Translator Toolkit’s bag of tools (translation search, bilingual dictionaries, and ratings), she translates and publishes the article faster and better into Arabic. The Translator Toolkit is integrated with Wikipedia, making it easy to publish translated articles. Best of all, our automatic translation system “learns” from her corrections, creating a virtuous cycle that can help translate content into 47 languages, or over 98% of the world’s Internet population.”

Having taken that class on translation, one realises that it is one of the best machine translators (at least for English), but of course some knowledge of the language you are working with is needed. Why? Although it gives a general idea of the translation, the text has to be corrected, because the translator is not always able to recognize the subject and sometimes it even confuses some terms. So if you know the language it is easy to correct the mistakes, but if you cannot see them the text is quite a disaster. Apart from that, I have to say that using the translator is a first step towards a professional translation. It gives you the general idea, and from there it is possible to improve it.

Machine translators (erdaratu.eu)

Naroa Perez and Esti Blanco

The machine translator we are going to examine is called “Erdaratu.eu”. It went online recently and uses the apertium-eu-es system; it is therefore the first version of a free Basque-to-Spanish machine translator based on the Apertium platform, and it can be tried out on the erdaratu.eu website.

First of all, we will give some general information about the project, since we think it is important for understanding how it works. Apertium-eu-es is part of a larger machine translation project, namely the Apertium project (www.apertium.org). The Apertium project develops software for translating text from one language into another.

In recent months Mireia Ginest has been developing Apertium-eu-es, funded by the University of Alicante and, above all, by the company Prompsit Language Engineering (www.prompsit.com). She has been helped by Sergio Ortiz and Francis Tyers of Prompsit and Mikel Forcada of the University of Alicante. At present the apertium-eu-es translator has roughly 6,000 words and 250 grammar rules. A large part of the dictionary is taken from another free project, the Matxin Spanish-Basque translator prototype. All the rules were written in Apertium.

Apertium-eu-es can be useful for getting an idea of the general meaning of a text (provided the system does not find too many unknown words in it).

Having given this information, we will now report on the various tests we carried out. We tried translating several sentences to see what the results were like, and obtained the following:

– “nik ez dut pentsatzen bihar eguraldi ona egingo duenik” (“I do not think the weather will be good tomorrow”)

– “Yo no pienso mañana el tiempo bueno hará que”

These are the results the translator gives. We can see straight away that the word order is clearly wrong, so we might suppose that it has a problem with syntax. If we replace the word “pentsatzen” with “uste” (that is, if we enter nik ez dut uste bihar eguraldi ona egingo duenik), the word order of the output stays the same:

– Yo no tengo suposición mañana el tiempo bueno hará que

And if we put “zuk” instead of “nik”, the result is the same except that it gives “Tú no tienes suposición mañana el tiempo bueno hará que”. Our conclusion from this example is therefore that the translator can tell pronouns apart (since it distinguishes ni and zu in the translation) and is likewise able to distinguish between “pentsatzen” and “uste”, but there are two problems:

1- First, the translator does not correctly capture the sense of the word “uste”. It translates it as “tener suposición de”, whereas for a speaker it has the sense of “creer” in this case.

2- Wrong word order. It does not put the words in their proper place in Spanish. It leaves the elements where they are; the order in Basque is the following:

Pronoun (zuk) + ez + auxiliary verb (duzu) + main verb (uste) + time adverbial (bihar) + noun (eguraldi) + noun modifier (ona) + main verb (egingo) + auxiliary verb (duenik).

and in Spanish it keeps that same order, and therefore does not respect Spanish word order.

Pronoun (tú) + no + auxiliary verb (tienes) + main verb (suposición) + time adverbial (mañana) + noun (tiempo) + noun modifier (buen) + main verb (hará)

We will give another example along the same lines as the first.

“Zuk uste duzu bihar eguraldi ona egingo duela?” With this input we turned the first sentence into a question, and the translation was: “Tú crees mañana el tiempo bueno hará ,?”. The conclusion we can draw from this example is that when the full verb “uste + izan” is given, the translator grasps that it is dealing with a verb (creer) rather than a noun as in the previous case (suposición), so the situation has improved somewhat. The remaining elements, however, stay in the same inappropriate order.

In our view, given everything explained above, this translator can be useful above all for people who are learning Basque, to get the gist of a text before studying it in detail, and for those who do not know Basque, so that they can understand news from Basque-language media if they wish.

Since the Apertium project is open (all software developed in Apertium is under the free GPL licence), we can help improve Apertium-eu-es in many ways. It is possible to comment on the progress made so far, and we can also help test new versions. Likewise, words can be added to the dictionaries, and we can help create grammar rules to obtain a more correct word order in the output. In our view this is the greatest advantage this translator offers: because it is not closed, it can keep improving continuously, and it also gives users the chance to take part and become part of the final product. This, in our view, is what the future should be: projects that run on free software and on people's goodwill and contributions.