Reviews and Comments on Paper 65

Paper information

Paper #65: Felipe Sánchez-Martínez, Juan Antonio Pérez-Ortiz and Mikel L. Forcada. Speeding up target-language driven part-of-speech tagger training for machine translation
Abstract: When unsupervisedly training hidden-Markov-model-based part-of-speech (PoS) taggers involved in machine translation systems, the use of target-language information has proven to give better results than the standard Baum-Welch algorithm. The target-language-driven training algorithm proceeds by translating every possible PoS tag sequence resulting from the disambiguation of the words in each source-language text segment into the target language, and using a target-language model to estimate the likelihood of each possible disambiguation. The main disadvantage of this method is that the number of translations to perform grows exponentially with the segment length, translation being the most time-consuming task. In this paper, we present a method that uses a priori knowledge obtained in an unsupervised manner to prune unlikely disambiguations in each text segment, thereby reducing the number of translations to be performed during training. The experimental results show that this new pruning method drastically reduces the number of translations done during training (and, consequently, the time complexity of the algorithm) without degrading the tagging accuracy achieved.
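
Illustration of the exponential growth mentioned in the abstract. This is a minimal sketch, not the authors' code: the segment, its words and the tag names are hypothetical, and it assumes each word carries an independent set of candidate PoS tags, so the number of disambiguations is the product of the set sizes.

```python
from itertools import product
from math import prod

# Hypothetical source-language segment: each word with its candidate PoS tags.
# Words and tag names are invented for illustration only.
segment = [
    ("w1", ["TAG_A", "TAG_B", "TAG_C"]),
    ("w2", ["TAG_A", "TAG_D"]),
    ("w3", ["TAG_E"]),
]


def count_disambiguations(segment):
    """Number of tag sequences: product of the per-word ambiguity sizes."""
    return prod(len(tags) for _, tags in segment)


def enumerate_disambiguations(segment):
    """Every possible tag sequence (one tag chosen per word)."""
    return list(product(*(tags for _, tags in segment)))


def worst_case(n_words, tags_per_word):
    """Path count when every word has the same number of candidate tags."""
    return tags_per_word ** n_words
```

Under these assumptions, each enumerated sequence would need its own translation during training, which is why a segment of n two-way ambiguous words already yields 2^n candidates.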

Summary of received reviews and comments

Reviews superseded by other reviews are shown in grey in the table.

            Confidence   Score
Review 1    2            2
Review 2    2            3
Review 3    2            2


Reviews and Comments

Review 1

PC member: Luis Alberto Pineda
Overall rating: 2 (accept: I will argue for this paper)
Confidence: 2
Relevance: Is this paper relevant for this conference? 2 (accept (I will argue for this paper))
Soundness: Is this paper technically sound and complete? 2 (accept (I will argue for this paper))
Are the claims sufficiently supported by experimental/theoretical results? 2 (accept (I will argue for this paper))
Significance: Are the results/ideas interesting for other AI researchers? 2 (accept (I will argue for this paper))
Originality: Are the results or ideas novel and previously unknown? 1 (weak accept (vote accept but don't mind rejecting))
Readability: Is the paper well-organized and easy to understand? 2 (accept (I will argue for this paper))
Language: Is the paper written in correct English and style? 2 (accept (I will argue for this paper))
Format: Is the paper correctly and consistently formatted? 2 (accept (I will argue for this paper))
Review: CONTRIBUTION OF THE PAPER:

A method for machine translation using HMMs is presented, in which information from both the source and the target language is employed. A heuristic that prunes unlikely translations and improves the HMM parameters using untagged information from the source language in an unsupervised way is also presented.

POSITIVE ASPECTS:

Nice paper

NEGATIVE ASPECTS:

The paper is a bit dense

CHANGES TO IMPROVE THE PAPER:

For the audience and with the available space, it would be productive to highlight the intuitive idea

FURTHER COMMENTS:



ITEMS BELOW ARE JUSTIFICATION OF THE SCORES IF NEGATIVE:

(1) IS THIS PAPER RELEVANT FOR THIS CONFERENCE?

Yes

(2) IS THIS PAPER TECHNICALLY SOUND AND COMPLETE?

It seems it is, but I had no time to check in detail.

(3) ARE THE CLAIMS SUFFICIENTLY SUPPORTED BY EXPERIMENTAL OR THEORETICAL RESULTS?

they seem to be


(4) ARE THE RESULTS/IDEAS INTERESTING FOR OTHER AI RESEARCHERS?

Yes, in the area of machine translation

(5) ARE THE RESULTS OR IDEAS NOVEL AND PREVIOUSLY UNKNOWN?

It seems that the idea is novel

(6) IS THE PAPER WELL-ORGANIZED AND EASY TO UNDERSTAND?

Yes

(7) IS THE PAPER WRITTEN IN CORRECT ENGLISH AND STYLE?

Yes

(8) IS THE PAPER CORRECTLY AND CONSISTENTLY FORMATTED?

Yes
PC only:  
Time: Jun 21, 03:25

Review 2

PC member:  
Overall rating: 3 (strong accept)
Confidence: 2
Relevance: Is this paper relevant for this conference? 2 (accept (I will argue for this paper))
Soundness: Is this paper technically sound and complete? 1 (weak accept (vote accept but don't mind rejecting))
Are the claims sufficiently supported by experimental/theoretical results? 2 (accept (I will argue for this paper))
Significance: Are the results/ideas interesting for other AI researchers? 2 (accept (I will argue for this paper))
Originality: Are the results or ideas novel and previously unknown? 1 (weak accept (vote accept but don't mind rejecting))
Readability: Is the paper well-organized and easy to understand? 2 (accept (I will argue for this paper))
Language: Is the paper written in correct English and style? 2 (accept (I will argue for this paper))
Format: Is the paper correctly and consistently formatted? 2 (accept (I will argue for this paper))
Review: CONTRIBUTION OF THE PAPER:

The authors present a method that uses a priori knowledge obtained in an unsupervised manner to prune unlikely disambiguations in each text segment. The goal is to reduce the number of translations to be performed during training.

POSITIVE ASPECTS:

The paper is, essentially, a practical one. It is well organized and easy to read, even for non-expert readers. A complete experimental section is also included.

NEGATIVE ASPECTS:

The authors do not explain whether the comparative tests take into account tools whose disambiguation is not based on HMMs (apparently they do not).

CHANGES TO IMPROVE THE PAPER:

I think that the paper is well balanced.

FURTHER COMMENTS:




ITEMS BELOW ARE JUSTIFICATION OF THE SCORES IF NEGATIVE:

(1) IS THIS PAPER RELEVANT FOR THIS CONFERENCE?



(2) IS THIS PAPER TECHNICALLY SOUND AND COMPLETE?



(3) ARE THE CLAIMS SUFFICIENTLY SUPPORTED BY EXPERIMENTAL OR THEORETICAL RESULTS?



(4) ARE THE RESULTS/IDEAS INTERESTING FOR OTHER AI RESEARCHERS?



(5) ARE THE RESULTS OR IDEAS NOVEL AND PREVIOUSLY UNKNOWN?



(6) IS THE PAPER WELL-ORGANIZED AND EASY TO UNDERSTAND?



(7) IS THE PAPER WRITTEN IN CORRECT ENGLISH AND STYLE?



(8) IS THE PAPER CORRECTLY AND CONSISTENTLY FORMATTED?
PC only: I think that the paper should be accepted.
Time: Jul 3, 11:18

Review 3

PC member:  
Reviewer:  
Overall rating: 2 (accept: I will argue for this paper)
Confidence: 2
Relevance: Is this paper relevant for this conference? 2 (accept (I will argue for this paper))
Soundness: Is this paper technically sound and complete? 2 (accept (I will argue for this paper))
Are the claims sufficiently supported by experimental/theoretical results? 2 (accept (I will argue for this paper))
Significance: Are the results/ideas interesting for other AI researchers? 2 (accept (I will argue for this paper))
Originality: Are the results or ideas novel and previously unknown? 2 (accept (I will argue for this paper))
Readability: Is the paper well-organized and easy to understand? 3 (strong accept)
Language: Is the paper written in correct English and style? 2 (accept (I will argue for this paper))
Format: Is the paper correctly and consistently formatted? 2 (accept (I will argue for this paper))
Review: CONTRIBUTION OF THE PAPER:

The authors propose a new and simple technique for pruning
disambiguation paths while training hidden-Markov-model-based
PoS taggers using a priori knowledge. They consider the previously
proposed method for unsupervised training of PoS taggers for the
source language in machine translation, which uses a probabilistic
model of the target language to estimate the likelihood of the
possible disambiguation paths. This method suffers from an
exponential explosion of the space of disambiguation paths. Their
pruning method consists, for given a priori likelihoods, in taking
into account only those paths whose probability mass does not
exceed a threshold. The a priori likelihoods are periodically
updated using the counts collected from the target language. The
experiments with Spanish as the source language and Catalan as the
target language have shown that this method substantially decreases
the number of relevant disambiguation paths (and so the training
time) while even slightly improving the tagging accuracy.
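
The pruning step summarized above can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes the paths are ranked by a priori likelihood and kept, most likely first, until the accumulated normalized probability mass reaches the threshold. The function name, the dictionary layout and the example values are hypothetical.

```python
def prune_by_mass(path_likelihoods, mass_threshold):
    """Keep the most likely paths until their normalized mass reaches the threshold.

    path_likelihoods: dict mapping a disambiguation path (tuple of tags) to a
                      non-negative a priori likelihood; at least one must be > 0.
    mass_threshold:   value in (0, 1]; smaller values prune more aggressively.
    """
    total = sum(path_likelihoods.values())
    # Rank paths from most to least likely according to the a priori model.
    ranked = sorted(path_likelihoods.items(), key=lambda kv: kv[1], reverse=True)

    kept, mass = [], 0.0
    for path, likelihood in ranked:
        if mass >= mass_threshold:
            break  # enough probability mass is already covered
        kept.append(path)
        mass += likelihood / total
    return kept


# Hypothetical a priori likelihoods for three disambiguation paths.
example = {
    ("TAG_A", "TAG_D"): 0.6,
    ("TAG_B", "TAG_D"): 0.3,
    ("TAG_C", "TAG_D"): 0.1,
}
```

In this sketch only the paths returned by `prune_by_mass` would be translated and scored with the target-language model; the remaining ones are skipped, which is where the saving in translations would come from.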

POSITIVE ASPECTS:

The proposed method is simple and efficient

NEGATIVE ASPECTS:

It is tested on only one pair of languages and with a simplified
weighting function for mixing the initial and the re-estimated HMM.


CHANGES TO IMPROVE THE PAPER:



FURTHER COMMENTS:



ITEMS BELOW ARE JUSTIFICATION OF THE SCORES IF NEGATIVE:

(1) IS THIS PAPER RELEVANT FOR THIS CONFERENCE?

yes

(2) IS THIS PAPER TECHNICALLY SOUND AND COMPLETE?

yes

(3) ARE THE CLAIMS SUFFICIENTLY SUPPORTED BY EXPERIMENTAL OR THEORETICAL RESULTS?

yes, but see my remark above

(4) ARE THE RESULTS/IDEAS INTERESTING FOR OTHER AI RESEARCHERS?

I believe, yes

(5) ARE THE RESULTS OR IDEAS NOVEL AND PREVIOUSLY UNKNOWN?

yes

(6) IS THE PAPER WELL-ORGANIZED AND EASY TO UNDERSTAND?

yes

(7) IS THE PAPER WRITTEN IN CORRECT ENGLISH AND STYLE?

yes

(8) IS THE PAPER CORRECTLY AND CONSISTENTLY FORMATTED?


yes
PC only:  
Time: Jul 14, 10:10