Reviews and Comments on Paper 65
Paper information
| Paper #65: Felipe Sánchez-Martínez, Juan Antonio Pérez-Ortiz and Mikel L. Forcada. Speeding up target-language driven part-of-speech tagger training for machine translation |
| Abstract: When unsupervisedly training hidden-Markov-model-based part-of-speech (PoS) taggers involved in machine translation systems the use of target-language information has proven to give better results than the standard Baum-Welch algorithm. The target-language-driven training algorithm proceeds by translating every possible PoS tag sequence resulting from the disambiguation of the words in each source-language text segment into the target language, and using a target-language model to estimate the likelihood of each possible disambiguation. The main disadvantage of this method is that the number of translations to perform grows exponentially with the segment length, translation being the most time-consuming task. In this paper, we present a method that uses a priori knowledge obtained in an unsupervised manner to prune unlikely disambiguations in each text segment; therefore, reducing the number of translations to be performed during training. The experimental results show that this new pruning method drastically reduces the amount of translations done during training (and, consequently, the time complexity of the algorithm) without degrading the tagging accuracy achieved. |
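The pruning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `prune_disambiguations`, the `prior` scoring callback, the `mass_threshold` parameter and the toy tag weights are all assumptions made for illustration. The sketch enumerates every tag sequence of a segment, ranks the sequences by an a priori score, and keeps the most likely ones until their cumulative normalized probability reaches the threshold; only the kept sequences would then need to be translated.

```python
from itertools import product


def prune_disambiguations(segment_tags, prior, mass_threshold=0.9):
    """Keep the most likely tag sequences of one segment.

    segment_tags: one list of candidate PoS tags per word (hypothetical input format).
    prior: callable giving a non-negative a priori score for a tag tuple (assumption).
    mass_threshold: cumulative normalized probability at which pruning stops (assumption).
    """
    # The number of paths is the product of the ambiguity of each word,
    # so it grows exponentially with the segment length.
    paths = list(product(*segment_tags))
    scores = [prior(path) for path in paths]
    total = sum(scores)
    if total == 0:
        return paths  # no a priori information: keep every path

    ranked = sorted(zip(paths, scores), key=lambda item: item[1], reverse=True)
    kept, mass = [], 0.0
    for path, score in ranked:
        kept.append(path)
        mass += score / total
        if mass >= mass_threshold:
            break
    return kept


# Toy a priori model: independent per-tag weights (purely illustrative values).
TOY_WEIGHTS = {"N": 0.7, "V": 0.3, "D": 1.0, "A": 0.2}


def toy_prior(path):
    score = 1.0
    for tag in path:
        score *= TOY_WEIGHTS[tag]
    return score


segment = [["N", "V"], ["D"], ["N", "A"]]
kept_paths = prune_disambiguations(segment, toy_prior, mass_threshold=0.9)
print(len(kept_paths), "of 4 paths kept:", kept_paths)
```

Lowering `mass_threshold` prunes more aggressively and saves more translations, at the risk of discarding the correct disambiguation; how the threshold and the a priori model are actually chosen is specified in the paper, not here.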
Summary of received reviews and comments
Reviews superseded by other reviews are shown in grey in the table.
| Review | Confidence | Score |
| Review 1 | 2 | 2 |
| Review 2 | 2 | 3 |
| Review 3 | 2 | 2 |
Reviews and Comments
Review 1
| PC member: | Luis Alberto Pineda |
| Overall rating: | 2 (accept: I will argue for this paper) |
| Confidence: | 2 |
| Relevance: Is this paper relevant for this conference? | 2 (accept (I will argue for this paper)) |
| Soundness: Is this paper technically sound and complete? | 2 (accept (I will argue for this paper)) |
| Are the claims sufficiently supported by experimental/theoretical results? | 2 (accept (I will argue for this paper)) |
| Significance: Are the results/ideas interesting for other AI researchers? | 2 (accept (I will argue for this paper)) |
| Originality: Are the results or ideas novel and previously unknown? | 1 (weak accept (vote accept but don't mind rejecting)) |
| Readability: Is the paper well-organized and easy to understand? | 2 (accept (I will argue for this paper)) |
| Language: Is the paper written in correct English and style? | 2 (accept (I will argue for this paper)) |
| Format: Is the paper correctly and consistently formatted? | 2 (accept (I will argue for this paper)) |
| Review: | CONTRIBUTION OF THE PAPER: A method for machine translation using HMMs, in which information from both the source and target language is employed, is presented. A heuristic to prune unlikely translations and improve the HMM parameters using untagged information from the source language in an unsupervised way is also presented. POSITIVE ASPECTS: Nice paper. NEGATIVE ASPECTS: The paper is a bit dense. CHANGES TO IMPROVE THE PAPER: For the audience and with the available space, it would be productive to highlight the intuitive idea. FURTHER COMMENTS: ITEMS BELOW ARE JUSTIFICATION OF THE SCORES IF NEGATIVE: (1) IS THIS PAPER RELEVANT FOR THIS CONFERENCE? Yes (2) IS THIS PAPER TECHNICALLY SOUND AND COMPLETE? It seems it is, but I had no time to check in detail. (3) ARE THE CLAIMS SUFFICIENTLY SUPPORTED BY EXPERIMENTAL OR THEORETICAL RESULTS? They seem to be (4) ARE THE RESULTS/IDEAS INTERESTING FOR OTHER AI RESEARCHERS? Yes, in the area of machine translation (5) ARE THE RESULTS OR IDEAS NOVEL AND PREVIOUSLY UNKNOWN? It seems that the idea is novel (6) IS THE PAPER WELL-ORGANIZED AND EASY TO UNDERSTAND? Yes (7) IS THE PAPER WRITTEN IN CORRECT ENGLISH AND STYLE? Yes (8) IS THE PAPER CORRECTLY AND CONSISTENTLY FORMATTED? Yes |
| PC only: | |
| Time: | Jun 21, 03:25 |
Review 2
| PC member: | |
| Overall rating: | 3 (strong accept) |
| Confidence: | 2 |
| Relevance: Is this paper relevant for this conference? | 2 (accept (I will argue for this paper)) |
| Soundness: Is this paper technically sound and complete? | 1 (weak accept (vote accept but don't mind rejecting)) |
| Are the claims sufficiently supported by experimental/theoretical results? | 2 (accept (I will argue for this paper)) |
| Significance: Are the results/ideas interesting for other AI researchers? | 2 (accept (I will argue for this paper)) |
| Originality: Are the results or ideas novel and previously unknown? | 1 (weak accept (vote accept but don't mind rejecting)) |
| Readability: Is the paper well-organized and easy to understand? | 2 (accept (I will argue for this paper)) |
| Language: Is the paper written in correct English and style? | 2 (accept (I will argue for this paper)) |
| Format: Is the paper correctly and consistently formatted? | 2 (accept (I will argue for this paper)) |
| Review: | CONTRIBUTION OF THE PAPER: The authors present a method that uses a priori knowledge obtained in an unsupervised manner to prune unlikely disambiguations in each text segment. The goal is to reduce the number of translations to be performed during training. POSITIVE ASPECTS: The paper is essentially a practical one. It is well organized and easy to read, even for non-experts. A complete experimental section is also included. NEGATIVE ASPECTS: The authors do not explain whether the comparative tests take into account tools in which disambiguation is not based on HMMs (apparently this is not the case). CHANGES TO IMPROVE THE PAPER: I think that the paper is well balanced. FURTHER COMMENTS: ITEMS BELOW ARE JUSTIFICATION OF THE SCORES IF NEGATIVE: (1) IS THIS PAPER RELEVANT FOR THIS CONFERENCE? (2) IS THIS PAPER TECHNICALLY SOUND AND COMPLETE? (3) ARE THE CLAIMS SUFFICIENTLY SUPPORTED BY EXPERIMENTAL OR THEORETICAL RESULTS? (4) ARE THE RESULTS/IDEAS INTERESTING FOR OTHER AI RESEARCHERS? (5) ARE THE RESULTS OR IDEAS NOVEL AND PREVIOUSLY UNKNOWN? (6) IS THE PAPER WELL-ORGANIZED AND EASY TO UNDERSTAND? (7) IS THE PAPER WRITTEN IN CORRECT ENGLISH AND STYLE? (8) IS THE PAPER CORRECTLY AND CONSISTENTLY FORMATTED? |
| PC only: | I think that the paper should be accepted. |
| Time: | Jul 3, 11:18 |
Review 3
| PC member: | |
| Reviewer: | |
| Overall rating: | 2 (accept: I will argue for this paper) |
| Confidence: | 2 |
| Relevance: Is this paper relevant for this conference? | 2 (accept (I will argue for this paper)) |
| Soundness: Is this paper technically sound and complete? | 2 (accept (I will argue for this paper)) |
| Are the claims sufficiently supported by experimental/theoretical results? | 2 (accept (I will argue for this paper)) |
| Significance: Are the results/ideas interesting for other AI researchers? | 2 (accept (I will argue for this paper)) |
| Originality: Are the results or ideas novel and previously unknown? | 2 (accept (I will argue for this paper)) |
| Readability: Is the paper well-organized and easy to understand? | 3 (strong accept) |
| Language: Is the paper written in correct English and style? | 2 (accept (I will argue for this paper)) |
| Format: Is the paper correctly and consistently formatted? | 2 (accept (I will argue for this paper)) |
| Review: | CONTRIBUTION OF THE PAPER: The authors propose a new and simple technique for pruning disambiguation paths while training hidden-Markov-model-based PoS taggers using a priori knowledge. They consider the previously proposed method of unsupervised training of PoS taggers for the source language in machine translation, which uses a probabilistic model of the target language to estimate the maximal likelihood of possible disambiguation paths. This method suffers from an exponential explosion of the space of disambiguation paths. Their pruning method consists, for given a priori likelihoods, in taking into account only those paths whose probability mass does not exceed a threshold. The a priori likelihoods are periodically updated using the counts collected from the target language. The experiments with Spanish as the source language and Catalan as the target language have shown that this method substantially decreases the number of relevant disambiguation paths (and so the training time), even slightly improving the tagging accuracy. POSITIVE ASPECTS: The proposed method is simple and efficient. NEGATIVE ASPECTS: It is tested on only one language pair and with a simplified weighting function for mixing the initial and the recounted HMM. CHANGES TO IMPROVE THE PAPER: FURTHER COMMENTS: ITEMS BELOW ARE JUSTIFICATION OF THE SCORES IF NEGATIVE: (1) IS THIS PAPER RELEVANT FOR THIS CONFERENCE? Yes (2) IS THIS PAPER TECHNICALLY SOUND AND COMPLETE? Yes (3) ARE THE CLAIMS SUFFICIENTLY SUPPORTED BY EXPERIMENTAL OR THEORETICAL RESULTS? Yes, but see my remark above (4) ARE THE RESULTS/IDEAS INTERESTING FOR OTHER AI RESEARCHERS? I believe so (5) ARE THE RESULTS OR IDEAS NOVEL AND PREVIOUSLY UNKNOWN? Yes (6) IS THE PAPER WELL-ORGANIZED AND EASY TO UNDERSTAND? Yes (7) IS THE PAPER WRITTEN IN CORRECT ENGLISH AND STYLE? Yes (8) IS THE PAPER CORRECTLY AND CONSISTENTLY FORMATTED? Yes |
| PC only: | |
| Time: | Jul 14, 10:10 |