Combining Statistical Translation Techniques for Cross-Language Information Retrieval


Cross-language information retrieval today is dominated by techniques that rely principally on context-independent token-to-token mappings, even though state-of-the-art statistical machine translation systems now have far richer translation models available in their internal representations. This paper explores combination-of-evidence techniques using three types of statistical translation models: context-independent token translation, token translation using phrase-dependent contexts, and token translation using sentence-dependent contexts. Context-independent translation is performed using statistically aligned tokens in parallel text, phrase-dependent translation is performed using aligned statistical phrases, and sentence-dependent translation is performed using those same aligned phrases together with an n-gram language model. Experiments on retrieval of Arabic, Chinese, and French documents using English queries show that no single technique is optimal for all queries, but that combining translation evidence from all three techniques yields statistically significant improvements in mean average precision over strong baselines. The optimal combination, however, is found to be resource-dependent, indicating a need for future work on robust tuning to the characteristics of individual collections.
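The abstract does not spell out how the three sources of translation evidence are merged. The following is a minimal sketch, assuming each model is reduced to a per-token translation probability table p(f | e) and the tables are combined by linear interpolation; the combined table is then used to compute an expected query-term count in a document, in the style of probabilistic structured queries. The function names (`interpolate`, `translated_tf`), the weights, and the toy `bank`/`banque`/`rive` tables are hypothetical illustrations, not the paper's code or data.

```python
from collections import Counter, defaultdict

# Hypothetical representation: English query token -> {document-language token: probability}.
TranslationTable = dict[str, dict[str, float]]


def interpolate(tables: list[TranslationTable], weights: list[float]) -> TranslationTable:
    """Combine translation tables by linear interpolation, renormalizing each distribution."""
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per translation table")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")

    combined: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for table, weight in zip(tables, weights):
        for source, dist in table.items():
            for target, prob in dist.items():
                combined[source][target] += weight * prob

    # A token missing from some tables receives less total mass;
    # rescale so that each distribution sums to 1 again.
    result: TranslationTable = {}
    for source, dist in combined.items():
        total = sum(dist.values())
        if total > 0:
            result[source] = {t: p / total for t, p in dist.items()}
    return result


def translated_tf(query_token: str, table: TranslationTable, doc_tokens: list[str]) -> float:
    """Expected count of query_token in a document: sum over f of p(f | e) * tf(f, D)."""
    counts = Counter(doc_tokens)
    return sum(p * counts[f] for f, p in table.get(query_token, {}).items())


# Toy tables standing in for the three models (values are made up for illustration).
token_table = {"bank": {"banque": 0.6, "rive": 0.4}}     # context-independent
phrase_table = {"bank": {"banque": 0.9, "rive": 0.1}}    # phrase-dependent
sentence_table = {"bank": {"banque": 1.0}}               # sentence-dependent

combined = interpolate([token_table, phrase_table, sentence_table], [0.2, 0.3, 0.5])
doc = ["la", "banque", "est", "sur", "la", "rive", "banque"]
print(combined["bank"])
print(translated_tf("bank", combined, doc))
```

Because the abstract reports that the best combination is resource-dependent, weights like the `[0.2, 0.3, 0.5]` used here would need to be tuned for each collection rather than fixed in advance.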

Please see the attachment for more details.


Comments



张政

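Enhancing intercultural communication through group translation. By taking part in interactive group translation, university students can improve their intercultural communicative competence and understanding. In a project funded by the Australia-China Council, a selected study group of Chinese-speaking translation students worked on material published by Aboriginal Studies Press from August 2011 to May 2012. After the year-long translation project was completed, the author surveyed the participants about their translation experience and conducted recorded interviews. Drawing on social constructivist theory, the data were coded, the content was critically analyzed, and the themes were categorized. The results show that the participants not only improved their translation skills by combining theory with practice, but also gained a better understanding of Australian Aboriginal cultural traditions and history than before. Having come to understand cross-linguistic differences, they integrated translation theory with practice and raised their intercultural awareness through a range of structured learning activities centered on the translation project. This interaction-based, student-engaged learning approach helps student translators achieve meaningful communication and learner autonomy through personal reflection, group discussion, and seminars. Finally, the pedagogical implications of team translation projects are discussed. Please see the attachment for more details.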

