See also DBLP pages
Publications until 2018
2018
Miguel Rios, Wilker Aziz, Khalil Sima'an:
Deep Generative Model for Joint Alignment and Word Representation. NAACL-HLT 2018: 1011-1023
Joost Bastings, Wilker Aziz, Ivan Titov, Khalil Sima'an.
Modeling Latent Sentence Structure in Neural Machine Translation
Extended abstract at ACL's NMT workshop, 2018.
2017
Gideon Maillette de Buy Wenniger, Khalil Sima'an and Andy Way. 2017. "Elastic-substitution decoding for Hierarchical SMT: efficiency, richer search and double labels." MT Summit. pages 201--215. September 2017. [ Download paper ] [ Bibtex ] [ Presentation ] [ Code ]
Hoang Cuong and Khalil Sima'an. Induction of Latent Domains in Heterogeneous Corpora: A Case Study of Word Alignment. Machine Translation Journal 31(4): 225-249 (2017).
Hoang Cuong and Khalil Sima'an. A Survey of Domain Adaptation for Statistical Machine Translation. Machine Translation Journal 31(4): 187-224 (2017).
Joost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, Khalil Sima’an (2017). Graph Convolutional Encoders for Syntax-aware Neural Machine Translation. EMNLP’17. Copenhagen, Denmark. [bib]
Miloš Stanojević and Khalil Sima'an. Alternative Objective Functions for Training MT Evaluation Metrics. ACL 2017. [abstract] [bib] [pdf]
2016
Joachim Daiber, Miloš Stanojević and Khalil Sima'an. Universal Reordering via Linguistic Typology. In Proceedings COLING 2016.
Miloš Stanojević and Khalil Sima'an. Hierarchical Permutation Complexity for Word Order Evaluation. In Proceedings COLING 2016.
Joachim Daiber, Miloš Stanojević, Wilker Aziz and Khalil Sima'an. Examining the Relationship between Preordering and Word Order Freedom in Machine Translation. In Proceedings of the First Conference on Machine Translation (WMT 2016), Berlin, August 2016.
Sophie Arnoult and Khalil Sima'an. Factoring Adjunction in Phrase-based SMT. Proceedings of the second workshop on Deep Machine Translation. Lisbon, 2016.
Philip Schulz, Wilker Aziz and Khalil Sima'an. Word Alignment without NULL Words. Proceedings ACL 2016, Berlin, August 2016.
Lucia Specia, Stella Frank, Khalil Sima’an and Desmond Elliott. 2016. A Shared Task on Multimodal Machine Translation and Crosslingual Image Description. In Proceedings of the First Conference on Machine Translation (WMT). [Slides]
Desmond Elliott, Stella Frank, Khalil Sima’an, and Lucia Specia. Multi30K: Multilingual English-German Image Descriptions. 2016. In Proceedings of the 5th Workshop on Vision and Language (VL’16).
Hoang Cuong, Stella Frank and Khalil Sima'an. ILLC-UvA Adaptation System (Scorpio) at WMT'16 IT-DOMAIN Task. In Proceedings of the First Conference on Machine Translation (WMT 2016), Berlin, August 2016.
Hoang Cuong, Khalil Sima'an and Ivan Titov. Adapting to All Domains at Once: Rewarding Domain Invariance in SMT. Transactions of the Association for Computational Linguistics (TACL) 2016.
Gideon Maillette de Buy Wenniger and Khalil Sima'an. Labeling Hierarchical Phrase-Based Models without Linguistic Resources. Machine Translation Journal. Online January 2016.
2015
Hoang Cuong and Khalil Sima'an. Latent Domain Word Alignment for Heterogeneous Corpora. Proceedings HLT-NAACL 2015: 398-408.
Miloš Stanojević and Khalil Sima'an. Reordering Grammar Induction. Proceedings of the Conference on Empirical Methods in NLP 2015, EMNLP 2015, Lisboa, Portugal.
Miloš Stanojević and Khalil Sima'an. Evaluating MT systems with BEER. The Prague Bulletin of Mathematical Linguistics No. 104, 2015.
Miloš Stanojević and Khalil Sima'an. BEER 1.1: ILLC UvA submission to metrics and tuning task. The WMT 2015 Metrics and tuning tasks proceedings.
Joachim Daiber and Khalil Sima’an. Machine Translation with Source-Predicted Target Morphology. Proceedings of MT Summit XV. 2015. Miami, USA.
Joachim Daiber and Khalil Sima’an. Delimiting Morphosyntactic Search Space with Source-Side Reordering Models. Proceedings of the first Deep Machine Translation Workshop. 2015. Prague, Czech Republic. [Slides]
Sophie Arnoult and Khalil Sima'an: Modelling the Adjunct/Argument Distinction in Hierarchical Phrase-Based SMT. Proceedings of the first Deep Machine Translation Workshop. 2015. Prague, Czech Republic.
Constantin Orasan, Alessandro Cattelan, Gloria Corpas Pastor, Josef van Genabith, Manuel Herranz, Juan José Arevalillo, Qun Liu, Khalil Sima'an and Lucia Specia. The EXPERT project: Advancing the state of the art in hybrid translation technologies. In proceedings Translating and the Computer 37, 2015.
2014
Gideon Maillette de Buy Wenniger and Khalil Sima'an. Bilingual Markov Labels for Hierarchical SMT. Proceedings Workshop on SSST'2014 @ EMNLP 2014.
Miloš Stanojević and Khalil Sima'an. Fitting Sentence Level Translation Evaluation with Many Dense Features. Proceedings EMNLP 2014. Qatar.
Hoang Cuong and Khalil Sima'an. Latent Domain Phrase-Based Translation Models for Adaptation. Proceedings EMNLP 2014. Qatar.
Hoang Cuong and Khalil Sima'an. Latent Domain Translation Models in Mix-of-Domains Haystack. Proceedings COLING 2014. Dublin, Ireland.
Miloš Stanojević and Khalil Sima'an. BEER: Better Evaluation by Ranking. Proceedings Workshop on Statistical Machine Translation (WMT) 2014. Software for Beer: available for download (no additives, pure ingredients and straight from the tap!)
Sophie Arnoult and Khalil Sima'an. Translation Equivalence of Adjuncts. Proceedings Workshop on SSST'2014 @ EMNLP 2014.
Miloš Stanojević and Khalil Sima'an. Evaluating Word Order Recursively over Permutation Forests. Proceedings Workshop on SSST'2014 @ EMNLP 2014. Software also included in Beer: available for download (no additives, pure ingredients and straight from the tap!)
Joost Bastings and Khalil Sima'an: All Fragments Count in Parser Evaluation. Proceedings LREC 2014: 78-82. Software available for download from FREVAL: https://github.com/bastings/freval
Gideon Maillette de Buy Wenniger and Khalil Sima'an. Visualization, Search and Analysis of Hierarchical Translation Equivalence in Machine Translation Data. The Prague Bulletin of Mathematical Linguistics. Number 101, pages 43-54. April 2014. Software available for download https://bitbucket.org/teamwildtreechase/hatparsing
2013
Gideon Maillette de Buy Wenniger and Khalil Sima'an. A Formal Characterization of Parsing Word Alignments by Synchronous Grammars with Empirical Evidence to the ITG Hypothesis. In proceedings of NAACL workshop on Syntax, Semantics and Structure in Statistical Translation (SSST), June 2013, Atlanta, USA.
Gideon Maillette de Buy Wenniger and Khalil Sima'an. Hierarchical Alignment Decomposition Labels for Hiero Grammar Rules. In proceedings of NAACL workshop on Syntax, Semantics and Structure in Statistical Translation (SSST), June 2013, Atlanta, USA.
Tejaswini Deoskar, Markos Mylonakis and Khalil Sima'an. Learning Structural Dependencies of Words in the Zipfian Tail. Journal of Logic and Computation, 2013.
Khalil Sima'an and Gideon Maillette de Buy Wenniger. Hierarchical Alignment Trees: A Recursive Factorization of Reordering in Word Alignments with Empirical Results. Technical report.
2012 (on sabbatical January-August)
Maxim Khalilov and Khalil Sima'an. Statistical Translation After Source Reordering: Oracles, Context-Aware Models and Empirical Analysis. Journal of Natural Language Engineering (JNLE), to appear 2012.
Sophie Arnoult and Khalil Sima'an. Adjunct Alignment in Translation Data with an Application to Phrase-Based Statistical Machine Translation. European Association for Machine Translation. Trento, Italy, May 2012.
2011
Tejaswini Deoskar, Markos Mylonakis and Khalil Sima'an. Learning Structural Dependencies of Words in the Zipfian Tail. International Conf. on Parsing Technologies (IWPT 2011), Dublin, Ireland.
Markos Mylonakis and Khalil Sima'an. Learning Hierarchical Translation Structure with Linguistic Annotations. In the Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL:HLT 2011).
Hany Hassan, Khalil Sima'an and Andy Way. Efficient Accurate Direct Translation Models: One Tree at a Time. Machine Translation Journal, Springer. 2011.
Maxim Khalilov and Khalil Sima'an. Context-Sensitive Syntactic Source-Reordering by Statistical Transduction. Proc. of the 5th International Joint Conference on Natural Language Processing (IJCNLP'11), pages - to appear, Chiang Mai (Thailand), November 2011.
Maxim Khalilov and Khalil Sima'an. ILLC-UvA translation system for EMNLP-WMT 2011. Proc. of the EMNLP 2011 5th Workshop on Statistical Machine Translation (WMT'11), pages - to appear, Edinburgh (UK), July 2011.
2010
Maxim Khalilov and Khalil Sima'an. ILLC-UvA machine translation system for the IWSLT 2010 evaluation. Proc. of the 7th Int. Workshop on Spoken Language Translation (IWSLT'10), Paris (France), December 2010.
Markos Mylonakis and Khalil Sima'an. Learning Probabilistic Synchronous CFGs for Phrase Translation Models. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL 2010), Uppsala, Sweden, July 2010 [pdf]
Gideon Maillette de Buy Wenniger, Maxim Khalilov and Khalil Sima'an. A Toolkit for Visualizing the Coherence of Tree-based Reordering with Word-Alignments. In The Prague Bulletin of Mathematical Linguistics, Charles University Prague, 2010.
Reut Tsarfaty and Khalil Sima'an. Modeling Morphosyntactic Agreement for Constituency-Based Parsing of Modern Hebrew. In: Proceedings of the first workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL) at NAACL. Los Angeles, CA, USA, June 6, 2010. [pdf]
Maxim Khalilov and Khalil Sima'an. A discriminative syntactic model for source permutation via tree transduction. Proc. of The Fourth Workshop on Syntax and Structure in Statistical Translation (SSST-4) at the 23rd International Conference on Computational Linguistics (COLING'10), pages, Beijing (China), August 2010.
Maxim Khalilov and Khalil Sima'an. Source reordering using MaxEnt classifiers and supertags. Proc. of the 14th Annual Conference of the European Association for Machine Translation (EAMT'10), pp. 292-299, St. Raphael (France), 2010.
2009
Reut Tsarfaty and Khalil Sima'an. Evaluating an Alternative to Head-Driven Approaches to Parsing a (Relatively) Free Word-Order Language. In Proceedings of the Conference on Empirical Methods in NLP (EMNLP'09), Singapore. [pdf]
Hany Hassan, Khalil Sima'an and Andy Way. A Syntactified Direct Translation Model with Linear-Time Decoding. In Proceedings of the Conference on Empirical Methods in NLP (EMNLP'09), Singapore. [pdf]
Hany Hassan, Khalil Sima'an and Andy Way. Lexicalized Semi-Incremental Dependency Parsing. In proceedings Recent Advances in NLP (RANLP'09), Borovets, Bulgaria. [pdf]
Tejaswini Deoskar, Mats Rooth and Khalil Sima'an. Smoothing fine-grained PCFG Lexicons. Proceedings International Conference on Parsing Technologies, Oct 2009. [pdf]
2008
Khalil Sima'an and Markos Mylonakis. Better Statistical Estimation Can Benefit All Phrases in Phrase-Based Statistical Machine Translation. In Proceedings IEEE Workshop on Spoken Language Technology (SLT) 2008, Goa, India.
Hany Hassan, Khalil Sima'an and Andy Way. A Syntactic Language Model based on Incremental CCG Parsing. In Proceedings IEEE Workshop on Spoken Language Technology (SLT) 2008, Goa, India.
Markos Mylonakis and Khalil Sima'an. Phrase Translation Probabilities with ITG Priors and Smoothing as Learning Objective. In Proceedings Conf. on Empirical Methods in NLP (EMNLP'08), 2008.
Barbara Plank and Khalil Sima'an. Parsing with Subdomain Instance Weighting from Raw Corpora. In proceedings Interspeech 2008, Australia, Sep. 2008.
Reut Tsarfaty and Khalil Sima'an. Relational Realizational Parsing. In proceedings COLING 2008, Manchester, UK, August 2008.
Hany Hassan, Khalil Sima'an and Andy Way. Syntactically Lexicalized Phrase-Based Statistical Translation. In IEEE Transactions on Audio, Speech and Language Processing, August 2008.
Barbara Plank and Khalil Sima'an. Subdomain Sensitive Statistical Parsing using Raw Corpora. In Proceedings sixth International conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco.
Roy Bar-Haim, Khalil Sima'an and Yoad Winter. Part-of-Speech Tagging of Modern Hebrew Text. Journal of Natural Language Engineering (J-NLE), 14(2):223-251, 2008.
2007
Markos Mylonakis, Khalil Sima'an and R. Hwa. Unsupervised Estimation for Noisy-Channel Models. In 24th Annual International Conference on Machine Learning (ICML 2007).
Hany Hassan, Khalil Sima'an and Andy Way. Supertagged Phrase-Based Statistical Machine Translation. In Proceedings of 45th Annual Meeting of the Association for Comp. Linguistics (ACL'07).
Reut Tsarfaty and Khalil Sima'an. Accurate Unlexicalized Parsing for Modern Hebrew. In Proceedings of Text, Speech and Dialog (TSD'07). Lecture Notes in Computer Science (LNCS). Pilsen, Czech Republic, September 2007.
Reut Tsarfaty and Khalil Sima'an. Three-Dimensional Parametrization for Parsing Morphologically Rich Languages. In Proceedings of the International Conference on Parsing Technologies (IWPT'07). Prague, Czech Republic, June 2007.
Saib Mansour, Khalil Sima'an and Yoad Winter. Smoothing a Lexicon-based POS tagger for Arabic and Hebrew. In proceedings of ACL 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources. Prague, Czech Republic, 2007. Also presented as an extended abstract at Bar Ilan Symposium on Artificial Intelligence (BISFAI 2007).
Markos Mylonakis and Khalil Sima'an. Translation Lexicon Estimates from Non-Parallel Corpora Pairs. In Proceedings Belgian-Netherlands AI Conference (BNAIC), Utrecht, 2007. BNAIC'07 Best Paper Award.
Reut Tsarfaty and Khalil Sima'an. Dimensions of Parameterization for Modern Hebrew Statistical Parsing. Extended abstract at Bar Ilan Symposium on Artificial Intelligence (2007).
2006
Khalil Sima'an, Maarten de Rijke, Remko Scha and Rob van Son (eds.) Proceedings of 16th Computational Linguistics in the Netherlands (selected papers from CLIN 2005 meeting). Edited volume, December 2006.
Khalil Sima'an. Book review of Applied Combinatorics on Words (M. Lothaire). In Computational Linguistics (Vol 32, No. 3), Briefly Noted Section, 2006.
Hany Hassan, Mary Hearne, Khalil Sima'an and Andy Way. Syntactic Phrase-based Statistical Machine Translation. Proceedings IEEE/ACL first International Workshop on Spoken Language Technology (SLT), December 2006, Aruba.
Rebecca Hwa, Carol Nichols and Khalil Sima'an. Corpus Variations for Translation Lexicon Induction. In proceedings of the Association for Machine Translation in the Americas (AMTA 2006).
D. Prescher, Remko Scha, Khalil Sima'an and Andreas Zollmann. What are Treebank Grammars? In proceedings of the Belgian-Netherlands Artificial Intelligence Conference (BNAIC), Namur, Belgium, 2006.
2005 and before
Andreas Zollmann and Khalil Sima'an. A Consistent and Efficient Estimator for Data-Oriented Parsing. Journal of Automata, Languages and Combinatorics (JALC), Vol. 10 (2005) Number 2/3, pages 367-388. In short: As far as I know, first proof of consistency (in the limit) of an estimator for models based on probabilistic grammars. Based on (Sima'an & Buratto 2003) the estimation by smoothing approach for DOP is enhanced here by held-out estimation (and leave-one-out), which leads to efficiency gains. For simplicity and efficiency reasons, the DOP* estimator is restricted to shortest-derivations instead of EM-training.
Roy Bar-Haim, Khalil Sima'an and Yoad Winter. Choosing an Optimal Architecture for Segmentation and POS-Tagging of Modern Hebrew. In proceedings of ACL 2005 Workshop on Computational Approaches to Semitic Languages. [ps] [pdf] [abstract] [MorphTagger]
D. Prescher, Remko Scha, Khalil Sima'an, Andreas Zollmann. On the Statistical Consistency of DOP Estimators. In Proceedings of the 14th Meeting of Computational Linguistics in the Netherlands (CLIN), 15 pages. Antwerp, Belgium. In short: We prove that unbiased statistical estimators for all-fragments DOP models necessarily lead to overfitting. We also prove that all-fragments DOP models can capture any distribution over sentence-parse pairs (which PCFGs cannot do). Finally we show that all-fragments DOP must be treated as a nonparametric learning model with consistent estimators to be found in smoothing techniques, just like other nonparametric and memory-based methods. This paper provides the theoretical and mathematical foundation for the previous papers [Sima'an and Buratto 2003; Hearne and Sima'an 2003].
Mary Hearne and Khalil Sima'an. Structured Parameter Estimation for LFG-DOP (pre-publication version). In Recent Advances in Natural Language Processing III. N. Nicolov, K. Bontcheva, G. Angelova and R. Mitkov (eds). Current Issues in Linguistic Theory 260. John Benjamins Publishing Company.
Khalil Sima'an. Robust Data-Oriented Understanding of Spoken Utterances. In H. Bunt, J. Carroll and G. Satta (eds.), New Developments in Parsing Technologies, pages 323-338, Kluwer (2004). (In short: Statistical parsing of word-lattices output by a speech-recognizer under a DOP-like language model enriched with Lambda-calculus style update semantics).
O. Tsur, M. de Rijke, and Khalil Sima'an. BioGrapher: Biography Questions as a Restricted Domain Question Answering Task. In Proceedings of the ACL 2004 Workshop on Question Answering in Restricted Domains, 2004.
Rens Bod, Remko Scha and Khalil Sima'an (eds.). Data-Oriented Parsing (edited volume) (410 pp.) Studies in Computational Linguistics, CSLI Publications, University of Chicago Press, 2003. Contributions by Rens Bod, Remko Bonnema, John Carroll, Jean-Cédric Chappelier, David Chiang, Ido Dagan, Guy De Pauw, Joshua Goodman, Lars Hoogweg, Aravind Joshi, Ronald Kaplan, Yuval Krymolowski, Günter Neumann, Arjen Poutsma, Martin Rajman, Anoop Sarkar, Remko Scha, Khalil Sima'an, Srinivas Bangalore, Andy Way, David Weir and Menno van Zaanen.
Rens Bod, Remko Scha and Khalil Sima'an: "Introduction." In: Rens Bod, Remko Scha and Khalil Sima'an (eds.): Data-Oriented Parsing. Stanford: CSLI Publications, 2003, pp. 1-9.
Khalil Sima'an and L. Buratto. Backoff Parameter Estimation for the DOP Model. In N. Lavrac, D. Gamberger, H. Blockeel and L. Todorovski (eds.). Proceedings of the European Conference on Machine Learning (ECML'03), Lecture Notes in Artificial Intelligence (LNAI 2837), pages 373-384, Springer, 2003. In short: The first consistent estimators for the DOP model are presented in this paper. It is shown how the DOP model parameters can be structured into a connected, directed acyclic graph that allows parameter estimation by smoothing (just like K-NN approaches) using backoff or interpolation. These insights can be transferred to the estimation of other models in NLP and possibly other areas.
M. Hearne and Khalil Sima'an. Structured Parameter Estimation for LFG-DOP by Backoff. In Proceedings of International Conference on Recent Advances in Natural Language Processing (RANLP'03), Bulgaria, 2003.
L. Buratto and Khalil Sima'an. Backoff DOP: Parameter Estimation by Backoff. In V. Matousek and P. Mautner (eds.). Proceedings of the International Conference on Text, Speech and Dialogue (TSD'03), Lecture Notes in Artificial Intelligence (LNAI 2807), Springer, 2003. (8 pages)
Khalil Sima'an. On Maximizing Metrics for Syntactic Disambiguation. In Proceedings of the International Workshop on Parsing Technologies (IWPT'03). Nancy, France, April 2003. In short: In this paper I develop the "Maximizing Metrics" (MM) disambiguation idea (Joshua Goodman, PhD thesis) to linguistically more adequate metrics and algorithms, showing that optimizing stricter metrics could provide better results than optimizing a weak evaluation metric. The MM method is also known in speech recognition under the names "Maximum-Expected Recall" (Goodman), "Minimum Risk Decoding" (Goel and Byrne) and "Word Error Minimization" (Stolcke et al 1997). One of the algorithms described in this paper is referred to as the "Max-Rule" algorithm in recent work (e.g., Matsuzaki et al 2005; Petrov et al 2008).
Khalil Sima'an. Empirical validity and technological viability: Probabilistic models of Natural Language Processing. In R. Bernardi and M. Moortgat (eds.), Linguistic Corpora and Logic Based Grammar Formalisms, CoLogNET Area 6, 2003
Khalil Sima'an. Computational Complexity of Probabilistic Disambiguation: NP-Completeness Results for Parsing Problems that Arise in Speech and Language Processing Applications. In the journal Grammars vol. 5 (2), Kluwer Publishers, 2002. The paper provides proofs of NP-Completeness for the probabilistic disambiguation problems (1) Selecting the most probable sentence/path in a lattice/word-graph/SFSA (Stochastic Finite State Automaton) using a parser based on PCFG/SCFG or STSG/PTSG and (2) Selecting the most probable parse for an input sentence or lattice using a parser based on STSG. The results hold for weighted versions of these grammars also.
G. Musillo and Khalil Sima'an. Towards Comparing Parsers from Different Linguistic Frameworks: An Information Theoretic Approach. Proceedings of Beyond PARSEVAL: Towards Improved Evaluation Measures for Parsing Systems, LREC'02, Las Palmas, Gran Canaria, Spain, 2002.
G. Infante-Lopez, M. de Rijke and Khalil Sima'an. A General Probabilistic Model for Dependency Parsing. In proceedings of the BNAIC 2002, Leuven, Belgium.
M. Dastani and Khalil Sima'an. A Machine Learning Approach to Visual Perception. In European Society for the Study of Cognitive Systems (ESSCS 2002), Workshop on Multidisciplinary Aspects of Learning, Paris.
W. Daelemans, Khalil Sima'an, J. Veenstra and J. Zavrel (editors). Computational Linguistics in the Netherlands 2000 (CLIN'00). Language and Computers - Studies in Practical Linguistics 37. Rodopi publications, November 2001.
Khalil Sima'an, A. Itai, Y. Winter, A. Altman and N. Nativ. Building a Tree-Bank of Modern Hebrew Text. In Beatrice Daille and Laurent Romary (eds.), Journal Traitement Automatique des Langues (t.a.l.), 2001. Special Issue on Natural Language Processing and Corpus Linguistics. Appeared also in Chinese translation in the book Window to the Computational Linguistics (ISBN 7-81085-140-3/N.48) (Here is the front page of the translated article)
Khalil Sima'an. Robust Data-Oriented Parsing for Speech-Understanding. Proceedings of the International Workshop on Parsing Technologies (IWPT'01). Beijing, China, October 2001. This paper describes treebank-based syntactic+semantic parsing of word lattices/graphs output by a speech-recognizer in order to obtain update semantics for the user utterances in a spoken dialogue system over the telephone. It presents robustness techniques for data-oriented parsing.
Khalil Sima'an. Enhancing the Robustness of Data-Oriented Parsing for Speech-Understanding. Proceedings of the Natural Language Processing Pacific Rim Symposium (NLPRS'01). Tokyo, Japan, November 2001.
Khalil Sima'an. Tree-gram Parsing: Lexical Dependencies and Structural Relations. Proceedings of 38th Annual Meeting of the Association for Computational Linguistics (ACL'00), Hong Kong, China, 2000.
Khalil Sima'an. Efficient Parsing of Domain Language. Proceedings of the Belgian-Dutch Artificial Intelligence Conference (BNAIC'00), Efteling, The Netherlands, 2000. BNAIC'00 Best Paper Award.
Remko Scha, Rens Bod and Khalil Sima'an. A Memory-Based Model of Syntactic Analysis: Data-Oriented Parsing. In special Issue on Memory-Based Processing, W. Daelemans (ed.), Journal of Empirical and Theoretical Artificial Intelligence (JETAI), 11 (3), 1999.
Gert Veldhuijzen van Zanten, Gosse Bouma, Khalil Sima'an, Gertjan van Noord and Remko Bonnema. Evaluation of the NLP Components of the OVIS2 Spoken Dialogue System. In Computational Linguistics in the Netherlands 1998 (CLIN), F. van Eynde, I. Schuurman and N. Schelkens (editors), 1999.
Khalil Sima'an. Efficient Disambiguation by means of Stochastic Tree Substitution Grammars. In New Methods in Language Processing. D. Jones and H. Somers (editors), UCL Press, UK, 1997.
Khalil Sima'an. An Optimized Algorithm for Data-Oriented Parsing. In R. Mitkov and N. Nicolov (editors), Recent Advances in Natural Language Processing, Vol.136 of Current Issues in Linguistic Theory, John Benjamins, Amsterdam, 1996.
Khalil Sima'an. Explanation-Based Learning of Data-Oriented Parsing. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL) at ACL/EACL-97, T. Mark Ellison (editor), Madrid, Spain, July 1997.
Khalil Sima'an. Explanation-Based Learning of Partial-Parsing. In W. Daelemans, A. Van den Bosch and A. Weijters (editors). Workshop Notes of the ECML / MLnet Workshop on Empirical Learning of Natural Language Processing Tasks. Prague, Czech Republic. April 1997.
Khalil Sima'an. Computational Complexity of Probabilistic Disambiguation by means of Tree Grammars. In Proceedings of the International Conference on Computational Linguistics (COLING '96), pp.1175-1180 (vol. 2), Copenhagen, Denmark, August 1996.
Khalil Sima'an. An Optimized Algorithm for Data-Oriented Parsing. In proceedings of International Conference on Recent Advances in Natural Language Processing (RANLP'95). Tzigov Chark, Bulgaria, 1995.
Khalil Sima'an, Rens Bod, S. Krauwer and Remko Scha. Efficient Disambiguation by means of Stochastic Tree Substitution Grammars. In Proceedings of the International Conference on New Methods in Language Processing (NeMLaP). Centre for Computational Linguistics, UMIST, Manchester, UK, pp.50-58, 1994.
Ph.D. Thesis
Khalil Sima'an. Learning Efficient Disambiguation. ILLC dissertation series 1999-02, Amsterdam/Utrecht, March 1999. [Abstract] European Foundation for Logic, Language and Information (FoLLI), year 2000 Beth Dissertation Award.
Popularizing articles about language technology
Khalil Sima'an. "Zes Weken JHU Summer Language Engineering Workshop (Baltimore 11 Juli - 20 Augustus, 2005)". In Dutch. DIXIT 2005. (A Dutch language and speech technology magazine).
Khalil Sima'an. "Een half lege glas wijn..." (in Dutch). DIXIT (A Dutch language and speech technology magazine), issue 2, 2004.
Khalil Sima'an. Feeling the mood: On Machine Learning for Natural Language Processing. A column in ILLC Annual Report 2004, Institute for Logic, Language and Computation (ILLC), 2004.
Technical reports/Tutorials:
JHU Summer workshop team. Parsing Arabic Dialects. Final report (version I), January 2006. CSLP, JHU, Baltimore, USA.
JHU Summer workshop team. Closing day presentation. August 2005. CSLP, JHU, Baltimore, USA.
Khalil Sima'an. Probabilistic Models for NLP. Course slides for ESSLLI Foundational course with same title (176 slides in total)
Khalil Sima'an. Probabilistic Parsing. Course slides for ESSLLI Advanced course with same title (45 slides)
Khalil Sima'an. A Short Introduction to the DOP Model. Part of material for ESSLLI Advanced course on Probabilistic Parsing.
Rens Bod, R. Kaplan, Remko Scha, and Khalil Sima'an. A Data-Oriented Approach to Lexical-Functional Grammar. Computational Linguistics in the Netherlands 1996 (CLIN), Eindhoven, The Netherlands, 1997. (See also "A probabilistic approach to LFG." Ftp://ftp-lfg.stanford.edu/pub/lfg/lfg-presentations/LFG96/kaplan-doptalk.ps. Slides of the keynote lecture by Ronald Kaplan held at LFG-workshop, Grenoble, France).
Khalil Sima'an. Learning Efficient Parsing, with application to DOP and Speech Understanding (Draft version). Report #35, Probabilistic Natural Language Processing, NWO's Priority Programme on Language and Speech Technology, Amsterdam/Utrecht, January 1997.
Khalil Sima'an (University of Utrecht) and Remko Scha, R. Bonnema, Rens Bod (University of Amsterdam). Disambiguation and Interpretation of Wordgraphs using Data-Oriented Parsing. Report #31, Probabilistic Natural Language Processing, NWO Priority Programme for Language and Speech Technology, Amsterdam, November 1996.
Rens Bod, S. Krauwer and Khalil Sima'an. Combining Linguistic and Statistical Knowledge CLASK: Final report. Institute for Language and Speech, Faculty of Arts, University Utrecht. 1996.
Khalil Sima'an. Design Principles for Real-Time Process Control Systems. Technical Report 94-42, Delft University of Technology, 1994.