About NetCollo

NetCollo, built on Cowie and Howarth’s (1996) notion of “intercollocability,” is an online, corpus-based tool for exploring collocation networks in English. With it, English instructors and learners can test whether a combination is possible in English (e.g. say truth), get suggested corrections for miscollocations (e.g. tell truth), learn other semantically related correct combinations (e.g. state fact), and discover other semantically related miscollocations that should be avoided (e.g. tell fact).

The current release, NetCollo 2.0, builds networks from three corpora: the British National Corpus (BNC, http://www.natcorp.ox.ac.uk/), an engineering corpus, and a computer science corpus. The BNC, which contains 100 million word tokens from written and spoken texts by native speakers of British English, is used to provide “accurate” collocational knowledge for L2 learners of English. The engineering and computer science corpora, comprising 10,325,800 and 12,002,335 running tokens respectively, were compiled from top journals in the two domains and contain more than 1,500 and 1,300 articles, respectively. They are intended to give engineering and computer science majors networks in which they can discover the specialized usages of their chosen professions.

NetCollo uses a hybrid approach that draws on both the shared collocates and the semantic properties of the searched words to form networks. When a word pair (word1 and word2) is keyed in, NetCollo automatically compiles two lists of words that tend to co-occur with word1 or word2 and that “share the most collocates” with them. Co-occurrence strength is measured by raw corpus frequency and mutual information (MI). The default thresholds are 10 and 4.0, respectively, and users can change them for their own purposes. For infrequent words in particular, we suggest lowering the frequency threshold to 5 or 3. NetCollo offers two ways of ordering the two word lists: intercollocability and semantic similarity. The former is determined by the number of collocates shared with word1 or word2, and the latter by shared synsets, hypernyms, or hyponyms in WordNet (https://wordnet.princeton.edu/). An example collocation network established by NetCollo is:

Figure: Search Results for the Word Pair “attain purpose”

A combination that returns no result (e.g. reach purpose) in a so-called standard corpus (the BNC in this example) is treated as a non-collocation. The target pair “attain purpose” does occur in five sentences in the BNC, but its frequency and MI fall below the threshold values. Combinations like this are shown with a grey background and are likewise flagged as non-collocations. Finally, combinations with a white background (e.g. achieve purpose) are good alternatives to the searched word pair and are recommended. Learners can also readily see that this single network contains many other true collocations (e.g. attain goal; achieve aim; accomplish objective) as well as incorrect ones. Users who want to explore domain-specific collocations and word usages are encouraged to build networks with the engineering or computer science corpus.
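
For readers who want to see how such a decision could be computed, the following is a minimal Python sketch, not NetCollo's actual implementation. It assumes the common corpus-linguistics definition of MI, log2(f(w1,w2) × N / (f(w1) × f(w2))), and it assumes that a pair must meet both thresholds to count as a collocation; neither detail is stated on this page. The function names, the three labels, and the toy collocate sets are hypothetical and chosen only for illustration.

```python
import math


def mutual_information(pair_freq, freq1, freq2, corpus_size):
    """MI in bits: log2 of observed over expected co-occurrence (assumed formula)."""
    if pair_freq == 0:
        return float("-inf")
    return math.log2(pair_freq * corpus_size / (freq1 * freq2))


def classify_pair(pair_freq, freq1, freq2, corpus_size, min_freq=10, min_mi=4.0):
    """Label a word pair using the default thresholds quoted in the text.

    "absent"          -> no hit in the corpus (treated as a non-collocation)
    "below-threshold" -> attested but too rare or too weak (grey background)
    "collocation"     -> meets both thresholds (white background)
    Requiring BOTH thresholds is an assumption of this sketch.
    """
    if pair_freq == 0:
        return "absent"
    mi = mutual_information(pair_freq, freq1, freq2, corpus_size)
    if pair_freq >= min_freq and mi >= min_mi:
        return "collocation"
    return "below-threshold"


def rank_by_shared_collocates(target_collocates, candidates):
    """Order candidate words by how many collocates they share with the target.

    `candidates` maps each candidate word to its set of collocates.
    Ties are broken alphabetically so the ordering is deterministic.
    """
    target = set(target_collocates)
    scored = [(word, len(target & set(colls))) for word, colls in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy collocate sets, invented for illustration (not corpus data).
    verbs = {
        "achieve": {"goal", "aim", "success"},
        "reach": {"goal", "agreement"},
        "accomplish": {"objective", "goal", "aim"},
    }
    print(rank_by_shared_collocates({"goal", "aim", "objective"}, verbs))
```

Lowering `min_freq` to 5 or 3, as suggested for infrequent words, only changes the first half of the threshold test; the MI threshold is still applied.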

NetCollo was developed by Ping-Yu Huang, an assistant professor at the General Education Center of Ming Chi University of Technology, and Nai-Lung Tsao, an assistant researcher at the Office of Information Services of Tamkang University. Suggestions for NetCollo are welcome; please contact Ping-Yu Huang at alanhuang25@hotmail.com.

Reference

Cowie, P. A. and Howarth, P. (1996). Phraseological competence and written proficiency. British Studies in Applied Linguistics, 11, 90-93.