Determination of writing styles to detect similarities in digital documents

Yohandri Ril Gil, Yuniet del Carmen Toll Palma, Eddy Fonseca Lahens

Abstract


Any product of the human intellect is at risk of being plagiarised. This includes scientific and literary works such as articles, theses, audiovisual works, plans, projects and computer programs. This article focuses on plagiarism in written works in general, and in digital documents written in natural or programming languages in particular. The objective of the research is to develop and apply a mathematical model that determines the writing style used in drafting a text. The results of applying the procedure are intended to serve as a basis for reducing the number of documents that must be compared when analysing and detecting similarities among them. The procedure was applied experimentally to a set of articles classified by topic and author, in which the writing styles used differed.


Keywords


writing style; digital documents; plagiarism; procedure

DOI: http://dx.doi.org/10.7238/rusc.v11i1.1783

 Universitat Oberta de Catalunya. eLearn Center

RUSC. Universities and Knowledge Society Journal is an e-journal edited by the Universitat Oberta de Catalunya (Barcelona).

Creative Commons
The texts published in this journal are – unless indicated otherwise – covered by the Creative Commons Spain Attribution 3.0 licence. You may copy, distribute, transmit and adapt the work, provided you attribute it (authorship, journal name, publisher) in the manner specified by the author(s) or licensor(s). The full text of the licence can be consulted here: http://creativecommons.org/licenses/by/3.0/es/deed.en.