Protein Sequence Classification using Natural Language Processing
Aditya Shinde, Mitchell D Silva
Protein is an essential component of every cell in the body and an important building block of bones, muscles, cartilage, skin, and blood. In developing competitive pharmacological products, precisely classifying a protein sequence drawn from a large biological sequence dataset plays a significant role. Comparing an unseen or novel sequence against all identified protein sequences and accurately predicting its category requires considerable effort and is usually time-consuming. Therefore, to improve the efficiency of protein classification, a protein sequence classification system is developed using machine learning and data mining techniques. In protein analysis, sequence alignment, sequence searching, and sequence classification can all be performed using sequence mining techniques. Protein sequence classification has also become a field of interest for many scientists. It has the potential to discover the recurring structures that exist in protein sequences and to classify those sequences precisely. This paper presents a novel approach to protein sequence classification using Natural Language Processing.
Keywords: Word Embedding, Embedding Layer, Classification Report, Confusion Matrix, Neural Networks.
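As a rough illustration of how a protein sequence might be prepared for an embedding layer, the sketch below treats each amino acid letter as a "word", maps it to an integer ID, and pads every sequence to a fixed length. This is an assumption made for illustration only: the abstract does not specify the tokenization scheme, and the names `AMINO_ACIDS`, `PAD_ID`, `UNK_ID`, `encode_sequence`, and `encode_batch` are hypothetical rather than taken from the paper.

```python
# Hypothetical preprocessing sketch; not the authors' published pipeline.
# Assumption: character-level tokens, one per amino acid residue.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter residue codes
PAD_ID = 0  # reserved for padding positions
UNK_ID = 1  # reserved for any letter outside the 20 standard residues

# Map each residue letter to an integer ID, starting after the reserved IDs.
VOCAB = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(sequence, max_len):
    """Convert a protein string to a fixed-length list of integer token IDs."""
    # Truncate to max_len, then look up each residue (unknown letters -> UNK_ID).
    ids = [VOCAB.get(ch, UNK_ID) for ch in sequence.upper()[:max_len]]
    # Right-pad short sequences so every output has exactly max_len entries.
    return ids + [PAD_ID] * (max_len - len(ids))


def encode_batch(sequences, max_len):
    """Encode several sequences to the same length, one row per sequence."""
    return [encode_sequence(s, max_len) for s in sequences]
```

The resulting integer rows are the kind of input an embedding layer typically consumes, with a vocabulary size of `len(VOCAB) + 2` to account for the padding and unknown IDs. Other tokenizations, such as overlapping k-mers, would be an equally plausible reading of the approach.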
Cite this Article
Aditya Shinde, Mitchell D Silva, "Protein Sequence Classification using Natural Language Processing", International Journal of Engineering Development and Research (IJEDR), ISSN: 2321-9939, Volume 7, Issue 1, pp. 169-175, February 2019. Available at: http://www.ijedr.org/papers/IJEDR1901032.pdf