Tuesday, May 5, 2020
ANOVA for Computer Theory and Engineering - myassignmenthelp
Question: Discuss ANOVA for Computer Theory and Engineering.
Answer: Bharathi, A., and A. M. Natarajan. "Cancer Classification of Bioinformatics Data Using ANOVA." International Journal of Computer Theory and Engineering 2, no. 3 (2010): 369.
This paper belongs to the field of bioengineering and aims to classify cancer genes from microarray data using supervised machine learning. The overall goal of the research was to select a subset of genes in order to reduce the noise from irrelevant genes, to simplify the gene expression data by reducing the number of genes considered, and to help identify biological relationships between genes and cancer development and treatment. The paper therefore used the repeated measures ANOVA as a means of selecting the important genes. According to the article, the research hypothesis was that cancer genes can be classified, and the two-way ANOVA was meant to test this claim.
The two-way ANOVA is an extension of the one-way analysis of variance in which there are two independent variables. Its main underlying assumptions are that the samples are normally distributed, that the observations are independent, and that the groups have equal variances. The groups must also be of equal size. The analysis tests three null hypotheses. The first is that the population means of the first factor are equal, which here means that the population means of the cancer-classified genes are equal. The second is that the population means of the second factor are equal, which means that the population means of the genes are evaluated with regard to the second classification variable. The third is that there is no interaction between the two classifying factors, which the review describes as a test of independence for which contingency tables are derived.
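The three hypotheses above correspond to three F ratios in a balanced two-way design. The following is a minimal sketch, not the authors' implementation: the function name `two_way_anova`, the array layout, and the synthetic data are assumptions made purely for illustration, and only NumPy is used.

```python
import numpy as np

def two_way_anova(y):
    """Balanced two-way ANOVA with interaction.

    y has shape (a, b, n): a levels of factor A, b levels of factor B,
    and n replicate observations in every cell (equal group sizes).
    """
    a, b, n = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(1, 2))   # one mean per level of factor A
    mean_b = y.mean(axis=(0, 2))   # one mean per level of factor B
    mean_ab = y.mean(axis=2)       # one mean per (A, B) cell

    # Partition the total variation into factor, interaction, and error parts.
    ss = {
        "A": b * n * np.sum((mean_a - grand) ** 2),
        "B": a * n * np.sum((mean_b - grand) ** 2),
        "AxB": n * np.sum(
            (mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2
        ),
        "error": np.sum((y - mean_ab[:, :, None]) ** 2),
    }
    # Degrees of freedom: number of levels minus one for each factor.
    df = {
        "A": a - 1,
        "B": b - 1,
        "AxB": (a - 1) * (b - 1),
        "error": a * b * (n - 1),
    }
    ms_error = ss["error"] / df["error"]
    # Each F ratio compares a mean square against the residual mean square.
    f = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AxB")}
    return ss, df, f

# Synthetic data (an assumption): 2 x 3 design, 4 replicates per cell,
# with a deliberate shift in the second level of factor A.
rng = np.random.default_rng(0)
y = rng.normal(size=(2, 3, 4))
y[1] += 5.0

ss, df, f = two_way_anova(y)
print(df)
print({k: round(float(v), 2) for k, v in f.items()})
```

A large F ratio for a factor is evidence against the hypothesis that its population means are equal; the corresponding p-value would come from the F distribution with the listed degrees of freedom.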
In this research the genes in the training data set were ranked using a scoring scheme, and two-gene combinations were then tested for their classification capability. The classifier used was the support vector machine. The ANOVA was used to rank the importance of each gene identified. The independent variable in this case was therefore the gene, while the dependent variable was the cancer present in the cells. The factors were the classification variables: the variation in the response is partitioned into the variation due to differences between the classification variables and the variation attributed to random error across the various combinations of the classified genes.
The aim of the two-way ANOVA is to determine how two factors affect the dependent variable, which here is the cancer present in the cells. The degrees of freedom for each factor equal its number of levels minus one. The analysis also requires computing the F ratio, which compares the mean square of each factor against the residual mean square. In this case the F ratio indicates how much of the variation in the cancer cells is explained by the change in gene combination relative to the portion attributable to factors other than the classified genes.
The article reports that 42 samples were obtained from Diffuse Large B-cell Lymphoma (DLBCL), 9 samples from Follicular Lymphoma (FL), and 11 samples from Chronic Lymphocytic Leukemia (CLL), and that the overall data set included 4026 genes. The sample was then divided equally into training and test sets. From the ANOVA, the 20 highest-ranked genes were selected to form two-gene combinations, giving 190 combinations (the number of ways to choose 2 genes from 20), from which the highest-scoring combination was picked.
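The ranking and pairing steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure from the paper: the expression matrix is random synthetic data, the helper `anova_f_scores` is a hypothetical name, a one-way F ratio per gene stands in for the scoring scheme, and the training/test split and the support vector machine evaluation of each pair are omitted. Only the class sizes (42, 9, 11), the gene count (4026), and the top-20 cutoff come from the review.

```python
from itertools import combinations

import numpy as np

def anova_f_scores(X, labels):
    """One-way ANOVA F ratio for every column (gene) of X, grouped by class."""
    classes = np.unique(labels)
    n_samples, n_classes = X.shape[0], classes.size
    grand = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        group = X[labels == c]
        group_mean = group.mean(axis=0)
        ss_between += group.shape[0] * (group_mean - grand) ** 2
        ss_within += ((group - group_mean) ** 2).sum(axis=0)
    ms_between = ss_between / (n_classes - 1)
    ms_within = ss_within / (n_samples - n_classes)
    return ms_between / ms_within

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], [42, 9, 11])   # DLBCL, FL, CLL sample counts
X = rng.normal(size=(labels.size, 4026))     # synthetic expression values
X[labels == 1, :5] += 3.0                    # assumption: 5 informative genes

scores = anova_f_scores(X, labels)
top20 = np.argsort(scores)[::-1][:20]        # indices of the 20 best genes
pairs = list(combinations(top20.tolist(), 2))

print(len(pairs))  # 190 two-gene combinations from 20 genes
```

In a fuller pipeline each of the 190 pairs would be passed to a classifier and scored on held-out data; that step is left out here because the review does not describe it in enough detail to reproduce.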
The study found that the ANOVA method was effective in identifying gene subsets for accurate cancer classification through the use of the ranking system. It therefore concludes that cancer-related genes can be adequately identified through a ranking system and that the classification variables identified from the genes are independent. The study makes good use of the ANOVA statistical tool in its research process. However, it fails to adequately document certain key aspects of the study, such as how the data was obtained, even though it gives the characteristics of the data used. It also fails to provide sufficient information about the factors under consideration in the classifying process, referred to here as the classifying variables. This lack of information may limit the reliability of the research and weaken the support for the authors' claims. The paper should also have provided an ANOVA table documenting the results, including the F ratios that were significant to the study. Overall, it demonstrated the efficacy of ANOVA as a statistical tool that could be useful in the classification of cancer-related genes.