A document image classification system fusing deep and machine learning models

dc.authoridSevim, Semih/0000-0002-2486-7704
dc.authoridEkinci, Ekin/0000-0003-0658-592X
dc.authoridSAYAR, AHMET/0000-0002-6335-459X
dc.contributor.authorOmurca, Sevinc Ilhan
dc.contributor.authorEkinci, Ekin
dc.contributor.authorSevim, Semih
dc.contributor.authorEdinc, Eren Berk
dc.contributor.authorEken, Suleyman
dc.contributor.authorSayar, Ahmet
dc.date.accessioned2025-07-03T21:26:52Z
dc.date.issued2023
dc.departmentBalıkesir Üniversitesi
dc.description.abstractThanks to digital transformation, Artificial Intelligence (AI) technologies are now widely employed to overcome human-induced faults in a variety of systems used in our daily lives. One example of such systems is online document tracking systems (DTS). The reliability and preferability of a DTS are enhanced by automatic document classification and understanding features. Although automatic document classification systems can assist humans in document understanding tasks, most of them are not designed to work with Portable Document Format (PDF) files, which contain text, tables, or figures. In this study, we investigate different ways to efficiently classify student documents that are uploaded in PDF format and are required for university education. We propose three possible techniques for this problem. The first approach is based on Optical Character Recognition (OCR) and traditional machine learning methods. The second is based purely on deep learning. The third is based on an entropy-based fusion of deep learning methods. The proposed techniques can classify twelve distinct types of digital documents. The validity of the proposed methods has been verified by the student affairs department of Kocaeli University in Turkey. The system has not only increased the efficiency of the online document uploading steps for students, but also reduced the human cost of tracking the documents. The highest F-score (94.45%) is obtained by the ensemble of EfficientNetB3 and ExtraTree.
dc.description.sponsorshipKocaeli University Scientific Research and Development Support Program (BAP) in Turkey [FBA-2020-2152]
dc.description.sponsorshipThis work has been supported by the Kocaeli University Scientific Research and Development Support Program (BAP) in Turkey under project number FBA-2020-2152.
dc.identifier.doi10.1007/s10489-022-04306-5
dc.identifier.endpage15310
dc.identifier.issn0924-669X
dc.identifier.issn1573-7497
dc.identifier.issue12
dc.identifier.scopusqualityQ2
dc.identifier.startpage15295
dc.identifier.urihttps://doi.org/10.1007/s10489-022-04306-5
dc.identifier.urihttps://hdl.handle.net/20.500.12462/21916
dc.identifier.volume53
dc.identifier.wosWOS:000884177300003
dc.identifier.wosqualityQ2
dc.indekslendigikaynakWeb of Science
dc.language.isoen
dc.publisherSpringer
dc.relation.ispartofApplied Intelligence
dc.relation.publicationcategoryArticle - International Refereed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/closedAccess
dc.snmzKA_WOS_20250703
dc.subjectDocument image classification
dc.subjectDocument understanding
dc.subjectDeep learning
dc.subjectMachine learning
dc.subjectEnsemble learning
dc.titleA document image classification system fusing deep and machine learning models
dc.typeArticle
