
CVD Mode: Hybrid AI-Adaptive Framework for Enhancing Digital Accessibility for Color Vision Deficiency

Published on Nov 26, 2024

Author Note: Correspondence concerning this article should be addressed to [email protected]

Abstract

This paper proposes a display mode called CVD Mode, analogous to light and dark modes, to address challenges faced by approximately 300 million people globally with Colour Vision Deficiency (CVD) in online educational environments. CVD users often struggle to interpret colour-dependent visuals, such as graphs, videos, and interactive content, which leads to misinterpretations, reduced engagement, and unequal learning outcomes. Current solutions frequently lack the adaptability and precision required to accommodate diverse CVD types and variations in spectral sensitivity, resulting in limited effectiveness. The proposed CVD Mode introduces a hybrid AI-adaptive framework that enhances accessibility by examining how users respond to colour through interactive, randomized online diagnosis tests. The resulting data allow the system to identify the type and severity of CVD, enabling accurate, user-specific colour adjustments through personalized simulations for educators and designers, as well as recoloring of websites, videos, and images. With both manual and AI adaptive modes, the framework offers flexibility, allowing users to fine-tune their visual experience while maintaining a natural appearance. This paper covers related work, the system architecture, and the methodologies of a scalable, adaptive design intended to improve accessibility and engagement for CVD users in digital learning contexts.


Human colour vision depends on three types of cones: L-cones (most sensitive to red light, peaking near 560 nm), M-cones (green light, near 530 nm), and S-cones (blue light, near 420 nm). If one cone type is missing (dichromacy), colour perception is disrupted and certain colours become confusable, while reduced pigment in a cone type (anomalous trichromacy) leads to a partial deficiency. The severity of colour vision deficiency ranges from mild confusion to complete colour blindness. Red-green colour blindness is the most common form, some deficiencies affect blue-yellow perception, and some individuals perceive only grey scale (monochromacy).

Children with colour vision deficiency (CVD) often hide their difficulties due to a lack of awareness or social pressure, and many people remain unaware of their condition into their twenties, assuming their colour perception is normal. This unawareness can lead to educational difficulties, especially when online platforms rely heavily on colour cues. To address this, we propose CVD Mode, a colour-blind-friendly counterpart to dark and light display modes that applies daltonization to enhance colour perception in real-time videos, webpages, and images without the need for external lenses. An AI-driven approach administers online diagnostic tests to understand how each user perceives colours and adapts content accordingly. The simulator, operating in the LMS colour space, adjusts images, videos, and webpages based on these results, providing a tailored solution for colour-blind users. Additionally, the system supports video downloading with embedded daltonization, enabling users to retain educational content in a more accessible format. The framework offers modular operations for daltonization, simulation, and diagnosis, with features such as AI mode, manual mode, tuning mode, faster video processing, and low-power device support.

Figure 1: Taxonomy of types of colour vision deficiency

Previous studies on CVD solutions emphasize Kotera's algorithms for both daltonization and simulation (Kotera, 2015), which use an RGB-to-LMS transformation with linear operations for images and videos, preserving their natural appearance through grey-scale weighting. Li et al. proposed a self-adapting, rule-based approach to image recoloring using the pix2pix conditional Generative Adversarial Network (GAN) model and developed dedicated datasets with the Improved Octree Quantization Method (IOQM) (Li et al., 2020). These datasets were evaluated with their custom-designed tool, and pix2pix GAN was found to be more efficient than other GAN types for recoloring tasks. Daltonize.org offers a Chrome extension that applies SVG and CSS filters to recolor web pages, enhancing accessibility for users with colour vision deficiencies.

Our adaptation uses Kotera's simulation algorithm (Kotera, 2015) in both AI and manual modes because RGB-to-LMS simulation aligns closely with human perception and allows tuning that preserves a natural appearance. Kotera's daltonization is applied only in manual mode; in AI adaptive mode, high accuracy is achieved through a pix2pix conditional GAN optimized with Huang's method (Huang et al., 2009), yielding a self-adaptive system that accommodates a wide range of spectral variations across different types of CVD. Datasets are generated using the Improved Octree Quantization Method (IOQM) and evaluated with a dedicated tool; rule-based methods, in contrast, restrict the AI to a narrower range of hues and spectral shifts. Predetermined SVG/CSS filters handle manual recoloring and simulation for videos, images, and websites, with grey-scale tuning to retain a natural appearance and keep the system lightweight and efficient. Diagnosis tests such as the Farnsworth D15 (Cranwell et al., 2015), the Ishihara test (Daltonien, n.d.), and the Farnsworth 100 Hue test (Bassi, 1993) detect the parameters the AI requires, including CVD type, severity level, and spectral changes.

Materials and Methods

System Architecture Overview

The system architecture illustration outlines the overall design of the CVD mode within the platform, showcasing different modes, user interactions, and the flow of activities from user entry to daltonized content. It demonstrates how various components collaborate to execute operations, how AI supports modules based on user actions, and the roles of front-end (client-side) and back-end (server-side) elements.

Figure 2: Illustrates a comprehensive roadmap detailing the sequential steps involved in the system’s operation.

Modular Framework

The modular framework adopts a hybrid approach, prioritizing scalability and modularity through flexible, AI-driven reusable templates. Components are AI-integrated to accommodate the needs of colour vision deficiency (CVD) users. The CVD mode includes three core modules: Module 1, a colour vision diagnosis test with report generation (manual mode only); Module 2, a CVD simulator; and Module 3, CVD daltonization (recoloring). Modules 2 and 3 offer both manual and AI adaptive modes, and Module 1's diagnostic data enhances the AI adaptive features of the subsequent modules. In manual mode, users choose their simulation and daltonization filters themselves. Collectively, these modules enhance the accessibility and adaptability of the CVD mode framework.

Figure 3: Illustrates the modular breakdown of CVD mode, comprising three modules that communicate with the AI.

Hybrid Approach

The hybrid system integrates front-end technologies such as HTML, CSS, SVG, and JavaScript with back-end tools such as Python, OpenCV, NumPy, PyTorch, CUDA, pix2pix GAN, and FFmpeg. This setup offers a responsive interface for CVD-friendly content while enabling complex daltonization and simulation on the back-end. The approach supports low-power devices by balancing workloads between manual and AI adaptive modes. In manual mode, lightweight SVG/CSS filters use predefined matrices to recolor videos, images, and webpages smoothly, making it ideal for low-power devices. In AI adaptive mode, graphics processing unit (GPU) processing is required for high-quality, real-time video handling; because many users lack GPUs, server-side GPU support ensures smooth, accurate colour transformations and a seamless user experience.
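As an illustration of the manual-mode filter path, the sketch below builds an SVG feColorMatrix filter string from a predefined 3x3 colour matrix, which the front-end could inject into a page. The matrix values and filter id are illustrative assumptions, not the system's actual filters.

```python
# Minimal sketch: build an SVG <filter> string from a predefined 3x3 colour matrix.
# The protanopia-oriented matrix below is illustrative only, not the system's exact filter.

PROTAN_MATRIX = [
    [0.567, 0.433, 0.000],
    [0.558, 0.442, 0.000],
    [0.000, 0.242, 0.758],
]

def svg_color_filter(filter_id: str, m: list[list[float]]) -> str:
    """Return an SVG feColorMatrix filter that applies a 3x3 RGB matrix."""
    # feColorMatrix expects a 4x5 row-major matrix (RGBA rows plus an offset column).
    rows = [f"{r[0]} {r[1]} {r[2]} 0 0" for r in m] + ["0 0 0 1 0"]
    values = " ".join(rows)
    return (
        f'<svg><filter id="{filter_id}">'
        f'<feColorMatrix type="matrix" values="{values}"/>'
        f"</filter></svg>"
    )

print(svg_color_filter("cvd-protan", PROTAN_MATRIX))
```

The generated markup can then be referenced from CSS (for example with a `filter: url(#cvd-protan)` rule), keeping the recoloring entirely client-side and GPU-free.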

Python has been chosen for backend operations because of its powerful libraries for data processing, AI, and machine learning, which simplify complex computations such as daltonization (Module 3) using a GAN. While Python is highly effective for backend tasks, it is less optimized for creating dynamic user interfaces. In contrast, web technologies like HTML, CSS, and JavaScript provide superior performance for front-end operations, offering lightweight and efficient processing directly in the client's browser.

Components Involved in the Hybrid Approach

Table 1: Components involved in the front-end and back-end operations of the hybrid approach.

| Technology | Description | Usage Context |
| --- | --- | --- |
| PyTorch framework (back-end) | Integrates and runs AI models to adapt and personalize daltonization, simulation, and other effects based on user input and content analysis. | AI model integration |
| Python (back-end) | Serves as the primary backend language for video and image processing, server-side logic, and integration with tools such as OpenCV and the AI models. | Primary backend language |
| Compute Unified Device Architecture (CUDA) (back-end) | Provides GPU acceleration, allowing faster processing of high-quality videos, complex transformations, and object segmentation effects. | GPU acceleration |
| Fast Forward Moving Picture Experts Group (FFmpeg) (back-end) | A multimedia framework enabling video playback and downloading by encoding both predetermined and custom filters, alongside OpenCV, into videos, images, and webpages. | Multimedia processing |
| Convolutional Neural Network (CNN) (back-end) | Analyzes and learns from paired datasets of images and videos, identifying key features and patterns in how CVD individuals perceive and differentiate colours. | Machine learning analysis |
| OpenCV (back-end) | Offers extensive tools for image and video processing, including daltonization filters, object detection, and region-based transformations. | Image and video processing |
| NumPy (back-end) | Provides efficient array operations, matrix transformations, and numerical computations essential for data manipulation and mathematical calculations. | Array manipulation |
| Scalable Vector Graphics (SVG) (front-end) | Enables high-quality daltonized and simulated videos, images, and webpages using predefined filters without processing each frame, providing faster results and custom settings without a GPU. | Client-side rendering |
| Cascading Style Sheets (CSS) (front-end) | Creates filters for visual effects in web content that operate directly in the browser, making it efficient for applying visual changes such as colour adjustments compared with heavier rendering or video-processing tools. | Client-side rendering |
| Generative Adversarial Network (GAN) (back-end) | Used for generating variations of data, enhancing simulations, and testing and validating datasets to improve model robustness and data quality. | Creating, testing, and validating datasets |
| HTML (front-end) | Defines the structure and layout of web content, providing the basic building blocks for web pages and user interfaces. | Web page structure |
| JavaScript (front-end and back-end) | Provides an interactive user interface and facilitates continuous communication between the front-end and back-end using WebSockets. | Front-end and back-end communication |

Detection and Assessment Techniques

CVD diagnosis tests are crucial for determining the type and severity of deficiencies in colour discrimination, such as anomalous trichromacy, dichromacy, and monochromacy, which affect how individuals perceive colours. With this diagnostic data, the AI can accurately predict each CVD user's needs and provide visual output that meets WCAG (Web Content Accessibility Guidelines) compliance on educational platforms.

Selection Standards for Diagnosis Tests

Conducting online diagnosis tests presents challenges due to psychological factors such as user mood, interactive behaviour, and potential memory effects, especially among children who may hide their colour perception difficulties. The Ishihara, Farnsworth 100 Hue (F100Hue), and Farnsworth D15 (FD15) tests align with established standards for assessing CVD. The F100Hue is highly accurate and detects all types of CVD but is more time consuming; the D15, a subset of the F100Hue, offers a faster alternative with reliable accuracy, providing critical data that helps the AI understand individual needs and user-specific parameters.

Figure 4: Illustrates the psychological criteria required for conducting an interactive online diagnosis test.

Colour Vision Detection and Report Generation

This module conducts the Ishihara, D15, and F100Hue tests through the AI, helping to detect the severity and type of CVD as well as the user's spectral sensitivities. Revealing the resulting data to users is optional.
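As a minimal sketch of how such a report might be passed to the later modules, the structure below collects the detected type, severity, and spectral shift. The field names and value ranges are assumptions for illustration, not the system's actual schema.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: field names and value ranges are assumptions,
# not the system's actual report schema.
@dataclass
class DiagnosisReport:
    cvd_type: str                # e.g. "protanomaly", "deuteranopia", "normal"
    severity: float              # 0.0 (no deficiency) to 1.0 (complete deficiency)
    spectral_shift_nm: float     # estimated cone peak shift in nanometres
    tests_used: list = field(default_factory=lambda: ["Ishihara", "FD15", "F100Hue"])

# Example report handed to the simulation and daltonization modules.
report = DiagnosisReport(cvd_type="deuteranomaly", severity=0.6, spectral_shift_nm=12.0)
```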

Figure 5: Illustrates the flow chart of the diagnosis test's sequential operations, starting with user entry.

Simulation Algorithm in both Manual Mode and AI Adaptive Mode

Simulation approaches reproduce CVD perception to help educators and developers better comprehend and assess it. Kotera's approach can handle both anomalous trichromacy and dichromacy, allowing exact adjustments based on diagnostic data. Simulations in both manual and AI modes are based on Kotera's algorithm, which operates in the LMS colour space and is closely related to human colour perception. The approach uses linear transformations, which reduces processing requirements. Using data from the diagnostic tests, the simulator represents each user's perception precisely while preserving a realistic colour appearance. This helps educators and designers understand how users perceive colours at various spectral sensitivities, allowing them to create websites, images, and videos to higher standards of accessibility.

Simulation Objective Function for both Manual and AI Adaptive modes

$C^{*}_{\mathrm{CVD}} = R_{\mathrm{CVD}} \cdot C^{*}_{\mathrm{LMS}}$
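The following is a minimal sketch of this simulation step, assuming a Hunt-Pointer-Estevez-style RGB-to-LMS transform and a simple dichromat projection with a severity blend. The matrix values are common published approximations, not Kotera's exact coefficients, and gamma handling is omitted.

```python
import numpy as np

# Hunt-Pointer-Estevez-style linear RGB -> LMS transform (illustrative values).
RGB2LMS = np.array([[0.3139, 0.6395, 0.0466],
                    [0.1554, 0.7579, 0.0867],
                    [0.0178, 0.1094, 0.8727]])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Illustrative dichromat projection: the missing L response of a protanope is
# reconstructed from M and S. Coefficients are a common approximation, not Kotera's values.
PROTAN_SIM = np.array([[0.0, 1.05118294, -0.05116099],
                       [0.0, 1.0,         0.0       ],
                       [0.0, 0.0,         1.0       ]])

def simulate_protanopia(rgb_image: np.ndarray, severity: float = 1.0) -> np.ndarray:
    """Simulate protanopia on a float RGB image in [0, 1]; severity blends with the original."""
    flat = rgb_image.reshape(-1, 3)
    lms = flat @ RGB2LMS.T                       # RGB -> LMS
    lms_sim = lms @ PROTAN_SIM.T                 # remove/reconstruct the L channel
    rgb_sim = np.clip(lms_sim @ LMS2RGB.T, 0.0, 1.0)
    blended = (1.0 - severity) * flat + severity * rgb_sim   # crude anomalous-trichromacy blend
    return blended.reshape(rgb_image.shape)
```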

Daltonization (recoloring) Algorithm in Manual and AI Adaptive Mode

Daltonization is a technique that makes indistinguishable colours on websites, videos, and images more identifiable by substituting ambiguous colours with confidently perceived ones. Because confusion and confident colours vary between different types of CVD, this procedure necessitates specialized adjustments.

Manual Mode. In manual mode, Kotera's daltonization algorithm (Kotera, 2015) improves visual accessibility for CVD users, including those with dichromacy and anomalous trichromacy. It transforms the original RGB space into the LMS cone space and uses predefined transformation matrices (filters) to correct colour perception issues. SVG filters simplify image and video processing, while sliders allow users to fine-tune colour contrast and grey scale, improving colour clarity and perception accuracy.

Objective Function of Daltonization in Manual Mode

$\lambda_{\mathrm{OPT}} = \lambda_{\mathrm{SHT}}$ for $\Psi_{\mathrm{OPT}}(\lambda_{\mathrm{OPT}}) = \max_{\lambda_{\mathrm{SHT}}} \Psi_{\mathrm{OPT}}(\lambda_{\mathrm{SHT}})$
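A minimal sketch of the manual-mode recoloring idea follows: simulate the deficient view, compute the information lost to the user, and shift it into channels the user can still distinguish. The error-redistribution matrix is a widely used heuristic and is illustrative only, not Kotera's published formulation.

```python
import numpy as np

# Heuristic error-redistribution matrix (common in daltonization demos);
# illustrative only, not the system's exact manual-mode filter.
ERROR_SHIFT = np.array([[0.0, 0.0, 0.0],
                        [0.7, 1.0, 0.0],
                        [0.7, 0.0, 1.0]])

def daltonize(rgb_image: np.ndarray, simulate) -> np.ndarray:
    """Recolor a float RGB image in [0, 1], given a CVD simulation function."""
    simulated = simulate(rgb_image)
    error = rgb_image - simulated                        # information invisible to the CVD viewer
    shifted = error.reshape(-1, 3) @ ERROR_SHIFT.T       # move it into distinguishable channels
    corrected = rgb_image.reshape(-1, 3) + shifted
    return np.clip(corrected, 0.0, 1.0).reshape(rgb_image.shape)

# Usage with the simulation sketch above:
# recolored = daltonize(image, simulate_protanopia)
```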

AI Adaptive Mode. In AI adaptive mode, daltonization is performed through an optimized recoloring algorithm (Huang et al., 2009) with AI training supported by datasets for improved precision. The AI accepts diagnostic test outputs, which provide critical data on deficiency type, severity, and spectral sensitivity. Simulations offer additional insights into user-specific confusion and confident colours. This comprehensive data enables the AI to deliver accurate, personalized recoloring tailored to the unique spectral sensitivities of each user.

Objective Function of Daltonization in AI Adaptive Mode

$E = \sum_{i=1}^{K} \sum_{j=i+1}^{K} (\lambda_i + \lambda_j) \cdot \left[ D_{sKL}(G_i, G_j) - D_{sKL}\big(\mathrm{Sim}(M_i(G_i)), \mathrm{Sim}(M_j(G_j))\big) \right]^2$
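As a sketch of how the trained generator might be applied to a single video frame at inference time, the code below assumes the pix2pix generator has been exported as a TorchScript module at a hypothetical path and follows the usual pix2pix normalization to [-1, 1]; it is not the system's actual serving code.

```python
import cv2
import numpy as np
import torch

# Hypothetical path to a TorchScript export of the trained pix2pix generator.
GENERATOR_PATH = "models/daltonize_pix2pix.ts"

device = "cuda" if torch.cuda.is_available() else "cpu"
generator = torch.jit.load(GENERATOR_PATH, map_location=device).eval()

def recolor_frame(frame_bgr: np.ndarray) -> np.ndarray:
    """Run one BGR video frame through the generator and return the recolored frame."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0).to(device)
    tensor = tensor * 2.0 - 1.0                         # pix2pix-style [-1, 1] normalization
    with torch.no_grad():
        out = generator(tensor)
    out = ((out.squeeze(0).permute(1, 2, 0).cpu().numpy() + 1.0) / 2.0) * 255.0
    return cv2.cvtColor(out.astype(np.uint8), cv2.COLOR_RGB2BGR)
```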

Datasets Construction

Datasets are generated using a conditional GAN with the Improved Octree Quantization Method (IOQM) and validated through a screening tool as described by Li et al. (2020). Video datasets are created by converting videos into frames (images), following a procedure similar to Li et al.'s image dataset creation method, with validation performed accordingly.

Approach 1: Video Collection and Preprocessing

Videos are collected from open-access repositories, video platforms, educational resources, or through website crawlers. Using OpenCV, videos are split into frames, extracting every nth frame to balance computational efficiency while preserving the content needed for analysis. Frames are kept at 1920 x 1080 pixels to ensure consistency in visual representation and size, while frame optimization filters out noise and irrelevant data, retaining the features necessary for accurate processing.
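A minimal sketch of this extraction step is shown below, assuming every nth frame is saved and resized to 1920 x 1080; the sampling interval and file naming are illustrative.

```python
import os
import cv2

def extract_frames(video_path: str, out_dir: str, every_nth: int = 10,
                   size: tuple = (1920, 1080)) -> int:
    """Save every nth frame of a video, resized to the target resolution."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_nth == 0:
            frame = cv2.resize(frame, size, interpolation=cv2.INTER_AREA)
            cv2.imwrite(os.path.join(out_dir, f"input_frame_{saved + 1:03d}.png"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```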

Approach 2: Key Colour Extraction and Palette Creation

The Improved Octree Quantization Method (IOQM) (Li et al., 2020) is used to extract key colours from each frame, creating an Extracted Cluster Palette (ECP) that represents the significant colours within a reduced but effective colour space. The ECP is then optimized by integrating spatial and temporal constraints, ensuring smooth colour transitions between frames for consistent recoloring across sequences.
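IOQM itself is not reproduced here; as a stand-in, the sketch below extracts a small key-colour palette from a frame with OpenCV's k-means clustering, which conveys the ECP idea only under that substitution.

```python
import cv2
import numpy as np

def extract_palette(frame: np.ndarray, n_colors: int = 16) -> np.ndarray:
    """Return n_colors dominant colours (a stand-in ECP) via k-means, not IOQM.

    The palette is returned in the frame's channel order (BGR when read with OpenCV).
    """
    pixels = frame.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, n_colors, None, criteria, 3,
                                    cv2.KMEANS_PP_CENTERS)
    return centers.astype(np.uint8)          # shape (n_colors, 3)
```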

Figure 6: Shows a sample video frame.

Figure 7: Shows the extracted cluster palette (ECP) from the sample video frame using IOQM.

Approach 3: Human Evaluation and Cluster Adjustment for Simulation

The ECP is simulated with different CVD filters, e.g. red-green blindness (R-GB), to reflect the corresponding colour perception. People with CVD evaluate both the ECP and the simulated extracted cluster palette (SECP), helping to characterize their colour and shade perception. Any colours that do not match their perception are identified, rated, and adjusted, with higher ratings indicating larger mismatches. These unmatched colours are replaced with the closest perceived hues to align accurately with CVD perception. Using the CIE 1931 confusion lines helps in mapping and identifying colours that appear the same to CVD viewers, strengthening the simulation's accuracy.
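A minimal sketch of how palette colours that collapse under simulation could be flagged automatically before human review is shown below; the distance threshold is an arbitrary illustrative value, and the simulation function is the earlier sketch (channel-order conversion may be needed for an OpenCV BGR palette).

```python
import numpy as np

def confusion_pairs(palette_rgb: np.ndarray, simulate, threshold: float = 0.08):
    """Return index pairs of palette colours that become nearly identical after simulation."""
    sim = simulate(palette_rgb.reshape(1, -1, 3).astype(np.float32) / 255.0).reshape(-1, 3)
    pairs = []
    for i in range(len(sim)):
        for j in range(i + 1, len(sim)):
            if np.linalg.norm(sim[i] - sim[j]) < threshold:   # nearly indistinguishable
                pairs.append((i, j))
    return pairs

# Usage with the simulation sketch:
# pairs = confusion_pairs(extracted_palette_rgb, simulate_protanopia)
```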

Figure 8: Represents the perception of red-green colour blindness; the palettes are highlighted with their original hue marks. The image is simulated using the Coblis simulator (Color Blindness, n.d.).

Figure 9: A sample video frame showing the simulated image marked with numbered pairs of colour confusions; different hues appear as low-contrast yellow, and identifiable colours are marked in red (Color Blindness, n.d.).

Approach 4: Human Evaluation and Cluster Adjustment for Daltonization

Daltonization adjusts the ECP by shifting colours to improve visibility for CVD users. The ECP and daltonized palettes are reviewed and rated to aid in creating lookup tables (LUTs) for quick recoloring. The evaluation involves three main methods: removing indistinguishable confusion colours, assessing easy-to-recognize colours for better visibility, and identifying universal colours recognized by both CVD and non-CVD users. The LUTs facilitate colour mapping by replacing confusing hues with clearer options, while human evaluations rate unmatched colours in daltonized outputs to assess perceptual differences.
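The sketch below illustrates LUT-based recoloring under these assumptions: each pixel is snapped to its nearest palette entry and the entry index is remapped through the LUT; the example mapping is illustrative, not a rated replacement from the actual evaluations.

```python
import numpy as np

def apply_palette_lut(frame: np.ndarray, palette: np.ndarray, lut: dict) -> np.ndarray:
    """Recolor a frame by mapping each pixel's nearest palette index through a LUT.

    The frame and palette are assumed to share the same colour space.
    """
    pixels = frame.reshape(-1, 3).astype(np.float32)
    # Nearest palette entry per pixel (brute force; fine for small palettes).
    dists = np.linalg.norm(pixels[:, None, :] - palette[None, :, :].astype(np.float32), axis=2)
    idx = dists.argmin(axis=1)
    remapped = np.array([lut.get(i, i) for i in range(len(palette))])
    out = palette[remapped[idx]]
    return out.reshape(frame.shape).astype(np.uint8)

# Illustrative LUT: palette entry 3 (a confusing hue) replaced by entry 7 (a confident hue).
# recolored = apply_palette_lut(frame, extracted_palette, {3: 7})
```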

Approach 5: Frame Reconstruction and Video Assembly

Extensive testing will be conducted with a large sample of individuals with CVD to capture a range of spectral sensitivities and cone response differences, improving the accuracy of simulation and daltonization. Average cone sensitivities are taken as L-cone (560 nm, red), M-cone (530 nm, green), and S-cone (420 nm, blue). Using the pix2pix format, input and output video frames are processed in batches through a Python script with OpenCV and CUDA. FFmpeg then encodes the frames with the applied daltonization or simulation filters, ensuring consistent colour mapping for a smooth visual experience.
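A minimal sketch of the final encoding step follows, invoking FFmpeg through a subprocess to assemble processed frames into a video; the frame-name pattern and encoder options are illustrative.

```python
import subprocess

def frames_to_video(frame_pattern: str, out_path: str, fps: int = 30) -> None:
    """Encode numbered PNG frames (e.g. daltonized_frame_%03d.png) into an H.264 video."""
    subprocess.run([
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", frame_pattern,          # e.g. "Output_Frames/daltonized_frame_%03d.png"
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",        # broad player compatibility
        out_path,
    ], check=True)

# frames_to_video("Output_Frames/daltonized_frame_%03d.png", "daltonized_video_001.mp4")
```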

Labels in Pix2Pix Format

Pix2Pix format uses input-output pairs for efficient batch processing, organizing input videos, cluster palettes, simulated palettes, and daltonized palettes as in the directory layout below; a pairing sketch follows the layout.

Root Directory/
├── Video_Input/
│   ├── input_video_001/
│   │   ├── input_video_001.mp4
│   │   ├── Input_Frames/
│   │   │   ├── input_frame_001.png
│   │   │   ├── input_frame_002.png
│   │   │   └── ...
│   │   ├── Output_Frames/
│   │   │   ├── daltonized_frame_001.png (paired with input_frame_001)
│   │   │   ├── daltonized_frame_002.png (paired with input_frame_002)
│   │   │   ├── simulated_frame_001.png (paired with input_frame_001)
│   │   │   ├── simulated_frame_002.png (paired with input_frame_002)
│   │   │   └── ...
│   │   ├── Cluster_Palettes/
│   │   │   ├── extracted_palette_001.json
│   │   │   ├── extracted_palette_002.json
│   │   │   └── ...
│   │   ├── Daltonized_Cluster_Palettes/
│   │   │   ├── daltonized_palette_001.json
│   │   │   ├── daltonized_palette_002.json
│   │   │   └── ...
│   │   └── Simulated_Cluster_Palettes/
│   │       ├── simulated_palette_001.json
│   │       ├── simulated_palette_002.json
│   │       └── ...
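As referenced above, the following is a minimal sketch of pairing input and output frames from this layout into pix2pix training samples; the directory names follow the layout, but the helper function itself is illustrative.

```python
import os

def paired_samples(video_dir: str):
    """Yield (input_frame, daltonized_frame) path pairs from the directory layout above."""
    in_dir = os.path.join(video_dir, "Input_Frames")
    out_dir = os.path.join(video_dir, "Output_Frames")
    for name in sorted(os.listdir(in_dir)):
        suffix = name.replace("input_frame_", "")            # e.g. "001.png"
        target = os.path.join(out_dir, "daltonized_frame_" + suffix)
        if os.path.exists(target):
            yield os.path.join(in_dir, name), target

# for src, dst in paired_samples("Video_Input/input_video_001"):
#     ...  # feed the pair to the pix2pix training loop
```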

Limitations

Screen accuracy relies on proper display calibration; without accurate calibration, even precise outputs may not appear as intended, affecting user experience. Additionally, achieving real-time synchronization between the front-end user interface and back-end processing is challenging when server-side transformations are complex or time-consuming. Kotera's daltonization in manual mode is effective only for dichromacy and anomalous trichromacy; through the AI it is possible to accommodate monochromacy.

Affordability

Client-side GPUs reduce server costs by enabling faster, localized processing with improved data privacy, as sensitive data remains on the user's device. However, this requires high-performance hardware, limiting accessibility for low-powered devices. Server-side GPUs centralize processing, reducing user hardware costs but increasing provider expenses and potentially introducing latency and privacy concerns because data is processed externally. A cost-effective solution uses JavaScript to detect available client-side GPUs and shifts processing accordingly, balancing affordability, performance, and data privacy.
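On the server side, a minimal sketch of the corresponding routing decision is shown below, assuming the client reports whether it has a usable GPU (for example via JavaScript feature detection); the function and flags are illustrative assumptions, not the platform's actual scheduler.

```python
import torch

def choose_processing_side(client_has_gpu: bool, needs_ai_mode: bool) -> str:
    """Decide where heavy daltonization work should run for a given request."""
    if not needs_ai_mode:
        return "client"          # manual-mode SVG/CSS filters run in the browser
    if client_has_gpu:
        return "client"          # keep data local when the device can handle it
    return "server-gpu" if torch.cuda.is_available() else "server-cpu"
```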

Scalability

The user-friendly modular design supports future modules such as image segmentation, colour detection, and pattern overlay using OpenCV and CNNs, establishing cross-platform processing and browser compatibility.

Template Reusability

Due to the modular design, templates can be reused or reshaped without disturbing the rest of the system. Communication between the modules and the AI remains stable during template reuse, regardless of which side performs the processing.

Conclusion

This paper introduces a hybrid AI adaptive framework to address challenges faced by individuals with colour vision deficiency (CVD) in digital education. The proposed CVD mode enables personalized simulations and recoloring of websites, videos, and images, enhancing accessibility and user engagement. The system integrates manual and AI adaptive modes, leveraging data from online diagnostic tests for precise, user-specific colour adjustments. Its modular design allows for reusable templates, while the hybrid structure provides an interactive user interface, supports low-powered devices and cross-platform compatibility, scales easily, and balances GPU load efficiently. Future work involves pattern overlaying to help CVD users independently identify true colours.

References

Bowman, K. J. (1982). A method for quantitative scoring of the Farnsworth Panel D-15. Acta Ophthalmologica, 60(6), 907-916. https://doi.org/10.1111/j.1755-3768.1982.tb00662.x

Huang, J.-B., Chen, C.-S., Jen, T.-C., & Wang, S.-J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE. https://ieeexplore.ieee.org/abstract/document/4959795

Kotera, H. (2015). A spectral-based color vision deficiency model compatible with dichromacy and anomalous trichromacy. Color and Imaging Conference, 23(1), 127-132. https://doi.org/10.2352/CIC.2015.23.1.art00022

Li, H., Zhang, L., Zhang, X., Zhang, M., & Zhu, G. (2020). Colour vision deficiency datasets & recoloring evaluation using GANs. Multimedia Tools and Applications, 79(37-38), 27583-27614. https://doi.org/10.1007/s11042-020-09299-2

Vingrys, A. J., & King-Smith, P. E. (1988). A quantitative scoring technique for panel tests of colour vision. Investigative Ophthalmology & Visual Science, 29(1), 50-63.

Color Blindness. (n.d.). Coblis – colour blindness simulator. Retrieved November 05, 2024, from https://www.colorblindness.com/coblis-color-blindness-simulator/

Daltonien.free.fr. (n.d.). Daltonien: Article on colour vision. Retrieved November 05, 2024, from http://daltonien.free.fr/daltonien/article.php3?id_article=6

Daltonize.org. (n.d.). Daltonization for colour blindness. Retrieved November 05, 2024, from http://www.daltonize.org/
