Tuesday, 13 March 2012

HCI and Multimedia

In past decades, usability was the dominant research area in HCI. The goal was to balance the interaction between human and computer, which in practice often meant that the human adapted to the machine. With the emergence of multimedia, that is, multi-modal information from the senses, HCI has shifted its focus. It no longer seeks only that balance; it now leans towards human-oriented, specialised design. HCI also puts more emphasis on simplicity. Simplicity is a broad notion that includes being user-friendly, natural, and so on.

A multimedia system is tightly connected to the human perceptual system. In fact, human beings are themselves amazing multimedia systems. Broadly, the human perceptual system comprises the visual, auditory, haptic, taste, and smell senses. These form the basic considerations for designing a multimedia system. A multimedia system can thus be defined as a system that receives and processes multi-modal information from those senses and produces the desired multimedia output effortlessly. Multi-modal information contains high-level abstract content produced by humans, such as sound, music, speech, gesture, reading, and writing. Coordinating the interaction therefore becomes an important issue.

For interaction to be simple and natural for humans, the system must be complex in how it handles information. This is the trade-off between HCI and multimedia systems. The following sections give an overview of current and possible future applications that require minimal interaction yet remain powerful and desirable from the user's point of view.

Information Processing System
With advances in internet and hypertext technology, information is widely available and users interact with it directly. One example is search (information retrieval). Such systems fall into several basic categories, namely: free text systems, information retrieval systems, information extraction systems, question answering systems, dialog systems, and natural content processing systems [3]. Interaction with these systems is often deliberately restricted for the sake of simplicity (or we can say that users tend to be 'lazy'). What users care about most when they interact with such a system is 'relevance'. The output of an information processing system must therefore carry some level of confidence about how relevant each result is to the particular user requirement. Relevance is sometimes expressed as a rank. This is a primary concern for a search engine, where the interaction is easy and simple: type in a query, or even supply an image.
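To make the idea of relevance as a rank concrete, here is a minimal sketch that scores documents against a typed query by the cosine similarity of their term-count vectors. The function names and the toy documents are assumptions made up for this illustration; real search engines use far richer weighting and are not described in [3] at this level of detail.

```python
import math
from collections import Counter


def term_vector(text):
    """Lower-case the text, split on whitespace, and count each term."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query, documents):
    """Return (score, document) pairs sorted from most to least relevant."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical toy collection, invented for this example.
docs = [
    "speech recognition with hidden markov models",
    "color histogram for image retrieval",
    "relevance feedback for image retrieval systems",
]
ranking = rank_documents("image retrieval", docs)
for score, doc in ranking:
    print(f"{score:.3f}  {doc}")
```

The user only types two words; all the complexity (vectorising, scoring, sorting) stays inside the system, and a document that shares no term with the query gets a score of zero.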
Technically, a text-based content retrieval system consists of a relevance-feedback, term-based analyzer, which in turn consists of a term selection algorithm, a stemming algorithm, a similarity measure, a vector space model, and latent semantic analysis [3]. An image-based content retrieval system, by contrast, consists of a single technique or a series of techniques from computer vision and image processing. Such techniques include the color histogram [3], the color coherence vector model [3], the color correlogram [3], saliency detection [3], edge detection models [2][4], mathematical morphology models [2], automatic seeded region growing [2], and many more.
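Of the image techniques listed, the color histogram is the easiest to sketch. The example below quantizes RGB pixels into a small number of bins and compares two images by histogram intersection. Histogram intersection is one common similarity measure chosen here as an assumption, and the function names and tiny pixel lists are hypothetical.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Quantize each RGB channel into bins and return a normalized histogram."""
    step = 256 / bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index, clamped to the last bin.
        rb = min(int(r / step), bins_per_channel - 1)
        gb = min(int(g / step), bins_per_channel - 1)
        bb = min(int(b / step), bins_per_channel - 1)
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = len(pixels)
    return [count / total for count in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of bin-wise minima of two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical tiny "images" given as flat lists of (R, G, B) tuples.
mostly_red = [(250, 10, 10)] * 3 + [(10, 10, 250)]
also_red = [(240, 20, 5)] * 4
mostly_blue = [(5, 5, 240)] * 4

query = color_histogram(mostly_red)
sim_red = histogram_intersection(query, color_histogram(also_red))
sim_blue = histogram_intersection(query, color_histogram(mostly_blue))
print(sim_red, sim_blue)
```

A histogram ignores where colors appear in the image, which is the limitation that spatially aware descriptors such as the color coherence vector and the color correlogram try to address.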

Speech Processing
Speech is a natural form of communication between humans, and it reflects the variability and complexity of humans. Speech processing aims at modeling and manipulating the speech signal so that it can be transmitted, produced, and recognized [1]. Many applications involve speech processing, such as information inquiry systems, voice control systems, voice synthesis systems, and audiobooks. Interaction with this kind of system is simpler and more natural.
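Technically, a speech processing system is often based on the hidden Markov model (HMM). A simple architecture is shown below.
[Figure: simple architecture of an HMM-based speech processing system; image not preserved]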
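Separately from the architecture figure, the core computation in an HMM-based recognizer can be sketched with the Viterbi algorithm, which finds the most likely sequence of hidden states for a sequence of observations. The two states, the "low"/"high" energy observations, and all probabilities below are made-up assumptions for illustration only; a real recognizer works on acoustic feature vectors and far larger models.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Pick the best final state, then follow the stored back-pointers.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return prob, path


# Hypothetical toy model: is each audio frame silence or speech?
states = ["silence", "speech"]
start_p = {"silence": 0.6, "speech": 0.4}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.4, "speech": 0.6},
}
emit_p = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}

prob, path = viterbi(["low", "high", "high"], states, start_p, trans_p, emit_p)
print(path, prob)
```

For long sequences, practical implementations work with log-probabilities to avoid numerical underflow; plain products are kept here for readability.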




Digital face beautification
Digital face beautification is a newly developing research area that usually requires image processing techniques (it is sometimes classed as computational photography). Image processing methods for computational photography are currently of great importance in the research and development community. The field mainly involves the human visual sense, yet it is an interesting application with potential for commercial success.
Technically, a digital face beautification system involves machine learning, face detection, facial feature detection, and image warping [4]. Two machine learning approaches are commonly applied in this field: K-nearest neighbor (KNN) based and support vector machine (SVM) based.
Each of the technical terms mentioned above is hard to understand in one go (sometimes it takes months to understand!). You may also feel that they have no link to HCI. In fact, these technical details support the simplicity of interaction between user and computer by adding more abstraction, or complexity, to the system. They also reflect the transition from single-user interaction to multi-user (social community) interaction. Thus, the trend for HCI in multimedia systems is towards simplicity and naturalness.
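As a rough sketch of the KNN-based approach, the example below predicts a score for a new face by averaging the scores of its K nearest neighbors in a feature space. It assumes each face has already been reduced to a short vector of facial measurements and that the training faces carry human ratings. The feature values, the ratings, and the function name are all hypothetical, and the post does not say exactly how KNN is used in [4].

```python
import math


def knn_predict(query, samples, k=3):
    """Average the scores of the k training samples closest to the query."""
    if not samples or k < 1:
        raise ValueError("need at least one sample and k >= 1")
    # Sort the (features, score) pairs by Euclidean distance to the query.
    by_distance = sorted(samples, key=lambda s: math.dist(query, s[0]))
    nearest = by_distance[:k]
    return sum(score for _, score in nearest) / len(nearest)


# Hypothetical training set: (feature vector, rating on a 1-5 scale).
# The two features stand in for normalized facial measurements.
samples = [
    ((0.40, 0.70), 3.0),
    ((0.42, 0.72), 4.0),
    ((0.45, 0.75), 5.0),
    ((0.60, 0.90), 1.0),
]

predicted = knn_predict((0.43, 0.73), samples, k=3)
print(predicted)
```

In a beautification pipeline, a predictor of this kind could guide how far facial feature points are moved before image warping is applied; that link is an assumption about the overall design rather than something stated in the post.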

References
[1] CS5241 Speech Processing, AY2010/11 Semester 2, NUS, SOC
[2] CS4243 Computer Vision and Pattern Recognition, AY2011/12 Semester 1, NUS, SOC
[3] CS5342 Multimedia Computing and Applications, AY2011/12 Semester 2, NUS, SOC
[4] CS5341 Computational Photography, AY2011/12 Semester 2, NUS, SOC
