Researchers Database

Makoto Murakami

    Department of Information Sciences and Arts, Associate Professor
    Research Institute of Industrial Technology, Researcher
    Course of Information Sciences and Arts, Associate Professor
    Center for Computational Mechanics Research, Researcher
Last Updated: 2025/04/19

Researcher Information

Degree

  • Doctor (Information and Computer Science) (Waseda University)

Research funding number

  • 80329119

Research Interests

  • Computer Vision   Speech Processing   Pattern Recognition   Human-Computer Interaction   Multimodal Interface   Augmented Reality   Human-Agent Interaction   Human Interface   Human Face Recognition   Image Processing   

Research Areas

  • Informatics / Entertainment and game informatics
  • Informatics / Human interfaces and interactions
  • Informatics / Intelligent informatics

Academic & Professional Experience

  • 2009/04 - Today  Toyo University, Faculty of Information Sciences and Arts, Associate Professor
  • 2007/04 - 2009/03  Toyo University, Faculty of Engineering, Associate Professor
  • 2005/04 - 2007/03  Toyo University, Faculty of Engineering, Assistant Professor
  • 2002/04 - 2005/03  Toyo University, Faculty of Engineering, Lecturer
  • 2000/04 - 2002/03  Waseda University, School of Science and Engineering, Research Associate

Association Memberships

  • ACM   IEEE   THE JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE   INFORMATION PROCESSING SOCIETY OF JAPAN   THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS   

Published Papers

Conference Activities & Talks

  • Class-Conditional Human Motion Generation using StyleGAN and Video Classifier
    山本和輝; 村上真
    IPSJ SIG Technical Report (Web)  2024
  • Xiaohan Feng; Makoto Murakami
    Artificial Intelligence, NLP , Data Science and Cloud Computing Technology  2023/08  Academy & Industry Research Collaboration
     
    The aim of this paper is to explore different ways of using AI to subvert stereotypes more efficiently and effectively. It also enumerates the advantages and disadvantages of each approach, helping creators select the most appropriate method for their specific situations. AI opens up new possibilities, enabling anyone to effortlessly generate visually stunning images without the need for artistic skills. However, it also leads to the creation of more stereotypes when large amounts of data are used. Consequently, stereotypes are becoming more prevalent and serious than ever before. Our belief is that we can use this situation in reverse, aiming to summarize stereotypes with AI and then subvert them through elemental exchange. In this study, we attempted to develop a less time-consuming method to challenge character stereotypes while embracing the concept of "exchange." We selected two character archetypes, the "tyrant" and the "mad scientist," and summarized their stereotypes by generating AI images or asking ChatGPT questions. Additionally, we conducted a survey of real historical tyrants to gain insights into their behavior and characteristics. This step helped us comprehend the reasons behind stereotyping in artwork depicting tyrants. Based on this understanding, we made choices about which stereotypes to retain, with the intention of empowering the audience to better evaluate the identity of the character. Finally, the two remaining character stereotypes were exchanged, and the design was completed. This paper documents the last and most time-consuming method. By examining a large number of sources and identifying which stereotypical influences were used, we were able to achieve a greater effect of subverting stereotypes; in other words, this method best guarantees that the audience can quickly identify the original character while moving the two characters furthest away from their original stereotypical images. The other methods are much less time-consuming but somewhat more random: whether one chooses by subjective experience or by the most frequent choices, there is no guarantee of the best outcome. In conclusion, if the designer has sufficient time, AI portrait + research or ChatGPT + research can be chosen. If there is not enough time, the remaining methods can be chosen; they take less time, and the designer can try them all to get the desired result.
  • Xiaohan Feng; Makoto Murakami
    Natural Language Processing, Information Retrieval and AI  2023/02  Academy and Industry Research Collaboration Center (AIRCC)
     
    The Witch is a typical stereotype-busting character because its description has changed many times over a long history. This paper attempts to understand the visual interpretations and character positioning of the Witch by many creators in different eras; AI is used to help summarize current stereotypes in witch design and to propose a way to subvert the Witch stereotype in current popular culture. This study provides material for future research on character design stereotypes, and proposes using artificial intelligence to break stereotypes in design, documented as an experiment in how to subvert current stereotypes drawn from various periods in history. The method begins by using AI to compile stereotypical images of contemporary witches. Then, the two major components of the stereotype, "accessories" and "appearance," are analyzed from historical and social perspectives and attributed to the reasons for the formation and transformation of the Witch image. These past stereotypes are redesigned using the design approach of "extraction," "retention," and "conversion," and finally the advantages and disadvantages of this approach are summarized from a practical perspective. Research has shown that it is feasible to use AI to summarize the design elements and use them as clues to trace history. This is especially true for characters such as the Witch, which have undergone many historical transitions: the more changes there are, the more elements can be gathered, and the greater the advantage of this method. Stereotypes change over time, and even when the current stereotype has become history, this method remains effective for newly created stereotypes.
  • Development of an AI model to calculate dietary composition that can induce specific blood amino acids: Toward application to the control of physiological states by diet
    山中大介; 西村武謙; 増田正人; 西宏起; 合田祐貴; 沖野良輔; 村上真; 宮本崇史; 伯野史彦; 高橋伸一郎; 伊藤公一
    Annual Meeting of the Molecular Biology Society of Japan, Program and Abstracts (Web)  2023
  • Human Motion Generation using StyleGAN
    山本和輝; 村上真
    IPSJ SIG Technical Report (Web)  2023
  • Human motion generative model using StyleGAN
    山本和輝; 村上真
    Proceedings of the IEICE General Conference (CD-ROM)  2023
  • Human Motion Generative Model using Variational Recurrent Neural Network
    Makoto Murakami; Takahiro Ikezawa
    IPSJ SIG Computer Graphics and Visual Informatics  2021/02
  • Human Motion Generative Model using Wasserstein GAN
    Ayumi Shiobara; Makoto Murakami
    IPSJ SIG Technical Report  2020/03
  • Automatic Generation of Training Data for Scene Labeling using Deep Neural Network
    Yuichiro Motegi; Makoto Murakami
    IEICE Technical Report  2019/07
  • Ayumi Shiobara; Makoto Murakami
    IEICE Technical Report  2019/07  ACM
  • Generation of Training Data for Scene Labeling using Neural Network  [Not invited]
    Yuichiro Motegi; Takahiro Suzuki; Yuta Matsuda; Makoto Murakami
    13th World Congress on Computational Mechanics  2018/07 
    Some researchers have proposed neural networks for scene labeling, which assigns a class label to each pixel of an image. The class accuracy for objects well represented in the training set, such as sky or road, is high, but the accuracy for objects underrepresented in the training set tends to be low. To improve the average per-class accuracy, we need to increase the variation of training images in each class and reduce the difference in the number of training images/pixels per class. But it takes time to annotate many images at the pixel level manually to make a training set. There is some research on generating many pixel-level annotated images using computer graphics. We propose a method to generate a large training set for scene labeling. In our system, users register 3D models and the corresponding class labels, input constraints on object positions such as "tables must be on the floor" or "cups must be on tables or desks," and input hyperparameters for training the neural network, such as the structure of the network, the loss function, etc. The system places objects in the virtual space based on the constraints and generates a large set of images labeled at each pixel, to increase the variation of images in each class and to flatten the number of pixels per class. The neural network is trained using the large set of generated labeled images. The system repeats training and evaluation while increasing the variation of training images, and outputs the network parameters that minimize the generalization error measured with the average per-class accuracy. We captured real images of an indoor environment and labeled each pixel of them using the neural network trained with our system. To evaluate our system, we changed some of its settings and compared the obtained results.
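The average per-class accuracy used above as the model-selection criterion can be made concrete with a small sketch; the arrays and class counts here are illustrative, not values from the talk:

```python
import numpy as np

def average_per_class_accuracy(pred, truth, num_classes):
    """Mean of per-class pixel accuracies: a rare class (e.g. 'cup')
    weighs as much as a frequent one (e.g. 'sky')."""
    accs = []
    for c in range(num_classes):
        mask = (truth == c)
        if mask.sum() == 0:
            continue  # skip classes absent from this evaluation set
        accs.append((pred[mask] == c).mean())
    return float(np.mean(accs))

# toy example: class 0 ('sky') dominates the pixels
truth = np.array([0, 0, 0, 0, 0, 0, 1, 1])
pred  = np.array([0, 0, 0, 0, 0, 0, 1, 0])
# overall pixel accuracy is 7/8, but the per-class mean is (6/6 + 1/2)/2
print(average_per_class_accuracy(pred, truth, 2))  # 0.75
```

Flattening the number of generated pixels per class, as the system described above does, directly targets this metric rather than overall pixel accuracy.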
  • Object Shape Feature Extraction from Motion Parallax using Convolutional Neural Network  [Not invited]
    ChengJun Shao; Makoto Murakami
    13th World Congress on Computational Mechanics  2018/07  Oral presentation 
    We propose a neural network that can recognize objects from a sequence of RGB images captured with a single camera, using two different convolutional neural networks. The learning process is divided into two steps: learning a CNN for spatial feature extraction and learning a CNN for spatiotemporal feature extraction. The spatial feature extraction CNN extracts spatial feature vectors with position invariance. These are input to the following spatiotemporal feature extraction CNN, which convolutes them temporally to obtain depth information based on motion parallax. In the spatial feature extraction CNN, each frame of the image sequence is convoluted with spatial filters, the convoluted values are passed through an activation function, and spatial features are extracted in the convolutional layer. The features are input to a local contrast normalization layer and a following pooling layer for downsampling. With these three layers as a set, three sets of layers are concatenated to extract low-, medium-, and high-level spatial features. Then the high-level features are converted to a one-dimensional vector, and weighted sums of its elements are passed through an activation function in the fully connected layer. We may use dropout to reduce the degrees of freedom of the network and prevent overfitting. In the spatiotemporal feature extraction CNN, a sequence of the low- and medium-level spatial features extracted by the spatial feature extraction CNN, with frame length T, is input to the convolutional layer. The sequence of spatial features is convoluted with temporal filters, the convoluted values are passed through an activation function, and temporal features including depth information from motion parallax can be extracted. These features are input to a local contrast normalization layer, a pooling layer, and a fully connected layer. The high-level spatial features extracted by the spatial feature extraction CNN are also input to the fully connected layer, and these different kinds of features are integrated in the output layer. To evaluate the proposed method, we conducted an experiment using objects with simple shapes and extracted shape information from motion parallax.
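The temporal convolution over a sequence of per-frame spatial features described above can be sketched as follows; the feature dimension, filter count, and ReLU-style activation are illustrative assumptions, not values from the talk:

```python
import numpy as np

def temporal_conv(features, filters):
    """Convolve a (T, D) sequence of per-frame spatial features with
    (K, kT, D) temporal filters, producing (T - kT + 1, K)
    spatiotemporal responses after a ReLU-style activation."""
    T, D = features.shape
    K, kT, _ = filters.shape
    out = np.empty((T - kT + 1, K))
    for t in range(T - kT + 1):
        window = features[t:t + kT]  # (kT, D) temporal window of features
        out[t] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)      # activation function (assumed ReLU)

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))    # T=8 frames, D=16 spatial features
filts = rng.standard_normal((4, 3, 16)) # K=4 filters of temporal length 3
print(temporal_conv(feats, filts).shape)  # (6, 4)
```

Each output row mixes kT consecutive frames, which is where depth cues from motion parallax can enter the representation.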
  • Feature extraction of object shape from motion parallax using convolutional neural network  [Not invited]
    ChengJun Shao; Makoto Murakami
    IEICE Technical Report, PRMU2017-177  2018/03  Oral presentation
  • Activation of Dialogue Control for Information Collection Chatbot System  [Not invited]
    Yuichiro Motegi; Makoto Murakami
    2018 IEICE General Conference, Information and Systems Proceedings 1, D-5-10  2018/03  Oral presentation
  • Recognition of cutting and mixing using arms motion  [Not invited]
    Yuma Hijioka; Makoto Murakami; Kimoto Tadahiko
    The 2017 IEICE General Conference  2017/03  Oral presentation
  • 2011/03 
    We aim to realize a system that actively communicates with users and acquires reusable information, such as users' impressions, as linguistic information. In such a system, if the system persistently continues collecting information, it may hinder future information collection, so the system needs to judge when the user does not want to converse. In this study, we constructed a model for judging the end of a dialogue using speech information.
  • OTSUKA Naoki; MURAKAMI Makoto; YAMAGIWA Motoi; UEHARA Minoru
    Proceedings of the Forum on Information Technology (FIT)  2010/08
  • 端 千尋; 宮島 崇浩; 村上 真
    SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD)  2010/02
  • Motoi Yamagiwa; Minoru Uehara; Makoto Murakami; Masahide Yoneyama
    Industrial technology  2009
  • Tanaka Kenichi; Uehara Minoru; Murakami Makoto; Yamagiwa Motoi
    IPSJ SIG Notes  2008/11 
    Recently, as hardware performance has increased, the resources of each PC have grown, but those resources are not fully used; many of them sit free or idle. We are studying how to run a full grid on Windows PCs by using virtual machines. This study aims to utilize the idle resources of Windows PCs using a Linux grid on virtual machines (Virtual Grid), and we have developed a Meta Grid that controls the Virtual Grid. In this paper, we propose a method of using Web Storage for the Meta Grid. In this method, the Virtual Grid can use massive Web Storage through NBD. As this result,...
  • Gomi Mariko; Murakami Makoto; Yoneyama Masahide
    Proceedings of the Forum on Information Technology (FIT), General Sessions  2006/08
  • Gomi Mariko; Murakami Makoto; Yoneyama Masahide
    Proceedings of the IEICE General Conference  2006/03
  • Maruyama Sachiko; Rin Chiei; Murakami Makoto; Yoneyama Masahide
    Proceedings of the IEICE General Conference  2005/03
  • AOKI Koushirou; YAMAGIWA Motoi; MURAKAMI Makoto; YONEYAMA Masahide
    Proceedings of the Meeting of the Acoustical Society of Japan  2004/03
  • ONO Yukimasa; MARUYAMA Sachiko; MURAKAMI Makoto; YONEYAMA Masahide
    IEICE Technical Report, PRMU (Pattern Recognition and Media Understanding)  2003/11
  • ONO Yukimasa; MARUYAMA Sachiko; MURAKAMI Makoto; YONEYAMA Masahide
    Technical report of IEICE. PRMU  2003/11 
    We applied Labeled Graph Matching, used for object recognition in still images, to moving images. This technique made it possible to process in an integrated manner what had previously been handled sequentially. However, recognition accuracy and processing time are affected by the feature-quantity control parameters, so the control method must be considered. In this paper, we analyze the parametric control method for optimizing extraction accuracy by applying the proposed technique to the problem of head-region extraction.
  • ONO Yukimasa; MARUYAMA Sachiko; MURAKAMI Makoto; YONEYAMA Masahide
    Technical report of IEICE. HIP  2003/11 
    We applied Labeled Graph Matching, used for object recognition in still images, to moving images. This technique made it possible to process in an integrated manner what had previously been handled sequentially. However, recognition accuracy and processing time are affected by the feature-quantity control parameters, so the control method must be considered. In this paper, we analyze the parametric control method for optimizing extraction accuracy by applying the proposed technique to the problem of head-region extraction.
  • Aoki Koushirou; Yamagiwa Motoi; Murakami Makoto; Yoneyama Masahide
    Proceedings of the Society Conference of IEICE  2003/09
  • Tamaru Masazumi; Murakami Makoto; Sugimoto Futoshi; Yoneyama Masahide
    Proceedings of the Society Conference of IEICE  2003/09
  • Proceedings of the IEICE General Conference  1999/03
  • Person identification independent of facial expression using a Hopfield neural network  [Not invited]
    Proceedings of the IEICE General Conference  1999
  • Proceedings of the Annual Conference of the Japanese Society for Artificial Intelligence  1998

MISC

Research Grants & Projects

  • Japan Society for the Promotion of Science:Grants-in-Aid for Scientific Research
    Date (from‐to) : 2022/04 -2025/03 
    Author : Murakami Makoto
  • Japan Society for the Promotion of Science:Grants-in-Aid for Scientific Research
    Date (from‐to) : 2019/04 -2022/03 
    Author : Murakami Makoto
     
    We consider that the process by which people create various human motions in their minds and the process by which they recognize various human motions are complicated and non-linear. We modeled them using two different kinds of deep neural networks: generative adversarial networks and variational autoencoders. We trained the proposed models using a human motion dataset captured with an optical motion capture system, and confirmed that the trained models can generate various natural human motions.
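On the variational-autoencoder side mentioned in this summary, the key step is sampling the latent motion code with the reparameterization trick; a minimal sketch with illustrative dimensions (not the project's actual model):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, keeping the sampling step
    differentiable with respect to the encoder outputs."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    """KL(q(z|x) || N(0, I)) term of the VAE loss, per sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(0)
mu = np.zeros((1, 8)); log_var = np.zeros((1, 8))  # encoder output for one pose
z = reparameterize(mu, log_var, rng)
print(z.shape, float(kl_divergence(mu, log_var)[0]))  # (1, 8) 0.0
```

A decoder network would then map z back to joint angles or positions; the GAN variant instead trains a discriminator to tell generated motion from captured motion.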
  • Japan Society for the Promotion of Science:Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)
    Date (from‐to) : 2008 -2011 
    Author : YUBUNE Eiichi; MURAKAMI Makoto
     
    In order for Japanese learners of English to acquire natural, native-like speech, we attempted to develop a computer program that recognizes and evaluates the extent to which a learner's phonetic realization is close to or distant from the teacher's speech. We used a DP matching method to build the evaluation program for coalescent assimilation and linking of English sounds embedded in the sample sentences. Comparing the program's evaluations with human evaluations, we found considerably high validity and correlation, suggesting that our automatic recognition and evaluation system will be of use. We also conducted research on which feedback modality best helps learners realize the important aspects of pronunciation in producing better English rhythm. We tested three different modalities on experimental subjects: visual, auditory, and linguistic. As a result, the auditory feedback tended to be best recognized among the three.
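DP matching as used above is essentially dynamic-programming alignment (dynamic time warping) between a learner's and a teacher's feature sequences; a minimal sketch with scalar features for brevity (a real system would compare frame-level spectral features):

```python
import numpy as np

def dp_match(a, b):
    """Dynamic-programming alignment cost between sequences a and b,
    tolerating the stretches and compressions that occur when a
    learner speaks slower or faster than the teacher."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

teacher = [1.0, 2.0, 3.0, 2.0]
close   = [1.0, 2.0, 2.0, 3.0, 2.0]   # same contour, slightly stretched
distant = [3.0, 1.0, 0.0, 0.0]
print(dp_match(teacher, close) < dp_match(teacher, distant))  # True
```

A low alignment cost against the teacher utterance can then serve as the closeness score that the evaluation program reports.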
  • Japan Society for the Promotion of Science:Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)
    Date (from‐to) : 2002 -2004 
    Author : SHIRAI Katsuhiko; KOBAYASHI Tetsunori; YONEYAMA Masahide; YAMAZAKI Yoshio; OHIRA Shigeki; MURAKAMI Makoto
     
    To clarify how emotion appears in speech sound, we analyzed rakugo comic-story speech data, which is among the most natural and emotional speech data. As a result, the variance in speech sound with emotion mainly appears at the end of utterances. We then focused on the laughing voice as an emotional expression of physiological function; its analysis showed that f0 frequency and phoneme timing are the fundamental features for perceiving a voice as laughing. To generate motion with emotion from language instructions, we constructed an emotion representation model in which the relation between emotional words and motion is described as a binary tree. We then implemented a virtual actor system consisting of an emotion representation component, which generates target motion from language instructions using the emotion representation model, and an emotion learning component, which updates the emotion representation model when an unknown word is given. In an evaluation experiment, our system generated appropriate motion with emotion. Finally, in order to clarify the relation between video and sound signals and the emotion we perceive from them, we analyzed visual and speech data with emotion. As a result, speakers represent emotional level by changes not in facial expression but in voice; at the same time, listeners recognize the kind of emotion from the speaker's facial expression and perceive the level of the emotion from the speaker's voice.
  • Japan Society for the Promotion of Science:Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)
    Date (from‐to) : 2001 -2003 
    Author : YASUYOSHI Itsuki; MCFARLAND Curtis; SAKURAI Toshiko; MURAKAMI Makoto
     
    Since broadband Internet connections have become widely available, multimedia educational software is expected to be of great importance in the immediate future. PDAs and mobile phones owned by many young people are considered to be as effective as PCs in near-future language learning. We made a comparative study of PCs, PDAs, and mobile phones in terms of usability, effectiveness, and interactivity in language learning. PDAs were judged to be more useful than mobile phones in the 2001-2 experiment. Students studied in a multimedia English program using the three different environments (eight students per PC, PDA, and mobile phone) and afterwards took exams. PCs turned out to be very dependable, but PDAs showed great potential for language learning because of their portability and unique functions. Our research also focused on how to develop learning materials using the Internet. We made three different movies for our collaborative experiment in creating educational software on the web. The traditional way of making educational software is for programmers to design, produce, and edit learning materials in one place together with authors and editors, but in this research each team member could occupy a different site on the web and still collaborate simultaneously in the production of software. Commuting time was saved because the participants in this collaboration did not have to meet physically in one place. In order to decide whether the text used for educational software is appropriate for students, we used text-analyzing programs and the English text database: the levels of the texts were determined by the number of their vocabulary words. We added another one thousand texts (700 megabytes) to the present English Text Database to extract more sample sentences for textual analysis. We also came to the conclusion that at least 4,000 vocabulary words are necessary for college students to understand the average English newspaper and its equivalents, although Japanese high-school graduates learn only two thousand words at high school. We also collected and analyzed TOEFL and various other English tests to make a prototype of CBT (Computer-Based Testing). This program was designed to evaluate students' English language abilities automatically, pointing out their strengths and weaknesses based on grammar and usage data stored in the databases.
  • Japan Society for the Promotion of Science:Grants-in-Aid for Scientific Research, Scientific Research on Priority Areas (A)
    Date (from‐to) : 2001 -2001 
    Author : YASUYOSHI Itsuki; MURAKAMI Makoto; SAKURAI Toshiko; MCFARLAND Curtis
     
    Collaboration in producing educational software on the Internet. Since broadband access was expected to spread rapidly, we studied how to carry out collaborative production of educational software on the Internet, jointly with professors and researchers at NTT Learning and Waseda University's International Information and Telecommunications Center. We also collected data from subjects who studied the produced materials in three different environments (PC, PDA, FOMA), for use in future research. Research tasks: (1) We carried out editing, revision, and discussion of content on the web, including editing of text, still images, and design. In a collaboration experiment producing the same teaching material from multiple sites on the web, we compared the process with the conventional development method item by item (travel, waiting time, editing, meetings). We found that travel and waiting time were far larger than the time actually spent on editing and meetings, which provides one guideline for future educational software production. (2) As content, we produced multimedia teaching material that can be studied on the web. The theme was Scotch whisky made in Scotland, organized in three chapters, each consisting of 4-7 HTML pages and a 4-6 minute video (Part I: Edinburgh, the capital of Scotland; Part II: the history of Scotch whisky; Part III: the whisky production process). Each page consists of one or two paragraphs, and viewing all the material takes about 15 minutes; however, since display conditions differ by terminal, the content differs for each environment. (3) We conducted experiments on evaluation methods for English teaching materials and on web-based testing and assessment.