
Paper Link.

 

https://arxiv.org/abs/1711.02578

 

<Main Reference Paper : Show and Tell : A Neural Image Caption Generator>

https://arxiv.org/abs/1411.4555

 

Octavio Arriaga, Paul G. Plöger, Matias Valdenegro

 

Image Captioning -> Anomaly Situations (Dangerous Situations)

 

Alert/Non-Alert Classification Accuracy 97.%

Alert Image Description Accuracy (METEOR score) 16.2

 

 

1. Introduction

 

Situations classified as anomalies:

1. broken windows

2. injured people

3. fights

4. explosions

5. car accidents

6. fire accidents

7. guns

8. domestic violence

 

 

Describe the situation with a complete sentence rather than only a category label.

 

1) A new anomaly detection dataset

2) A model that classifies and describes a dangerous situation

 

2. Related Work

 

A. Image Captioning

  1. Classifying and describing the environment: one of the hardest open problems in AI
    1. Combines computer vision and natural language processing.
    2. Deep learning (DL) has largely addressed this task [15].
  2. [15] Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge
    1. Trainable end-to-end.
    2. Maximizes the log joint probability of the word sequence S = {S_0, ..., S_N}, conditioned on the input image I and the parameters \theta.
    3. Neural Image Caption (NIC):
      Encode the image with a CNN / generate the sentence with an RNN.
    4. CNN: GoogLeNet (without the last layer): 2048 nodes
      1. Input image -> CNN -> LSTM: the image is fed to the LSTM only once.
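
The log joint probability objective noted above can be written out as follows. This is the formulation from Show and Tell [15]; the sum over training pairs (I, S) and the chain-rule expansion over the time step t are the parts the notes leave implicit.

```latex
\theta^{*} = \arg\max_{\theta} \sum_{(I, S)} \log p(S \mid I; \theta)

\log p(S \mid I; \theta) = \sum_{t=0}^{N} \log p(S_t \mid I, S_0, \ldots, S_{t-1}; \theta)
```

Each factor p(S_t | I, S_0, ..., S_{t-1}) is what the LSTM predicts at step t, which is why the image only needs to be fed in once, at the start.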

B. Anomaly Detection

    - events that could prove dangerous for humans

 

3. Anomaly Detection Dataset

- 1008 captioned images corresponding to anomalies

- The situations are the ones listed above.

- Purpose: robot activities such as patrolling or domestic service

- Images are under Creative Commons (CC) licenses.

  - Original authors are attributed; non-commercial agreement.

  - Captioned by 20 people.

 

Rules of captioning images.

  1. A single English sentence for each image.
  2. Present or present continuous tense (to be, ~ing).
  3. Write primarily about the accident/incident/anomaly in the image.
  4. Be explicit about the number of persons in the image.
  5. No digits (1, 2, 3, ...); use written numbers (two, three, four, ...).
  6. Be explicit about gender (man, woman).
  7. Sentence length: 7~18 words recommended.
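
As a rough illustration, the mechanically checkable rules above (single sentence, no digits, recommended length) could be verified with a small script. This is only a sketch: the function name `check_caption` and its messages are hypothetical, and the tense, person-count, and gender rules are not checked because they would need real language analysis.

```python
import re
from typing import List


def check_caption(caption: str, min_words: int = 7, max_words: int = 18) -> List[str]:
    """Return a list of rule violations; an empty list means no violation was found."""
    problems = []
    text = caption.strip()

    # Rule 1: a single sentence. After dropping the final punctuation,
    # no other sentence-ending mark should remain.
    if re.search(r"[.!?]", text.rstrip(".!?")):
        problems.append("more than one sentence")

    # Rule 5: numbers must be written out, so digits are not allowed.
    if re.search(r"\d", text):
        problems.append("contains digits")

    # Rule 7: recommended length of 7 to 18 words.
    n_words = len(text.split())
    if not min_words <= n_words <= max_words:
        problems.append("length outside recommended range")

    return problems


if __name__ == "__main__":
    print(check_caption("Two men are fighting in the street near a parked car."))
    print(check_caption("2 men fight."))
```

The second call reports both the digit and the length problem, while the first caption passes all three checks.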

 

4. Model

  1. [15] + classification module
  2. Training datasets: IAPR TC-12 ([6], 20000 captioned images, copyright free)
    / AD (self-built, dangerous situations) dataset
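  3. Caption preprocessing
    - removed non-alphanumeric characters
    - converted to lowercase
    - removed captions with fewer than 3 words
    - removed captions longer than 14 words: anomalous situations should be described concisely.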
  4. Vocabulary Size : 1078
  5. 299*299 image -> CNN -> 2048 features.
  6. The features feed two branches: an MLP (binary classifier, 2 hidden layers) and an LSTM (512, 512 neurons, image embeddings).
  7. 80/20 train/validation
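
The caption preprocessing and the 80/20 split listed above can be sketched in a few lines. This is a minimal sketch, not the authors' code: all function names are hypothetical, replacing non-alphanumeric characters with spaces is an assumption, and a real pipeline would keep each caption paired with its image.

```python
import random
import re


def clean_caption(caption: str) -> str:
    """Lowercase the caption and replace non-alphanumeric characters with spaces."""
    text = re.sub(r"[^a-z0-9 ]", " ", caption.lower())
    return " ".join(text.split())  # collapse repeated whitespace


def filter_captions(captions, min_words=3, max_words=14):
    """Clean every caption and keep those with 3 to 14 words."""
    cleaned = [clean_caption(c) for c in captions]
    return [c for c in cleaned if min_words <= len(c.split()) <= max_words]


def build_vocabulary(captions):
    """Sorted list of the distinct words that remain after filtering."""
    return sorted({word for c in captions for word in c.split()})


def train_val_split(items, train_fraction=0.8, seed=0):
    """Shuffle with a fixed seed and split into train/validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    raw = ["Fire!", "A man is lying, injured!", "Two cars are crashed on the road."]
    captions = filter_captions(raw)
    print(captions)
    print(build_vocabulary(captions))
    print(train_val_split(captions))
```

Filtering before building the vocabulary is what keeps the vocabulary small; the size of 1078 reported in the notes is the result on the real data, not something this toy example reproduces.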

5. Metrics

 - BLEU

 - METEOR [1]

  1. Word similarity: exact / stem / synonym matches
  2. Order: number of crossings in the alignment
  3. Chunks: number of broken (non-contiguous) chunks
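
To make the roles of matching and chunks concrete, here is a simplified METEOR-style score. It is a sketch under loud assumptions: only exact matches are used (no stem or synonym stage), the alignment is greedy rather than crossing-minimizing, and the constants (F-mean 10PR/(R+9P), penalty 0.5*(chunks/matches)^3) follow the original 2005 METEOR formulation, which may differ from the version used in the paper. The name `simple_meteor` is hypothetical.

```python
def simple_meteor(hypothesis: str, reference: str) -> float:
    """Exact-match-only METEOR-style score between two sentences."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()

    # Greedy one-to-one alignment: each hypothesis word takes the first
    # unused identical reference word (real METEOR minimizes crossings).
    used = [False] * len(ref)
    alignment = []  # pairs (hypothesis index, reference index)
    for i, word in enumerate(hyp):
        for j, ref_word in enumerate(ref):
            if not used[j] and ref_word == word:
                used[j] = True
                alignment.append((i, j))
                break

    matches = len(alignment)
    if matches == 0:
        return 0.0

    precision = matches / len(hyp)
    recall = matches / len(ref)
    f_mean = 10 * precision * recall / (recall + 9 * precision)

    # A chunk is a run of matches that is contiguous in both sentences;
    # every break in contiguity starts a new chunk.
    chunks = 1
    for (i0, j0), (i1, j1) in zip(alignment, alignment[1:]):
        if not (i1 == i0 + 1 and j1 == j0 + 1):
            chunks += 1

    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1 - penalty)


if __name__ == "__main__":
    ref = "a man is lying on the road"
    print(simple_meteor("a man is lying on the road", ref))
    print(simple_meteor("the road on lying is man a", ref))
```

The second call uses the same words in reversed order: precision and recall are unchanged, but the matches break into many chunks, so the penalty lowers the score.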

 

6. Results

 

Positive : Non-anomaly

Negative : Anomaly
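
Because the convention above is the reverse of the usual one (here the anomaly is the negative class), a small counting sketch may help when reading the results. The function names and label strings are hypothetical; under this convention a false positive is an anomaly that was predicted as non-anomaly, i.e. a missed alert.

```python
def alert_confusion(y_true, y_pred):
    """Count outcomes with positive = non-anomaly and negative = anomaly."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == "non-anomaly":
            # Predicted positive: correct only if the image is really non-anomalous.
            counts["TP" if truth == "non-anomaly" else "FP"] += 1
        else:
            # Predicted negative (alert): correct only if the image is an anomaly.
            counts["TN" if truth == "anomaly" else "FN"] += 1
    return counts


def accuracy(counts):
    """Fraction of correct predictions; 0.0 when there are no samples."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else 0.0


if __name__ == "__main__":
    y_true = ["anomaly", "anomaly", "non-anomaly", "non-anomaly"]
    y_pred = ["anomaly", "non-anomaly", "non-anomaly", "anomaly"]
    counts = alert_confusion(y_true, y_pred)
    print(counts, accuracy(counts))
```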

 

Describes the situation well in a few words

Applicable to robot patrolling

Soft-Rep <-> Hard evidence.

METEOR peaks several epochs after the loss minimum

-> Synonym matching algorithm: uniform distribution on synonym word vectors

[Paper Study] Image Captioning and Classification of Dangerous Situations, posted by June2374, 2017.11.22