
Paper Link.

 

https://arxiv.org/abs/1711.02578

 

<Main Reference Paper : Show and Tell : A Neural Image Caption Generator>

https://arxiv.org/abs/1411.4555

 

Octavio Arriaga, Paul G. Plöger, Matias Valdenegro

 

Image Captioning -> Anomaly Situations (Dangerous Situations)

 

Alert/Non-Alert classification accuracy: 97%

Alert image description quality (METEOR score): 16.2

 

 

1. Introduction

 

Situations classified as anomalies:

1. broken windows

2. injured people

3. fights

4. explosions

5. car accidents

6. fire accidents

7. guns

8. domestic violence

 

 

Describe the situation with a complete sentence rather than only a category label.

 

1) A new anomaly detection dataset.

2) A model that classifies and describes a dangerous situation.

 

2. Related Work

 

A. Image Captioning

  1. Classifying and describing the environment is one of the hardest open problems in AI
    1. It combines computer vision and natural language processing.
    2. Deep learning has made this task tractable [15].
  2. [15] Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge
    1. Trainable end-to-end.
    2. Maximizes the log joint probability of the word sequence S = {S0, ..., SN}, conditioned on the input image (I) and the parameters (\theta).
    3. Neural Image Caption:
      encode the image with a CNN / generate the sentence with an RNN.
    4. CNN: GoogLeNet (without the last layer): 2048 nodes
      1. Input image -> CNN -> LSTM; the image is fed in only once.
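For reference, the objective in item 2.2 can be written out as below. This follows the notation of the Show and Tell paper [15], with I the image, S = (S_0, ..., S_N) the caption, and \theta the model parameters; the second equation is the chain-rule factorization that lets an RNN model one word at a time.

```latex
\theta^{*} = \arg\max_{\theta} \sum_{(I, S)} \log p(S \mid I; \theta),
\qquad
\log p(S \mid I; \theta) = \sum_{t=0}^{N} \log p(S_t \mid I, S_0, \ldots, S_{t-1}; \theta)
```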

B. Anomaly Detection

    - events that could prove potentially dangerous for humans

 

3. Anomaly Detection Dataset

- 1008 captioned images corresponding to anomalies.

- The situations are those listed above.

- Intended for robot activities such as patrolling or domestic service.

- Images are under Creative Commons (CC) licenses.

  - Original authors are attributed; non-commercial agreement.

  - Captioned by 20 persons.

 

Rules of captioning images.

  1. Write a single English sentence for each image.
  2. Use the present or present continuous tense (to be, ~ing).
  3. Write primarily about the accident/incident/anomaly in the image.
  4. Be explicit about the number of persons in the image.
  5. No digits (1, 2, 3, ...); use written numbers (two, three, four, ...).
  6. Be explicit about gender (man, woman).
  7. Sentence length: 7 to 18 words recommended.
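The mechanical rules above (1, 5 and 7) could be checked automatically. The following is a minimal sketch, not something from the paper: `check_caption` is a hypothetical helper, and rule 5 is read here as "spell numbers out instead of using digits". Rules about tense, content and gender would need a human or an NLP model.

```python
import re

# Hypothetical helper: checks a caption against the mechanical rules only.
MIN_WORDS, MAX_WORDS = 7, 18  # recommended length from rule 7


def check_caption(caption: str) -> list[str]:
    """Return a list of rule violations (an empty list means none were found)."""
    problems = []
    text = caption.strip()
    # Rule 1: a single sentence, so no sentence terminator before the end.
    if re.search(r"[.!?]", text.rstrip(".!?")):
        problems.append("more than one sentence")
    # Rule 5: numbers must be written out, not given as digits.
    if re.search(r"\d", text):
        problems.append("contains digits")
    # Rule 7: recommended length of 7 to 18 words.
    n_words = len(text.split())
    if not MIN_WORDS <= n_words <= MAX_WORDS:
        problems.append(f"length {n_words} outside {MIN_WORDS}-{MAX_WORDS} words")
    return problems
```

For example, `check_caption("2 men are fighting.")` reports both the digit and the short length, while a caption such as "Two men are fighting in front of a broken window." passes all three checks.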

 

4. Model

  1. [15] + classification module
  2. Training datasets: IAPR TC-12 ([6], 20,000 captioned images, copyright free)
    / AD dataset (self-built, dangerous situations)
  3. Caption preprocessing:
    - non-alphanumeric characters removed
    - converted to lowercase
    - captions with fewer than 3 words removed
    - captions longer than 14 words removed: anomalous situations should be described concisely.
  4. Vocabulary size: 1078
  5. 299x299 image -> CNN -> 2048 features.
  6. -> { MLP: binary classifier, 2 hidden layers ; LSTM: 512, 512 neurons, image embeddings }
  7. 80/20 train/validation split
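The caption preprocessing and the data split in the list above can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, "alphanumeric" in item 3 is read as "non-alphanumeric characters are stripped", and the split is assumed to be a random shuffle.

```python
import random
import re
from collections import Counter

MIN_WORDS, MAX_WORDS = 3, 14  # caption length bounds from item 3


def clean_caption(caption: str) -> list[str]:
    """Lowercase, replace non-alphanumeric characters with spaces, split on whitespace."""
    return re.sub(r"[^a-z0-9\s]", " ", caption.lower()).split()


def preprocess(captions):
    """Keep only captions whose cleaned length lies within the bounds."""
    cleaned = (clean_caption(c) for c in captions)
    return [tokens for tokens in cleaned if MIN_WORDS <= len(tokens) <= MAX_WORDS]


def build_vocabulary(token_lists):
    """Map each distinct word to an integer id, most frequent first."""
    counts = Counter(word for tokens in token_lists for word in tokens)
    return {word: idx for idx, (word, _) in enumerate(counts.most_common())}


def split_train_val(samples, train_fraction=0.8, seed=0):
    """Shuffle and split into train/validation parts (80/20 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Running `build_vocabulary(preprocess(captions))` on the combined caption sets is the step that would yield a vocabulary like the 1078 words noted in item 4; the exact count depends on tokenization details that these notes do not record.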

5. Metrics

 - BLEU

 - METEOR[1]

  1.   Word matching: exact / stem / synonym
  2.   Word order: number of crossings in the alignment
  3.   Chunks: number of chunks the matched words are broken into
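To make the three METEOR ingredients concrete, here is a simplified sketch using the formulas of the original METEOR definition (F-mean = 10PR / (R + 9P), penalty = 0.5 (chunks / matches)^3). It is an assumption-laden illustration: `meteor_exact` is a hypothetical name, only the exact-match stage is implemented (no stemming or synonyms), and a greedy left-to-right alignment stands in for the crossing-minimizing alignment.

```python
def meteor_exact(candidate: str, reference: str) -> float:
    """Simplified METEOR on a 0-1 scale: exact matches only, greedy alignment."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    used = [False] * len(ref)
    alignment = []  # (candidate index, reference index) pairs, in candidate order
    for i, word in enumerate(cand):
        for j, ref_word in enumerate(ref):
            if not used[j] and word == ref_word:
                used[j] = True
                alignment.append((i, j))
                break
    matches = len(alignment)
    if matches == 0:
        return 0.0
    precision = matches / len(cand)
    recall = matches / len(ref)
    f_mean = 10 * precision * recall / (recall + 9 * precision)
    # A chunk is a run of matches that is contiguous in both sentences.
    chunks = 1
    for (i0, j0), (i1, j1) in zip(alignment, alignment[1:]):
        if i1 != i0 + 1 or j1 != j0 + 1:
            chunks += 1
    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1 - penalty)
```

The more the matched words are scattered (more chunks), the larger the penalty, which is how the metric rewards correct word order. For real evaluation an established implementation should be preferred over this sketch.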

 

6. Results

 

Positive : Non-anomaly

Negative : Anomaly
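Because the positive class here is the non-anomalous one, the usual confusion-matrix terms flip relative to intuition: a false positive is a dangerous image that the model let through as harmless. A small sketch with hypothetical helper names and made-up labels shows how the counts and the accuracy follow from this convention.

```python
# Label convention from the notes: positive = non-anomaly, negative = anomaly.
POSITIVE, NEGATIVE = "non-anomaly", "anomaly"


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN with non-anomaly as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == POSITIVE:
            # Predicted harmless: correct if truly harmless, else a missed alert.
            counts["tp" if truth == POSITIVE else "fp"] += 1
        else:
            # Predicted anomaly: correct if truly an anomaly, else a false alarm.
            counts["tn" if truth == NEGATIVE else "fn"] += 1
    return counts


def accuracy(counts):
    """Fraction of correctly classified images."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())
```

For a patrolling robot, the `fp` count (missed alerts) is likely the costlier error, so it is worth reading alongside the overall accuracy.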

 

The model describes the situation well in few words.

Applicable to robot patrolling.

Soft-Rep <-> Hard evidence.

The best METEOR score is reached several epochs after the loss minimum.

-> Synonym matching algorithm: uniform distribution over synonym word vectors.
