Quora Question Pairs

42 members · 11 topics · Created: 2017-11-30

Paper: "A Decomposable Attention Model for Natural Language Inference"

Posted 2018-05-27 · 3148 views

The model proposed in this paper is fairly simple, but it works quite well.
Paper link: https://arxiv.org/abs/1606.01933
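For reference, the paper's three steps (Attend, Compare, Aggregate) can be sketched in plain NumPy. The sizes here (300-d word vectors, 200-d hidden layer, 3 classes) match the code in the replies below; using a single linear layer for F, G, and H is a simplification of the paper's small MLPs, and the sentence lengths and random weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Toy inputs: sentence a has la words, sentence b has lb words, 300-d embeddings
la, lb, d, h, n_classes = 4, 5, 300, 200, 3
a = rng.standard_normal((la, d))
b = rng.standard_normal((lb, d))

# Illustrative weights (single linear layers standing in for the paper's MLPs F, G, H)
F = rng.standard_normal((d, h)) * 0.01
G = rng.standard_normal((2 * d, h)) * 0.01
H = rng.standard_normal((2 * h, n_classes)) * 0.01

# 1. Attend: unnormalized scores e_ij = F(a_i) . F(b_j), then soft alignment
e = np.maximum(a @ F, 0) @ np.maximum(b @ F, 0).T      # [la, lb]
beta = softmax(e, axis=1) @ b                          # b soft-aligned to each word of a
alpha = softmax(e.T, axis=1) @ a                       # a soft-aligned to each word of b

# 2. Compare: compare each word with its aligned sub-phrase
v1 = np.concatenate([a, beta], axis=1) @ G             # [la, h]
v2 = np.concatenate([b, alpha], axis=1) @ G            # [lb, h]

# 3. Aggregate: sum over each sentence, concatenate, classify
v = np.concatenate([v1.sum(axis=0), v2.sum(axis=0)])   # [2h]
probs = softmax(v @ H)                                 # 3-way prediction
```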

3 replies
  • #2 Ewan 2018-06-04

    I only just saw the teacher's post with the SNLI dataset and the GloVe embedding vectors. I haven't written the data-processing and embedding parts yet; I've only written the model code bit by bit. Corrections and discussion from the teacher and everyone else are welcome.
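On the data-processing/embedding part that is still missing: a minimal sketch of parsing a GloVe text file, where each line is a word followed by its vector components. The demo below uses a tiny 3-dimensional in-memory sample; with the real glove.6B.300d.txt mentioned later in the thread you would open the file and use dim=300. The helper names are my own, not from any GloVe library.

```python
import io
import numpy as np

def load_glove(lines):
    """Parse GloVe-format lines ("word v1 v2 ... vd") into a dict of vectors."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def sentence_matrix(words, vectors, dim):
    """Stack word vectors into an [L, dim] matrix; unknown words get zeros."""
    return np.stack([vectors.get(w, np.zeros(dim, dtype=np.float32)) for w in words])

# Demo with a tiny sample; for the real file:
#   with open("glove.6B.300d.txt", encoding="utf-8") as f: vecs = load_glove(f)
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
vecs = load_glove(sample)
m = sentence_matrix(["the", "cat", "sat"], vecs, dim=3)  # "sat" is OOV -> zeros
```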

    The model code is as follows:

    import tensorflow as tf
    import numpy as np
    
    import sys
    
    keep_prob = tf.placeholder(tf.float32)
    # input: sentence a of length la, one 300-d word vector per row
    x1 = tf.placeholder(tf.float32, [None, 300])
    # input: sentence b of length lb
    x2 = tf.placeholder(tf.float32, [None, 300])
    y = tf.placeholder(tf.float32, [None, 3])
    
    # Feed-forward network for the Attend layer
    Weights_L1 = tf.Variable(tf.truncated_normal([300, 200], mean=0, stddev=0.01), dtype=tf.float32)
    biases_L1 = tf.Variable(tf.zeros([1, 200], dtype=tf.float32))
    Wx1_plus_b_L1 = tf.nn.relu(tf.matmul(x1, Weights_L1) + biases_L1)
    Wx2_plus_b_L1 = tf.nn.relu(tf.matmul(x2, Weights_L1) + biases_L1)
    Wx1_L1_dropout = tf.nn.dropout(Wx1_plus_b_L1, keep_prob)
    Wx2_L1_dropout = tf.nn.dropout(Wx2_plus_b_L1, keep_prob)
    # Unnormalized attention weights e_ij, shape [la, lb]
    L1 = tf.matmul(Wx1_L1_dropout, tf.transpose(Wx2_L1_dropout))
    E1 = tf.nn.softmax(L1)
    E2 = tf.nn.softmax(tf.transpose(L1))
    # Soft alignments: B aligns b to each word of a, A aligns a to each word of b
    B = tf.matmul(E1, x2)
    A = tf.matmul(E2, x1)
    
    # Feed-forward network for the Compare layer
    # Concatenate each word with its aligned sub-phrase along the feature
    # axis (axis 1), giving [la, 600] and [lb, 600]
    C1 = tf.concat([x1, B], 1)
    C2 = tf.concat([x2, A], 1)
    Weights_L2 = tf.Variable(tf.truncated_normal([600, 200], mean=0, stddev=0.01, dtype=tf.float32))
    biases_L2 = tf.Variable(tf.zeros([1, 200], dtype=tf.float32))
    Wx1_plus_b_L2 = tf.matmul(C1, Weights_L2) + biases_L2
    Wx2_plus_b_L2 = tf.matmul(C2, Weights_L2) + biases_L2
    
    # Feed-forward network for the Aggregate layer
    # Sum the comparison vectors over each sentence (the paper sums both)
    v1 = tf.reduce_sum(Wx1_plus_b_L2, 0)
    v2 = tf.reduce_sum(Wx2_plus_b_L2, 0)
    # Concatenate v1 and v2 and reshape to [1, 400] so matmul accepts it
    v = tf.reshape(tf.concat([v1, v2], 0), [1, 400])
    Weights_L3 = tf.Variable(tf.truncated_normal([400, 3], mean=0, stddev=0.01))
    biases_L3 = tf.Variable(tf.zeros([1, 3], dtype=tf.float32))
    Wv_plus_b_L3 = tf.matmul(v, Weights_L3) + biases_L3
    output = tf.nn.softmax(Wv_plus_b_L3)
    
    # Cross-entropy loss; pass the unnormalized logits, not the softmaxed
    # output, since softmax_cross_entropy_with_logits applies softmax itself
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Wv_plus_b_L3, labels=y))
    
    
    # Optimizer
    train = tf.train.AdagradOptimizer(0.05).minimize(loss)
    
    # Accuracy estimate
    correct_predict = tf.equal(tf.argmax(y, 1), tf.argmax(output, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_predict, tf.float32))
    

    • Ryan 2018-06-07
      How was the 300-dimensional input determined here? Is the sentence length fixed at 300? Also, pay attention to modularity when writing code: implement each independent piece of functionality as a function or a class. That makes things much easier and clearer, both when writing unit tests and when rereading the code later.
    • wangsy 2018-06-07
      Reply to Ryan: Teacher, my idea is that 300 is the vector dimension, i.e. the dimension of one word. Each input is two matrices representing the two sentences, with shapes La * 300 and Lb * 300, where La and Lb are the sentence lengths. For the concrete code I'm referring to the spam-SMS classification code.
    • Ewan 2018-06-07
      Reply to Ryan: Yes, I'm revising my coding style. My input matrix is (sentence a's length La) * (word dimension 300); that's how the input data matrix is built. I'm using glove.6B.300d.txt to rework the data-processing part.
    • Ryan 2018-06-12
      Reply to wangsy: That approach works too. It feeds one sample at a time; later you can think about how to implement mini-batches.
  • #3 wangsy 2018-06-05

    Teacher, my computer's hard drive died, so I can't run anything for now. While writing this I wasn't familiar with some of the relevant functions and referred to the code above. I've never worked with GloVe and don't yet know how to start writing that part. The rough model code is below; please point out any mistakes.

    #coding: utf-8
    #author: wsy
    import tensorflow as tf
    import numpy as np
    
    def add_layer(inputs, in_size, out_size, activation_function=None):
        Weights = tf.Variable(tf.random_normal([in_size, out_size]))
        biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
        Wx_plus_b = tf.matmul(inputs, Weights) + biases
        if activation_function is None:
            outputs = Wx_plus_b
        else:
            outputs = activation_function(Wx_plus_b)
        return outputs
    
    
    # define placeholders for inputs to the network
    xa_input = tf.placeholder(tf.float32, [None, 300])  # La x 300
    xb_input = tf.placeholder(tf.float32, [None, 300])  # Lb x 300
    ys = tf.placeholder(tf.float32, [None, 3])
    
    # Attend: project both sentences with the same feed-forward layer
    fa_out = add_layer(xa_input, 300, 200, activation_function=tf.nn.relu)
    fb_out = add_layer(xb_input, 300, 200, activation_function=tf.nn.relu)
    # get eij: unnormalized attention weights, shape [La, Lb]
    e_ab = tf.matmul(fa_out, tf.transpose(fb_out))
    # normalize with softmax (no extra bias term is needed here)
    e_a = tf.nn.softmax(e_ab)
    e_b = tf.nn.softmax(tf.transpose(e_ab))
    # soft alignments: each word of a attends over b, and vice versa
    a = tf.matmul(e_a, xb_input)
    b = tf.matmul(e_b, xa_input)
    
    # Compare: G([xa_input, a]) and G([xb_input, b]),
    # concatenated along the feature axis (axis 1)
    compare_input1 = tf.concat([xa_input, a], 1)
    compare_input2 = tf.concat([xb_input, b], 1)
    compare_out1 = add_layer(compare_input1, 600, 200, activation_function=None)
    compare_out2 = add_layer(compare_input2, 600, 200, activation_function=None)
    
    # Aggregate: sum over each sentence, concatenate, classify
    v1 = tf.reduce_sum(compare_out1, reduction_indices=[0])
    v2 = tf.reduce_sum(compare_out2, reduction_indices=[0])
    v_input = tf.reshape(tf.concat([v1, v2], 0), [1, 400])
    prediction = add_layer(v_input, 400, 3, activation_function=tf.nn.softmax)
    
    # loss: cross-entropy between prediction and the true labels
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys * tf.log(prediction),
                                                  reduction_indices=[1]))
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
    
    sess = tf.Session()
    # important step (initialize_all_variables is deprecated)
    sess.run(tf.global_variables_initializer())
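One caveat on losses of this form: taking tf.log(prediction) of a softmax output can hit log(0) = -inf when a probability underflows. The usual fix is to keep the unnormalized logits and compute the cross-entropy through a numerically stable log-softmax. A NumPy sketch with illustrative values:

```python
import numpy as np

def cross_entropy_from_logits(logits, labels):
    """Mean cross-entropy computed from unnormalized logits via a stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoid exp overflow
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(labels * log_probs).sum(axis=1).mean())

logits = np.array([[2.0, 1.0, 0.0],
                   [0.0, 0.0, 5.0]])
labels = np.array([[1.0, 0.0, 0.0],   # one-hot: class 0
                   [0.0, 0.0, 1.0]])  # one-hot: class 2
loss = cross_entropy_from_logits(logits, labels)
```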

