Quora Question Pairs

42 members · 11 topics · Created 2017-11-30

Implementation of the Decomposable Model

Posted 2018-06-09 · 1267 views

This thread implements the Decomposable model. For the original article, see: http://www.jiehuozhe.com/group/6/thread/15

6 replies
  • #2 Ryan 2018-06-09

    The code for the input part is as follows:
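
As a rough illustration (not the author's actual code) of what the input stage does: each token id indexes a row of a pretrained embedding matrix, such as fixed GloVe vectors. A minimal numpy sketch, with all names assumed:

```python
import numpy as np

def embed(token_ids, embedding_matrix):
    """Map a batch of token-id sequences to their embedding vectors."""
    # token_ids: (batch, seq_len) ints; embedding_matrix: (vocab_size, dim)
    return embedding_matrix[token_ids]

# toy setup: vocabulary of 5 words, 3-dimensional embeddings
embedding_matrix = np.arange(15, dtype=np.float32).reshape(5, 3)
token_ids = np.array([[0, 2], [4, 1]])
vectors = embed(token_ids, embedding_matrix)
print(vectors.shape)  # (2, 2, 3)
```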

  • #3 Ryan 2018-06-09

    Part of the Attend implementation:

    Here _transform_attend is a feed-forward network; it can have multiple layers.

    utils.text.mask3d sets the padded positions of a sentence to -numpy.inf, so that when the softmax values are computed, the padding contributes nothing to the attention weights.
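
The masking behaviour described here can be sketched in numpy; mask3d below imitates the described behaviour of utils.text.mask3d and is not the original implementation:

```python
import numpy as np

def mask3d(values, lengths, mask_value=-np.inf):
    """Set score columns beyond each sentence's true length to mask_value
    (imitating the described behaviour of utils.text.mask3d)."""
    out = values.copy()
    for i, n in enumerate(lengths):
        out[i, :, n:] = mask_value
    return out

def attend_weights(scores, lengths_b):
    """Softmax over the last axis of the raw attention scores
    (batch, len_a, len_b); masked columns get zero weight."""
    masked = mask3d(scores, lengths_b)
    masked = masked - masked.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(masked)
    return w / w.sum(axis=-1, keepdims=True)

# uniform scores over 3 positions, but the last position of b is padding
scores = np.zeros((1, 2, 3))
alpha = attend_weights(scores, lengths_b=[2])
# first row of weights: 0.5, 0.5, 0.0 -- the padded position is ignored
```

Because exp(-inf) is 0, padded positions receive exactly zero attention weight while the remaining weights still sum to 1.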

  • #4 Ryan 2018-06-09

    The code for the compare part is as follows:

    Here _transform_compare is one implementation of a feed-forward network.
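
A hedged numpy sketch of what the compare step does: each token vector is concatenated with its attended counterpart and pushed through a feed-forward transform. The single ReLU layer here is only a stand-in for _transform_compare; all names are assumptions:

```python
import numpy as np

def compare(a, beta, transform):
    """Concatenate each token vector in a with its attended counterpart
    beta, then apply a feed-forward transform to every position."""
    pairs = np.concatenate([a, beta], axis=-1)   # (batch, len, 2 * dim)
    return transform(pairs)

dim = 4
rng = np.random.default_rng(0)
W = rng.standard_normal((2 * dim, dim))
relu_layer = lambda x: np.maximum(x @ W, 0.0)    # one-layer stand-in

a = rng.standard_normal((1, 3, dim))     # sentence a, 3 tokens
beta = rng.standard_normal((1, 3, dim))  # attended summary of sentence b
v = compare(a, beta, relu_layer)
print(v.shape)  # (1, 3, 4)
```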

  • #5 Ryan 2018-06-11

    Part of the SNLI data-loading code:
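
For reference, each line of an SNLI .jsonl file is a single JSON object whose relevant fields are gold_label, sentence1/sentence2 (raw text), and sentence1_parse/sentence2_parse (Penn-treebank parses). A minimal parsing sketch with a made-up example line:

```python
import json

# a made-up line in the SNLI jsonl format
line = ('{"gold_label": "entailment", '
        '"sentence1": "A man is walking.", '
        '"sentence2": "A person moves.", '
        '"sentence1_parse": "(ROOT (S (NP (DT A) (NN man)) '
        '(VP (VBZ is) (VP (VBG walking))) (. .)))", '
        '"sentence2_parse": "(ROOT (S (NP (DT A) (NN person)) '
        '(VP (VBZ moves)) (. .)))"}')

record = json.loads(line)
print(record['gold_label'])      # entailment
print(record['sentence1'])       # A man is walking.
```

Examples where annotators did not agree have gold_label set to "-" and are normally skipped.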

    • wangsy 2018-06-11
      Teacher, what is model_data_type, and what values can it take?
    • Ryan 2018-06-12
      Reply to wangsy: train, test, dev
    • Ryan 2018-06-12
      Reply to wangsy: You can implement it in your own way; there is no need to reproduce my code. Look at how the key parts are implemented, then fill in the details yourself.
  • #6 wangsy 2018-06-11

    Teacher, I still don't really understand how to read the input data. I wrote the following by following a few examples; I would appreciate your guidance.

    import json
    import os
    import sys

    import nltk
    import tensorflow as tf

    Py3 = sys.version_info[0] == 3

    def _read_data(filename, lowercase=False, tokenizer=None):
        with tf.gfile.GFile(filename, "r") as f:
            if Py3:
                model_data = f.read()
            else:
                model_data = f.read().decode("utf-8")
        data = []
        for line in model_data.split('\n'):
            if line == "":
                continue
            if lowercase:
                line = line.lower()
            line_data = json.loads(line)
            label = line_data['gold_label']
            if label == '-':
                # examples without annotator agreement carry no gold label
                continue
            if tokenizer is not None:
                tokens1 = tokenizer.tokenize(line_data['sentence1'])
                tokens2 = tokenizer.tokenize(line_data['sentence2'])
            else:
                # take the tokens from the bracketed parse trees
                tokens1 = nltk.Tree.fromstring(line_data['sentence1_parse']).leaves()
                tokens2 = nltk.Tree.fromstring(line_data['sentence2_parse']).leaves()
            data.append((tokens1, tokens2, label))
        return data

    def ptb_raw_data(data_path=None):
        train_path = os.path.join(data_path, "snli_1.0_train.jsonl")
        valid_path = os.path.join(data_path, "snli_1.0_dev.jsonl")
        test_path = os.path.join(data_path, "snli_1.0_test.jsonl")
        train_data = _read_data(train_path)
        valid_data = _read_data(valid_path)
        test_data = _read_data(test_path)
        return train_data, valid_data, test_data


  • #7 wangsy 2018-06-11

    Updated the code. I don't know how to compute the maximum length over all sentences.

    import json
    import os
    import sys

    import gensim
    import nltk
    import tensorflow as tf

    Py3 = sys.version_info[0] == 3

    def _read_data(filename, model, lowercase=False, tokenizer=None):
        with tf.gfile.GFile(filename, "r") as f:
            if Py3:
                model_data = f.read()
            else:
                model_data = f.read().decode("utf-8")
        data = []
        for line in model_data.split('\n'):
            if line == "":
                continue
            if lowercase:
                line = line.lower()
            line_data = json.loads(line)
            label = line_data['gold_label']
            if label == '-':
                continue
            if tokenizer is not None:
                tokens1 = tokenizer.tokenize(line_data['sentence1'])
                tokens2 = tokenizer.tokenize(line_data['sentence2'])
            else:
                tokens1 = nltk.Tree.fromstring(line_data['sentence1_parse']).leaves()
                tokens2 = nltk.Tree.fromstring(line_data['sentence2_parse']).leaves()
            tokens1_vector = sen_to_vector(tokens1, model)
            tokens2_vector = sen_to_vector(tokens2, model)
            label_vector = label_to_vector(label)
            data.append((tokens1_vector, tokens2_vector, label_vector))
        return data

    def label_to_vector(label):
        # one-hot encode the three SNLI labels
        if label == 'entailment':
            return [0, 0, 1]
        if label == 'neutral':
            return [0, 1, 0]
        if label == 'contradiction':
            return [1, 0, 0]

    def sen_to_vector(tokens, model):
        # look up a vector for every token present in the embedding model
        return [model[word] for word in tokens if word in model]

    def ptb_raw_data(data_path, model):
        train_path = os.path.join(data_path, "snli_1.0_train.jsonl")
        valid_path = os.path.join(data_path, "snli_1.0_dev.jsonl")
        test_path = os.path.join(data_path, "snli_1.0_test.jsonl")
        train_data = _read_data(train_path, model)
        valid_data = _read_data(valid_path, model)
        test_data = _read_data(test_path, model)
        return train_data, valid_data, test_data

    def main():
        # load pretrained word vectors (path is an assumption; adjust as needed)
        model = gensim.models.Word2Vec.load('model/glove.6B')
        data_path = 'data/snli_1.0'  # directory with the unpacked SNLI corpus
        train_data, valid_data, test_data = ptb_raw_data(data_path, model)

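
To the question above about the maximum sentence length: once data is a list of (tokens1, tokens2, label) triples, one pass over all splits suffices. A minimal sketch (the triple layout follows the loader above; all other names are assumed):

```python
def max_sentence_length(datasets):
    """Longest sentence length across all (tokens1, tokens2, label)
    triples in all of the given datasets (e.g. train/dev/test)."""
    longest = 0
    for data in datasets:
        for tokens1, tokens2, _label in data:
            longest = max(longest, len(tokens1), len(tokens2))
    return longest

train = [(['a', 'man', 'walks'], ['he', 'moves'], 'entailment')]
dev = [(['hi'], ['one', 'two', 'three', 'four'], 'neutral')]
print(max_sentence_length([train, dev]))  # 4
```

The same value can then be used as the padding length for both sentences in every batch.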
