Notes on Replying to Reviewer Comments

Title: my original title was "Revision Notes"; my advisor changed it to "Manuscript Revision Notes".

Typesetting:

(1) The reviewers' original comments are set in bold, and no blank line is left between each comment-response pair. (I had originally marked them in blue, and left a blank line after the response to every comment.)

(2) The response text itself carries no special formatting. (I had originally bolded parts for emphasis; that can come across as shouting and therefore impolite.)

(3) Use a fixed line spacing. I had used single spacing; my advisor changed it to a fixed value of 18 pt. Compare:

Layout with single line spacing
Layout with fixed 18 pt line spacing

When I have time, I should read these:

A guide to typesetting Simplified Chinese text: https://zhuanlan.zhihu.com/p/49729668

https://www.zhihu.com/question/19621096

But given my current "no time" situation, simply remembering to try a fixed 18 pt line spacing is also an option.

Wording: wherever a corresponding change was made, simply write "Following the reviewer's comment, ... has been revised ..."; do not ramble, and do not bold text or overuse parentheses.

Style: the original Revision Notes were rather colloquial. A revision note is a formal document and should use formal or semi-formal language.

Other: content that has actually been revised or added in the manuscript should not be repeated at length in the revision notes.

Notes on a PyTorch Model Training Mishap

Background

I was training a compressed-sensing image reconstruction model on Google Colab. I already had a trained model with compression ratio 0.20 (below, notation such as r0.20 denotes a compression ratio and its corresponding model), and I wanted to train r0.25. Training from scratch seemed too time-consuming, so I came up with the following idea:

# Load the trained r0.20 parameters into an r0.20 model
checkpoint = torch.load('ResCsNet-colab-5_2_1-r0.20_checkpoint.pth')
model_r20 = ResCsNet(N, int(0.20*N))
model_r20.load_state_dict(checkpoint['state_dict'])

# Fresh r0.25 model whose encoder is still randomly initialized
model_r25 = ResCsNet(N, int(0.25*N))
# "Splice": keep r0.20's trained weights, swap in the r0.25 encoder
model = model_r20
model.encoder = model_r25.encoder

My model roughly consists of two parts, an encoder and a decoder, and the encoder can be accessed directly via .encoder. The code above loads all the trained parameters from r0.20 and then replaces the encoder with r0.25's encoder, whose parameters are randomly initialized at this point.

Process

At first, nothing about the training seemed unusual (see figure).

Because Colab resets the virtual machine every 12 hours, continuing training after 12 hours requires reloading the previously saved training state. I saved the model's parameters directly:

state = {
    'tfx_steps': tfx_steps,
    'tfx_epochs_done': tfx_epochs_done,
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    'lr_scheduler': lr_scheduler.state_dict()
}
torch.save(state, ckpt_name)

The problem is that directly loading this saved state goes wrong!

checkpoint = torch.load(fname)
tfx_steps = checkpoint['tfx_steps']
print(f"tfx_steps is {tfx_steps}")
tfx_epochs_done = checkpoint['tfx_epochs_done']
print(f"tfx_epochs_done is {tfx_epochs_done}")
model = ResCsNet(N, int(0.25*N))
model.load_state_dict(checkpoint['state_dict'])
model.train()
model.cuda()

The loss curve suddenly jumps up, shaped exactly like training has just begun. So the parameters loaded from the checkpoint might as well never have been trained?

What is plain to see is that the model reloaded from the checkpoint file (the optimizer state was loaded as well) behaves as if it had never been trained (see the loss curve jumping up in the figure above). I do not fully understand why, but it clearly has something to do with the earlier "splicing" of the two models.

Discussion of the correct approach

The correct approach seems to be (I have not verified this), as described in https://pytorch.org/tutorials/beginner/saving_loading_models.html#warmstarting-model-using-parameters-from-a-different-model :

modelB = TheModelBClass(*args, **kwargs)
modelB.load_state_dict(torch.load(PATH), strict=False)

That is:

# probably what I should have done from the start
model_r25 = ResCsNet(N, int(0.25*N))  # fresh r0.25 model, randomly initialized
model_r20_state_dict = torch.load('ResCsNet-colab-5_2_1-r0.20_checkpoint.pth')['state_dict']
model = ResCsNet(N, int(0.25*N))
# strict=False skips keys that do not match between the r0.20 and r0.25 architectures
model.load_state_dict(model_r20_state_dict, strict=False)
# make sure the encoder really is the freshly initialized r0.25 encoder
model.encoder.load_state_dict(model_r25.encoder.state_dict())

The tutorial says the strict=False argument tolerates mismatched key names; of course, renaming the keys (https://stackoverflow.com/questions/16475384/rename-a-dictionary-key) would also work.
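For reference, a minimal sketch of the key-renaming route (not from the original note; the 'enc.' / 'encoder.' prefixes below are made up purely for illustration): build a new state_dict with rewritten keys before loading.

import torch
from collections import OrderedDict

old_state_dict = torch.load('ResCsNet-colab-5_2_1-r0.20_checkpoint.pth')['state_dict']
renamed = OrderedDict()
for key, value in old_state_dict.items():
    # hypothetical rename: map keys saved under the "enc." prefix to "encoder."
    new_key = key.replace('enc.', 'encoder.', 1) if key.startswith('enc.') else key
    renamed[new_key] = value

model = ResCsNet(N, int(0.25*N))
model.load_state_dict(renamed, strict=False)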

lambda in Python: is it returning multiple values?

To begin with, let's take a quick look at the basics of lambda in Python.

As we know, lambda x: x**2 is exactly equivalent to

def squared(x): return x**2

Now look at this:

>>> f1 = lambda x,y,z: x+1, y+1, z+1
>>> print(f1(1,1,1))

What will you get on the screen? A tuple of (2, 2, 2)? No. You get an error instead.

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in 
    print(f1(1,1,1))
TypeError: 'tuple' object is not callable

Since the "returning part" of the lambda covers only the content before the first comma, let's add the parentheses explicitly:

>>> f1 = lambda x,y,z: (x+1, y+1, z+1)
>>> print(f1(1,1,1))
(2, 2, 2)

Things are getting clearer now, but what is the 'tuple' in the failing example? It's f1 itself. Have a look at this:

>>> y = 10
>>> z = 20
>>> f1 = lambda x,y,z: x+1, y+1, z+1
>>> type(f1)
<class 'tuple'>
>>> print(f1)
(<function <lambda> at 0x7feb222116a8>, 11, 21)

This time the lambda becomes something like def _func(x, y, z): return x+1, which returns only one value and takes the now-useless y and z as its arguments. You can confirm this by:

>>> print(f1[0](2,-10,-20))
3

All done!

Quick note on Python syntax magics

As noted in the Python 3 documentation, the behaviors of +, -, *, etc. can be redefined.
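As a quick illustration (my own minimal sketch, not from the documentation), redefining + for a class is just a matter of implementing __add__:

class Vec2:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # "+" on two Vec2 instances now means component-wise addition
        return Vec2(self.x + other.x, self.y + other.y)

v = Vec2(1, 2) + Vec2(3, 4)
print(v.x, v.y)   # 4 6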

In particular, I would like to take notes on a few special yet common methods.

__repr__ method

This method is called when you print an instance (strictly speaking, print uses __str__, which falls back to __repr__ when __str__ is not defined). It can be useful when debugging class-related problems.
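A minimal sketch of the difference it makes (my own example):

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # Without this, print(Point(1, 2)) shows something like
        # <__main__.Point object at 0x7f...>, which is useless for debugging.
        return f"Point(x={self.x}, y={self.y})"

print(Point(1, 2))   # Point(x=1, y=2)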

__enter__, __exit__ methods

These two methods are the key components of a context manager. Refer to https://jeffknupp.com/blog/2016/03/07/python-with-context-managers/ to get a sense of how to manage fragile resources with a context manager.
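A minimal sketch of a hand-rolled context manager (my own example; the timer is just a stand-in for any resource that needs guaranteed cleanup):

import time

class Timer:
    def __enter__(self):
        # runs when the "with" block is entered; the return value is bound by "as"
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # always runs when the block exits, even if an exception was raised
        self.elapsed = time.time() - self.start
        return False   # do not suppress exceptions

with Timer() as t:
    sum(range(1_000_000))
print(f"took {t.elapsed:.4f} s")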


So-called Kernel Functions and Kernel Methods

They are nothing more than a trick that replaces the inner product of two vectors \(\langle\mathbf{x},\mathbf{z}\rangle\) with \(\langle\phi(\mathbf{x}),\phi(\mathbf{z})\rangle\). Such an operator \(K(\mathbf{x},\mathbf{z})=\langle\phi(\mathbf{x}),\phi(\mathbf{z})\rangle\) is what we call a kernel function.
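A minimal numeric sketch (my own example, not from the original note): for 2-D vectors, the polynomial kernel \(K(\mathbf{x},\mathbf{z})=\langle\mathbf{x},\mathbf{z}\rangle^2\) gives the same value as the ordinary inner product after the explicit feature map \(\phi(\mathbf{x})=(x_1^2,\sqrt{2}x_1x_2,x_2^2)\), so the kernel lets us work in the higher-dimensional space without ever computing \(\phi\).

import numpy as np

def phi(v):
    # explicit feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2)*x1*x2, x2**2])

def poly_kernel(x, z):
    # kernel trick: same value as np.dot(phi(x), phi(z)), but stays in 2-D
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
print(np.dot(phi(x), phi(z)))   # 16.0
print(poly_kernel(x, z))        # 16.0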

But why do this at all? Because of the familiar assumption: a set of points that is not linearly separable in a low-dimensional space is very likely to become linearly separable once mapped into a higher-dimensional space. (This comes up when discussing SVMs.)

Note that, unlike neural networks and the like, kernel methods also need the training data at prediction time (this type of learning is called instance-based or memory-based learning; kNN also falls into this category).
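To make the point concrete, here is a minimal sketch (my own example) of a dual-form kernel predictor: the prediction \(f(\mathbf{x})=\sum_i \alpha_i K(\mathbf{x}_i,\mathbf{x})\) sums over every training point, so the training set has to be kept around.

import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

X_train = np.array([[0.0], [1.0], [2.0]])   # toy training inputs
y_train = np.array([0.0, 1.0, 4.0])         # toy training targets

# kernel ridge regression in the dual form: alpha = (K + lam*I)^-1 y
lam = 1e-3
K = np.array([[rbf_kernel(a, b) for b in X_train] for a in X_train])
alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)

def predict(x_new):
    # the sum runs over ALL training points; they cannot be discarded
    return sum(a_i * rbf_kernel(x_i, x_new) for a_i, x_i in zip(alpha, X_train))

print(predict(np.array([1.5])))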

The condition for such a kernel function to exist is known as Mercer's condition.