2011-10-26

一般化線型モデルの復習

統計

仮定

確率分布

確率変数 $Y_i$ が互いに独立に canonical form の指数型分布族、
\[
f_{Y_i}(y_i)=\exp\left\{ [y_i \theta_i - b(\theta_i)] / \phi^2 - c(y_i, \phi) \right\},
\] に従う事を仮定する ($Y_i \overset{\rm{i.i.d.}}{\sim} f_{Y_i}(y_i)$)。
また、$b(\theta_i)$ および $\phi^2$ は、仮定した確率分布によって "決まっている" 事に注意する必要がある (例えば、ポアソン分布を仮定した場合、$\phi^2=1$)。

リンク関数

確率変数 $Y_i$ の期待値を $E[Y_i]=\mu_i$ と書く。
この期待値に対して、既知のリンク関数、$g(\cdot)$、を用いて
\[
g(\mu_i)={\bf x}_i^{\mathrm t}\boldsymbol{\beta},
\] を仮定する。
${\bf x}_i$ は各標本のデザインベクトル、$\boldsymbol{\beta}$ はパラメータベクトルである。

対数尤度関数

対数尤度関数は、
\[
l=\sum_{i=1}^{n} [y_i \theta_i - b(\theta_i)] / \phi^2 - \sum_{i=1}^{n} c(y_i, \phi),
\] と書くことができる。

$\theta_i$ に対する尤度方程式から得られる性質

確率変数 $Y_i$ の期待値

$\theta_i$ をパラメータと見なし、スコア関数の期待値が $0$ になるという性質、
\[
\mathrm{E}\left[ \left\{ Y_i - \frac{\partial b(\theta_i)}{\partial \theta_i} \right\} \bigg/ \phi^2 \right] = 0
\] から、
\[
\mathrm{E}[Y_i]=\mu_i=\frac{\partial b(\theta_i)}{\partial \theta_i},
\] が分かる。

確率変数 $Y_i$ の分散

Fisher 情報行列の性質、
\[
\mathrm{Var}\left[ \left\{ Y_i - \frac{\partial b(\theta_i)}{\partial \theta_i} \right\} \bigg/ \phi^2 \right] = -\mathrm{E}\left[ -\frac{1}{\phi^2} \frac{\partial^2 b(\theta_i)}{\partial \theta_i^2} \right],
\] から、
\[
\mathrm{Var}(Y_i)=
\phi^2 \frac{\partial^2 b(\theta_i)}{\partial \theta_i^2} \equiv \phi^2 v(\mu_i),
\] であり、$v(\mu_i)$ は分散関数とも呼ばれている。

$\boldsymbol{\beta}$ の最尤推定

$\boldsymbol{\beta}$ に対するスコア関数は、
\[
\begin{align*}
\frac{\partial l}{\partial \boldsymbol{\beta}} &=
\frac{1}{\phi^2} \sum_{i=1}^{n} \left\{ y_i \frac{\partial \theta_i}{\partial \boldsymbol{\beta}} - \frac{\partial b(\theta_i)}{\partial \theta_i}\frac{\partial \theta_i}{\partial \boldsymbol{\beta}} \right\} \\ &=
\frac{1}{\phi^2} \sum_{i=1}^{n} (y_i - \mu_i) \frac{\partial \theta_i}{\partial \mu_i} \frac{\partial \mu_i}{\partial \bf{\beta}} \\ &=
\frac{1}{\phi^2} \sum_{i=1}^{n} (y_i - \mu_i) \frac{1}{v(\mu_i)} \left(\frac{\partial g(\mu_i)}{\partial \mu_i}\right)^{-1}{\bf x}_i \\ &=
\frac{1}{\phi^2} \sum_{i=1}^{n} (y_i - \mu_i) w_i \delta_i {\bf x}_i
\end{align*}
\] である。ただし、
\[
\delta_i = \frac{\partial g(\mu_i)}{\partial \mu_i}
\] であり、
\[
\frac{\partial \theta_i}{\partial \mu_i}=\left( \frac{\partial \mu_i}{\partial \theta_i}\right)^{-1}=\frac{1}{v(\mu_i)}
\] \[
\frac{\partial \mu_i}{\partial \boldsymbol{\beta}}= \frac{\partial \mu_i}{\partial g(\mu_i)}\frac{\partial g(\mu_i)}{\partial \boldsymbol{\beta}}= \left(\frac{\partial g(\mu_i)}{\partial \mu_i}\right)^{-1}\frac{\partial {\bf x}_i^{\mathrm t}\boldsymbol{\beta}}{\partial \boldsymbol{\beta}}= \left(\frac{\partial g(\mu_i)}{\partial \mu_i}\right)^{-1}{\bf x}_i
\] を用いた。
行列を用いて書くと、
\[
\frac{\partial l}{\partial \boldsymbol{\beta}}= \frac{1}{\phi^2}\bf{X}^{\rm{t}}\bf{W}\boldsymbol{\Delta}(\bf{y} - \boldsymbol{\mu})
\] であり、
\[
\frac{\partial l}{\partial \boldsymbol{\beta}} \bigg|_{\boldsymbol{\beta}=\hat{\boldsymbol{\beta}}} = \bf{0}
\] を満たす解、$\hat{\boldsymbol{\beta}}$、が最尤推定量である。
解析的に解けない場合の方がたぶん多い。

対数尤度関数の二階偏導関数は、
\[
\frac{\partial^2 l}{\partial \boldsymbol{\beta} \partial \boldsymbol{\beta}^{\rm{t}}} = -\frac{1}{\phi^2} \bf{X}^{\rm{t}} \bf{W} \boldsymbol{\Delta} \frac{\partial}{\partial \boldsymbol{\beta}^{\rm{t}}} (\bf{y} - {\boldsymbol \mu}) + \rm{\frac{1}{\phi^2}} \bf{X}^{\rm{t}} \frac{\partial \bf{W} \boldsymbol{\Delta}}{\partial \boldsymbol{\beta}^{\rm{t}}} (\bf{y} - \boldsymbol{\mu})
\] である。
これについて期待値をとり、マイナスを付けると
\[
I(\boldsymbol{\beta}) = -\mathrm{E}\left[ \frac{\partial^2 l}{\partial \boldsymbol{\beta} \partial \boldsymbol{\beta}^{\rm{t}}} \right]= \frac{1}{\phi^2} \bf{X}^{\rm{t}}\bf{W}\boldsymbol{\Delta}\frac{\partial \boldsymbol{\mu}}{\partial \boldsymbol{\beta}^{\rm{t}}} + \bf{0}= \rm{\frac{1}{\phi^2}} \bf{X}^{\rm{t}}{\bf W}\boldsymbol{\Delta}{\boldsymbol \Delta}^{-1}{\bf X} = \rm{\frac{1}{\phi^2}} \bf{X}^{\rm{t}}\bf{W}\bf{X}
\] が得られる。

追記

間違ってたら怖いな、特にベクトル二階偏微分のあたりを見直そう。
あとは、quasi-likelihood についても復習しようかな。

2011-10-09

set end; データステップで最後のオブザベーションのみ出力する方法

SAS

これは覚えておくとお得かもしれません。
set data; by variable; if last.variable then output; の仲間ですね。

data sample;
input x y z @@;
cards;
1 2 3  4 5 6  7 8 9
;
data sample_final;
  retain xx xy xz yy yz zz 0;
  set sample end = final;
  xx = xx + x * x;
  xy = xy + x * y;
  xz = xz + x * z;
  yy = yy + y * y;
  yz = yz + y * z;
  zz = zz + z * z;
  if final then output;
proc print; run;

2011-09-17

Reference class に roxygen2 用のコメントを入れる方法

先日 Reference class に roxygen 用のコメントを入れる方法という記事を書いたのですが、実は全然ドキュメントを作れていなかったことに今更気づいたので、せっかくなので roxygen2 用の場合を作ってみました。
基本的には変わっていませんが、@section で fields 等の部分を記述していたり、[http://www.mail-archive.com/roxygen-devel@lists.r-forge.r-project.org/msg00091.html:title=@aliases] の問題 (Roxygen-devel) があったり、個人的には解決しておきたい問題がありました。

Reference class 用サンプルパッケージ (roxygen2 用)

パッケージの Rd ファイル用の roxygen コメント

パッケージの Rd ファイルは以下の様に書くことにしました。

#' testRefclassPkg: A test package for reference class documents using 'roxygen2'
#'
#' A Test Package for Reference Class Documents Using 'roxygen2'
#'
#' This package is an example of how to document reference class methods.
#'
#' @name testRefclassPkg-package
#' @aliases testRefclassPkg
#' @rdname testRefclassPkg-package
#' @docType package
#' @keywords documentation
#'
#' @author
#' Author: Triad sou.
#' Maintainer: Triad sou. \email{triadsou@@gmail.com}
#' @seealso \code{\link[methods:ReferenceClasses]{ReferenceClasses}}
#' @exportPattern '^[^\\.]'
NULL

ポイント

roxygen2 では roxygen と違い、上から順に空行で区切って、\title フィールド、\description フィールド、\details フィールドの三つを @ 無しで略記できます。
@name testRefclassPkg: \name フィールドを指定します。パッケージの roxygen2 コメントを書く場合、最後に続けて関数定義やクラス定義などを記入できないため、NULL や {} などを記入します。この場合、関数名やクラス名が存在しないため、\name フィールドを自動的に作成できずエラーが発生してしまいますが、自分で指定しておけば問題ありません。
@aliases testRefclassPkg: help("keyword") や ?keyword でヘルプを検索するための検索ワードを指定しています。いちいち -package を付けるのが面倒なので、個人的にパッケージ名を検索ワードとして登録しています。
@rdname testRefclassPkg-package: Rd ファイル名を指定します、好きな名前を指定して問題ありません。
@docType package: \docType フィールドにパッケージ用の Rd ファイルであることを指定します。
@keywords documentation: \keyword フィールドを指定します。
@author: \author フィールドを指定します。
@seealso: \seealso フィールドを指定します。ヘルプ間でリンクを張って、参考情報をたどれるようにするために指定しておきます。
@exportPattern '^[^\\.]': NAMESPACE ファイルを自動生成するために、すべてにマッチする directive を書いておきます。

Reference class の定義と Rd ファイル用の roxygen コメント

#' My test class
#'
#' My Test Class (description)
#'
#' \code{test_refclass} class is a test reference class that has three methods (details).
#'
#' @name test_refclass-class
#' @aliases test_refclass
#' @docType class
#'
#' @section Fields:
#' \describe{
#' \item{\code{f1}: }{numeric number of a test field.} 
#' \item{\code{f2}: }{numeric number of a test field.} 
#' }
#' @section Contains:
#' NULL
#' @section Methods:
#' \describe{
#' \item{\code{m1(arg1 = 1)}: }{\code{m1} method set a value to \code{f1}.}
#' \item{\code{m2(arg1)}: }{\code{m2} method return \code{f1 + arg1 * 2}.}
#' \item{\code{m3(arg1)}: }{\code{m3} method return \code{f1 + arg1 * 3}.}
#' }
#' @keywords documentation
#' @family test_refclass
test_refclass <- setRefClass(

  Class = "test_refclass",
  
  fields = c("f1", "f2"),

  methods = list(

    m1 = function(arg1 = 1) {
      f1 <<- arg1
    },

    m1 = function(arg1 = 1, arg2) {
      f1 <<- arg1 + arg2
    },

    m2 = function(arg1) {
      if (class(f1) == "uninitializedField") f1 <<- 2
      f1 + arg1 * 2
    },

    m3 = function(arg1) {
      if (class(f1) == "uninitializedField") f1 <<- 3
      f1 + arg1 * 3
    }

  )
)

ポイント

@name test_refclass-class: \name フィールドを指定します。@aliases の問題で、@aliases test_refclass-class と書いてしまうと、Rd ファイルの \alias フィールドがダブルクォートで囲まれてしまい、Rcmd check から警告をいただくことになります。ですので、@name と @aliases を逆に書いて、この問題に対処しています。
@aliases test_refclass: \alias フィールドを指定します。
@docType class: クラス用の Rd ファイルであることを指定します。
@section Fields:: Bioconductor にある S4 クラスのドキュメントを参考にすると、\section{Slots} フィールドや \section{Methods} フィールドが用いられていることが分かります。それに合わせて、Reference class では \section{Slots} の代わりに \section{Fields} を作成してみました。ちなみに、@section XXX: で任意のセクションをヘルプファイルに入れておくことができます。
@section Contains:: 不要かもしれませんが、個人的には \section{Contains} フィールドを作って、継承に関する情報も併記してあった方が良いと思います。
@section Methods:: \section{Contains} フィールドを指定します。
@keywords documentation: \keyword フィールドを指定します。
@family test_refclass: roxygen2 の特殊キーワードで、@family グループ名 の形式で記入しておくと、同じグループ名を指定した Rd ファイルに対して \seealso フィールドに相互リンクが自動的に生成されます。

メソッドの Rd ファイル用の roxygen コメント

#' The \code{m1} method of \code{test_refclass} class
#'
#' \code{m1} method
#'
#' \code{m1} method set a value to \code{f1}.
#'
#' @name m1,test_refclass-method
#' @aliases m1
#' @rdname test_refclass-m1
#' @docType methods
#'
#' @param arg1 numeric number of a test argument
#' @usage \S4method{m1}{test_refclass}(arg1 = 1)
#' @examples
#' classObj <- test_refclass$new()
#' print(classObj$m1(10))
#' @family test_refclass
NULL

#' The \code{m2} method of \code{test_refclass} class
#'
#' \code{m2} method
#'
#' \code{m2} method return \code{f1 + arg1 * 2}.
#'
#' @name m2,test_refclass-method
#' @aliases m2
#' @rdname test_refclass-m2
#' @docType methods
#' 
#' @param arg1 numeric number of a test argument
#' @usage \S4method{m2}{test_refclass}(arg1)
#' @examples
#' classObj <- test_refclass$new()
#' print(classObj$m2(10))
#' @family test_refclass
NULL

#' The \code{m3} method of \code{test_refclass} class
#'
#' \code{m3} method
#'
#' \code{m3} method return \code{f1 + arg1 * 3}.
#'
#' @name m3,test_refclass-method
#' @aliases m3
#' @rdname test_refclass-m3
#' @docType methods
#'
#' @param arg1 numeric number of a test argument
#' @usage \S4method{m3}{test_refclass}(arg1)
#' @examples
#' classObj <- test_refclass$new()
#' print(classObj$m3(10))
#' @family test_refclass
NULL

ポイント

@name m1,test_refclass-method: \name フィールドを指定します。@aliases の問題については同様のため。
@aliases m1: \alias フィールドを指定します。
@rdname test_refclass-m1: Rd ファイル名を指定します、好きな名前を指定して問題ありません。
@docType methods: メソッド用の Rd ファイルであることを指定します。
@param: \arguments フィールドを指定します。
@usage: \usage フィールドを指定します。\S4method{generic}{signature_list}(argument_list) を流用しています。
@examples: \examples フィールドを指定します。@example /パッケージホームからの相対パス で書くと、ファイルを指定して読み込むことも可能です。@example /inst/examples/xxxx.R を用いている例が実際にいくつかありました。
@family test_refclass: 相互リンクを指定しておきます。この例ではクラスとメソッド全部に相互リンクを張っています。

表示例

上記のスクリプトを *.R ファイルとして保存し、DESCRIPTION ファイルなど他の必要なファイルと一緒に適切にディレクトリに配置し、

require(roxygen2)
roxygenize(path, copy.package = FALSE)

の様に roxygenize() 関数を適用すると、Rd ファイルが生成されますので、これらをまとめてパッケージ化すると完成です。

以下に、パッケージ化してインストール後に生成されたドキュメントを示しておきます。

roxygen 版の記事について

roxygen 版の記事では @slot 云々の話を書いていたのですが、結局書いても無意味だった事に気づいていなかっただけで、roxygen2 で書いた方が良いということが分かりました！
ちなみに roxygen では @section も利用できません。

2011-09-16

箱ひげ図に横棒を追加する

R ggplot2

以下のようにすると簡単に横棒付きの箱ヒゲ図を作成できます。

require("ggplot2")

df <- data.frame(
  x = rep(1:5, each = 50),
  y = rt(2500, 3)
)

ggplot(df, aes(x = factor(x), y = y)) +
  stat_boxplot(geom = "errorbar", stat_params = list(width = 0.5), geom_params = list()) +
  geom_boxplot() +
  theme_gray(25, "serif")

ggsave("myplot.png", height = 7, width = 7)

~~この方法には一つ難点があって、stat_boxplot(geom = "errorbar", width = 0.25) のように書いても横棒の長さを変更できませんでした。~~
~~理由がちょっと分からないので、情報をもらえると助かります。~~
kohske さんより頂いた情報で解決できました。
stat_params = list() と geom_params = list() は知っておくとお得かもしれません。
stat_qq() の dparams = list() みたいに投げたいところにパラメータを与えられるイメージですね。

`stat_params = list(width = 0.75)`: default

`stat_params = list(width = 0.50)`

`stat_params = list(width = 0.25)`

自分でスクリプトを書いてしまえば、もっと細かく設定できますね。
参考：How to add horizontal lines to ggplot2 boxplot?

追記 (2015/12/11)

1.0.1の次のバージョンでは

ggplot(df, aes(x = factor(x), y = y)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot() +
  theme_gray(25, "serif")

と書けるようになるみたい。

2011-09-15

Rcmd Rd2dvi が失敗する

R TeX

Rcmd Rd2dvi で PDF マニュアルを作成することがあるのですが、いつの間にか失敗してしまうようになっていて困っていました。
しかし、単に Rcmd Rd2dvi の実行には MikTeX が必要 (推奨) だったようで、インストールすると問題無く PDF マニュアルを作れるようになりました。

以下のようなログが出てしまう場合は MikTeX を入れてみると良いかもしれません。

Hmm ... looks like a package
sed.exe: not found
sed.exe: not found
kpsewhich.exe: not found
cat: not found
Warning: running command '"texi2dvi.exe"  --pdf "Rd2.tex" ' had status 1
Saving output to 'test-manual.pdf' ...
Done
You may want to clean up by 'rm -rf TEMP/RtmphKXMi6/Rd2pdf678e7c97'

sed.exe: not found みたいなエラーだったので、パス関係かと思って色々いいじっていたのにこんな簡単に解決できるとは・・・

2011-09-10

コピペでデータ読み込み

R で Excel などに入力されたちょっとしたデータを読み込む時、データ整形に結構手間がかかってめんどくさいと思っていましたが、
How to read space delimited data into a data frame from your script/document file? (Stack Overflow) ではコピペだけで実行できる方法が紹介されていました。

dat <- read.table(textConnection("person1    12    15
person2    15    18
person3    20    14"), stringsAsFactors=FALSE)
dat

> dat
       V1 V2 V3
1 person1 12 15
2 person2 15 18
3 person3 20 14

普通に `header` とかが使えた... (2011/09/16) 関数化

~~少し編集し、変数名入りのものでも使えるようにしてみました。~~

dat1 <- read.table(textConnection("
person1    12    15
person2    15    18
person3    20    14
"), stringsAsFactors = FALSE, header = FALSE)
dat1

dat2 <- read.table(textConnection("
name       x1    x2
person1    12    15
person2    15    18
person3    20    14
"), stringsAsFactors = FALSE, header = TRUE)
dat2

> dat1
       V1 V2 V3
1 person1 12 15
2 person2 15 18
3 person3 20 14

> dat2
     name x1 x2
1 person1 12 15
2 person2 15 18
3 person3 20 14

調べ切れていないだけで R base にこの機能入ってたりするのかな？

追記

クリップボードから直接読み込む方法です。
id:dichika さん、ありがとうございます。

read.delim("clipboard") #windonws
read.delim(pipe("pbpaste")) #Mac

Rcmdr で使用されているスクリプト。

read.table(
  "clipboard", header = TRUE, sep = "",
  na.strings = "NA", dec=".", strip.white = TRUE
)

2011-08-31

Error in including pdf figures with \includegraphics; no BoundingBox!

TeX

最近 TeX で文章を書いていて、pdf で作成したグラフを読み込もうとしたところ、bb ファイルをちゃんと作成してたはずが、

LaTeX Error: Cannot determine size of graphic in fig2.5.pdf (no BoundingBox).

というエラーに見舞われました。

いろいろと試行錯誤したのですが、結局ファイル名に "." が含まれていると正常に処理できない事が原因だったようです。今後は気をつけよう。

\documentclass{jsarticle}

\usepackage[dvipdfmx]{graphicx} 

\begin{document}

\begin{figure}[!htb]
\begin{center}
% \includegraphics[clip,width=0.95\textwidth]{fig2.5.pdf}
\includegraphics[clip,width=0.95\textwidth]{fig25.pdf}
\end{center}
\caption{
If "." is included in the filename, an error will occur;
{\bf LaTeX Error: Cannot determine size of graphic in fig2.5.pdf (no BoundingBox).}
}
\end{figure}

\end{document}

おまけ

require(ggplot2)

x <- seq(0, 7, length.out = 50)
y <- -(x - 3.5)^2 + rnorm(50, sd = 3)
fig25 <- data.frame(x=x, y=y)

kmg2_theme_bw <- function(base_size = 26, base_family = "Times") {

  mod <- list(
    axis.title.x = theme_text(family = base_family, size = base_size, vjust = -0.5),
    axis.title.y = theme_text(family = base_family, size = base_size, angle = 90,
      vjust = 0.25),
    axis.ticks.margin = unit(0.2, "cm"),
		legend.background = theme_blank(),
    legend.key = theme_rect(colour = "transparent", fill = "transparent",
      size = 0.5, linetype = 0),
    legend.text = theme_text(family = base_family, size = base_size * 0.6),
    legend.title = theme_text(family = base_family, face = "bold",
			size = base_size * 0.6, hjust = 0),
    legend.position = c(0.95, 0.95), legend.justification = c(1, 1),
    plot.title = theme_text(family = base_family, face = "bold",
			size = base_size * 0.9, vjust = 1.5),
    plot.margin = unit(c(1, 1, 1, 1.25), "lines")
  )
  modifyList(theme_bw(base_size = base_size, base_family = base_family), mod)

}

pdf("fig25.pdf")

p1 <- ggplot(fig25, aes(x = x, y = y)) +
  geom_point(size = 3) +
  stat_smooth(aes(linetype = "Lowess smooth"), span = 2/3, degree = 1,
    family = "symmetric", size = 1, fill = "blue", alpha = 0.1) + 
  labs(x = expression(italic(x)), y = expression(italic(y)), linetype = "") +
  kmg2_theme_bw() + 
  opts(legend.position = c(0.19, 0.99), legend.justification = c(0, 1))
print(p1)

dev.off()

embedFonts("fig25.pdf")

仮定

確率分布

リンク関数

対数尤度関数

$\theta_i$ に対する尤度方程式から得られる性質

確率変数 $Y_i$ の期待値

確率変数 $Y_i$ の分散

$\boldsymbol{\beta}$ の最尤推定

追記

Reference class 用サンプルパッケージ (roxygen2 用)

パッケージの Rd ファイル用の roxygen コメント

Reference class の定義と Rd ファイル用の roxygen コメント

メソッドの Rd ファイル用の roxygen コメント

表示例

roxygen 版の記事について

stat_params = list(width = 0.75): default

stat_params = list(width = 0.50)

stat_params = list(width = 0.25)

追記 (2015/12/11)

普通に header とかが使えた... (2011/09/16) 関数化

追記

おまけ

`stat_params = list(width = 0.75)`: default

`stat_params = list(width = 0.50)`

`stat_params = list(width = 0.25)`

普通に `header` とかが使えた... (2011/09/16) 関数化