Downloading GEO microarray data with RStudio
2024-06-02 00:00:39

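I ran into all kinds of pitfalls while learning to download GEO microarray data, so I'm recording them here.
Following the instructor's lecture, I first tried downloading with GEOquery: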

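library(GEOquery)
library(tidyverse)  # tidyverse also attaches dplyr, so a separate library(dplyr) is not needed
gset <- getGEO(GEO = 'GSE87211', destdir = ".", getGPL = FALSE)
### destdir: directory where downloaded files are saved; getGPL = FALSE: skip downloading the platform (GPL) annotation file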

It failed: the download was painfully slow and then aborted with "Timeout of 60 seconds was reached":

Found 3 file(s)
GSE12417-GPL570_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE12nnn/GSE12417/matrix/GSE12417-GPL570_series_matrix.txt.gz'
Content type 'application/x-gzip' length 23572020 bytes (22.5 MB)
========================
> options(timeout=60)
> gset <-  getGEO(GEO='GSE87211', destdir=".",getGPL = F)
Found 1 file(s)
GSE87211_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz'
Content type 'application/x-gzip' length 35235899 bytes (33.6 MB)

downloaded 688 KB

Error in download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  : 
  download from 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz' failed
In addition: Warning messages:
1: In download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  :
  downloaded length 704512 != reported length 35235899
2: In download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  :
  URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz': Timeout of 60 seconds was reached

Fixing "Timeout of 60 seconds was reached" (my RStudio Server was using a timeout of only 60 s, which is R's default):

# check the current timeout (in seconds)
> getOption('timeout')
[1] 60
# raise the timeout
> options(timeout=100000)
## confirm the new value
> getOption('timeout')
[1] 1e+05
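If you would rather not leave a very large timeout in place for the whole session, one option is to raise it only for a single call and restore it afterwards. This is a minimal sketch using only base R's options() and on.exit(); the helper name with_timeout() is hypothetical, not part of GEOquery.

```r
# Hypothetical helper: evaluate `expr` with options(timeout = seconds),
# then restore the previous timeout, even if `expr` throws an error.
with_timeout <- function(expr, seconds = 600) {
  old <- options(timeout = seconds)  # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore them when the function exits
  force(expr)                        # lazy evaluation: expr runs here, after the option is set
}

# Usage (requires GEOquery and network access):
# gset <- with_timeout(
#   GEOquery::getGEO(GEO = "GSE87211", destdir = ".", getGPL = FALSE),
#   seconds = 3600
# )
```

Because R arguments are evaluated lazily, the getGEO() call only runs inside with_timeout(), after the larger timeout has been set.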

Then I ran GEOquery's getGEO again. This time it ran without errors, but for whatever reason the download was still extremely slow.

Someone suggested this fix:

options('download.file.method.GEOquery' = 'libcurl')
## libcurl is a free client-side URL transfer library; this option makes GEOquery download through it

It helped only a little; the download was still very slow.
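The failed attempts in the log were truncated downloads ("downloaded length 704512 != reported length 35235899"), so another option is to fetch the series matrix file yourself and simply retry. A minimal sketch, assuming base R's download.file(); the helper name download_with_retry() and its defaults are hypothetical, and the URL in the usage comment is the one from the error log above.

```r
# Hypothetical helper: retry download.file() a few times before giving up.
download_with_retry <- function(url, destfile, tries = 3,
                                method = "libcurl", wait = 5) {
  for (i in seq_len(tries)) {
    ok <- tryCatch({
      # download.file() returns 0 on success; mode = "wb" keeps .gz files intact
      status <- download.file(url, destfile, method = method,
                              mode = "wb", quiet = TRUE)
      status == 0
    },
    error = function(e) {
      message(sprintf("Attempt %d/%d failed: %s", i, tries, conditionMessage(e)))
      FALSE
    })
    if (isTRUE(ok)) return(invisible(TRUE))
    if (i < tries) Sys.sleep(wait)  # pause before the next attempt
  }
  invisible(FALSE)
}

# Usage (network access required; raise options(timeout = ...) first):
# url <- "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz"
# if (download_with_retry(url, "GSE87211_series_matrix.txt.gz")) {
#   gset <- GEOquery::getGEO(filename = "GSE87211_series_matrix.txt.gz", getGPL = FALSE)
# }
```

This does not make the connection itself faster; it only avoids restarting by hand after a dropped transfer.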
After searching on Baidu, I tried the geoChina function from the AnnoProbe package. Install AnnoProbe first:

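> install.packages('AnnoProbe')
> library(AnnoProbe)
# install the GEOmirror package from Gitee (requires devtools)
> devtools::install_git("https://gitee.com/jmzeng/GEOmirror")
# download the GEO data from a mirror in China
> gset <- AnnoProbe::geoChina(gse='GSE87211', mirror = 'tencent', destdir = '.')
# 'tencent' is the only mirror available here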

The download succeeded:

Found 1 file(s)
GSE87211_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz'
Content type 'application/x-gzip' length 35235899 bytes (33.6 MB)
==
> gset <- AnnoProbe::geoChina(gse='GSE87211', mirror = 'tencent', destdir = '.')
trying URL 'http://49.235.27.111/GEOmirror/GSE87nnn/GSE87211_eSet.Rdata'
Content type 'application/octet-stream' length 31922908 bytes (30.4 MB)
==================================================
downloaded 30.4 MB

file downloaded in .
you can also use getGEO from GEOquery, by 
getGEO('GSE87211', destdir=".", AnnotGPL = F, getGPL = F)
> 

I compared the result with the data downloaded via getGEO and found no differences.
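One way to do such a comparison is to check that the two expression matrices have the same dimensions, the same probe and sample names, and equal values. A minimal sketch in base R; the helper same_expression() and the object names gset_geoquery / gset_china are hypothetical, and the usage assumes both downloads are lists of ExpressionSets as returned by getGEO (extracted with Biobase::exprs()).

```r
# Hypothetical helper: TRUE if two expression matrices have the same
# dimensions, the same probe/sample names and (numerically) equal values.
same_expression <- function(m1, m2, tolerance = 1e-8) {
  identical(dim(m1), dim(m2)) &&
    identical(rownames(m1), rownames(m2)) &&
    identical(colnames(m1), colnames(m2)) &&
    isTRUE(all.equal(m1, m2, tolerance = tolerance, check.attributes = FALSE))
}

# Usage (requires Biobase):
# m_geo   <- Biobase::exprs(gset_geoquery[[1]])
# m_china <- Biobase::exprs(gset_china[[1]])
# same_expression(m_geo, m_china)
```

Checking the row and column names separately matters: all.equal() with check.attributes = FALSE would otherwise accept two matrices whose probes are simply listed in a different order only if the values happened to match.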