{"id":115,"date":"2016-08-30T23:02:44","date_gmt":"2016-08-30T15:02:44","guid":{"rendered":"http:\/\/ayonel.me\/?p=115"},"modified":"2016-09-13T17:44:32","modified_gmt":"2016-09-13T09:44:32","slug":"scrapy_for_github","status":"publish","type":"post","link":"https:\/\/ayonel.malash.net\/index.php\/2016\/08\/30\/scrapy_for_github\/","title":{"rendered":"Tutorial: Crawling the GitHub API with scrapy"},"content":{"rendered":"<p>The <strong>GitHub API<\/strong> is a real treasure trove: it exposes all kinds of information about a huge number of projects and developers, and can serve as a data source for research on social coding and empirical software engineering.<\/p>\n<p>This tutorial shows how to use scrapy to crawl the GitHub API and fetch the specific information we need. The GitHub API is the data access interface that GitHub exposes, with authentication based on the <strong>OAuth2<\/strong> protocol. Through it we can retrieve many kinds of information, such as a project's <em>commits, issues, and pull requests<\/em>, or a user's followers, following list, commit activity, comments, and so on. The GitHub API has detailed <a href=\"https:\/\/api.github.com\">official documentation<\/a> that lists the endpoint for each kind of data along with the available filter parameters. Using the issues of <a href=\"http:\/\/github.com\/rails\/rails\">rails<\/a> as an example, this tutorial walks through crawling the GitHub API with <strong><em>scrapy<\/em><\/strong>.<\/p>\n<p>Before
we start crawling, some preparation is needed. Because the <strong>GitHub API<\/strong> uses <strong>OAuth2<\/strong> authentication, we have to supply an authentication token. Crawling without a token is possible, but the allowed request rate is far lower.<br \/>\n<strong><br \/>\n<em>For requests using Basic Authentication or OAuth, you can make up to 5,000 requests per hour. For unauthenticated requests, the rate limit allows you to make up to 60 requests per hour. Unauthenticated requests are associated with your IP address, and not the user making requests. Note that the Search API has custom rate limit rules.<\/em><br \/>\n<\/strong><\/p>\n<p>As quoted above, with a token we can make <strong>5000<\/strong> requests per hour, while unauthenticated requests (no token) are limited to <strong>60<\/strong> per hour. Once the rate limit is exceeded, the API returns status code <strong>403
forbidden<\/strong>. You might object that even <strong>5000 requests per hour<\/strong> after authentication is rather slow. To crawl faster, register several GitHub accounts and crawl with the tokens of all of them at once. For authenticated requests, GitHub's rate limit applies per user rather than per <strong>IP<\/strong>, so even on a single machine there is no problem as long as each account stays under 5000 requests per hour.<\/p>\n<p>Next, let's see how to obtain a token. Its full name is personal access token, and each account can have several, so if you lose one someday, just generate a new one.<\/p>\n<p>First, log in to your GitHub account. In the settings there is a <em>personal access token<\/em> option:<br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/ayonel.malash.net\/wp-content\/uploads\/2016\/08\/QQ\u622a\u56fe20160830092436-300x197.png\" alt=\"QQ\u622a\u56fe20160830092436\" width=\"300\" height=\"197\" class=\"alignnone size-medium wp-image-117\" srcset=\"https:\/\/ayonel.malash.net\/wp-content\/uploads\/2016\/08\/QQ\u622a\u56fe20160830092436-300x197.png 300w, https:\/\/ayonel.malash.net\/wp-content\/uploads\/2016\/08\/QQ\u622a\u56fe20160830092436-768x503.png 768w, https:\/\/ayonel.malash.net\/wp-content\/uploads\/2016\/08\/QQ\u622a\u56fe20160830092436.png 807w\" sizes=\"auto, (max-width: 300px)
100vw, 300px\" \/><br \/>\nOpen it, then click <strong>Generate new token<\/strong>.<\/p>\n<p>Select the permissions you want the token to have; selecting all of them is generally fine. Then click the green Generate token button.<br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/ayonel.malash.net\/wp-content\/uploads\/2016\/08\/QQ\u622a\u56fe20160830092627-300x104.png\" alt=\"QQ\u622a\u56fe20160830092627\" width=\"300\" height=\"104\" class=\"alignnone size-medium wp-image-119\" srcset=\"https:\/\/ayonel.malash.net\/wp-content\/uploads\/2016\/08\/QQ\u622a\u56fe20160830092627-300x104.png 300w, https:\/\/ayonel.malash.net\/wp-content\/uploads\/2016\/08\/QQ\u622a\u56fe20160830092627.png 749w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p>That generates our token. Be sure to save it right away: this is the last time GitHub will show it to you. Once the page is refreshed the token is hidden, and the only option left is to generate a new one.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/ayonel.malash.net\/wp-content\/uploads\/2016\/08\/QQ\u622a\u56fe20160830092306-300x51.png\" alt=\"QQ\u622a\u56fe20160830092306\" width=\"300\" height=\"51\" class=\"alignnone size-medium wp-image-120\" srcset=\"https:\/\/ayonel.malash.net\/wp-content\/uploads\/2016\/08\/QQ\u622a\u56fe20160830092306-300x51.png 300w, https:\/\/ayonel.malash.net\/wp-content\/uploads\/2016\/08\/QQ\u622a\u56fe20160830092306.png 674w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\"
\/><\/p>\n<p>All right, with the preparation done, we can write the spider. It uses scrapy; I won't go over scrapy basics here, since there are plenty of tutorials online.<\/p>\n<p>Here is the crawling approach in brief.<\/p>\n<p>The GitHub API returns all of its data as JSON, which makes parsing much easier. Our task is to crawl all issues of the rails project, so let's first look at what the returned issues look like. The response is too long to include here, so open the link below to see it.<\/p>\n<p><a href=\"https:\/\/api.github.com\/repos\/rails\/rails\/issues\" target=\"_blank\">https:\/\/api.github.com\/repos\/rails\/rails\/issues<\/a><\/p>\n<p>The response is a JSON array in which each element is one issue. Each issue carries many fields, such as number, title, and body. For this task we collect, for every rails issue, its number, submitter (user.login), body, and title.<\/p>\n<p>Note that <a href=\"https:\/\/api.github.com\/repos\/rails\/rails\/issues\"
target=\"_blank\">https:\/\/api.github.com\/repos\/rails\/rails\/issues<\/a> returns issues in reverse chronological order, 30 per page by default, so we need to append parameters to the URL to request a specific page. With the parameters added, the URL looks like this:<\/p>\n<p><strong>https:\/\/api.github.com\/repos\/rails\/rails\/issues?per_page=99&#038;page=num<\/strong><\/p>\n<p>Here num is the page number, which we increment starting from 1, and per_page is the number of items returned per page. GitHub caps per_page at 100; this tutorial uses 99.<\/p>\n<p>So the spider keeps incrementing num from 1 until the returned JSON array holds fewer than 99 elements, which means the crawl is complete. In addition, because of the GitHub API rate limit, I prepared tokens from 10 different accounts in advance, and each request gets a custom header carrying a different token.<\/p>\n<p>The source code follows:<\/p>\n<div class=\"codecolorer-container python railscasts\" style=\"overflow:auto;white-space:nowrap;width:100%;height:100%;\"><div class=\"python codecolorer\"><span class=\"co1\"># -*- coding: utf-8 -*-<\/span><br \/>\n__author__ <span class=\"sy0\">=<\/span> <span class=\"st0\">'ayonel'<\/span><br \/>\n<span class=\"kw1\">import<\/span> <span class=\"kw3\">itertools<\/span><br \/>\n<span class=\"kw1\">import<\/span> json<br \/>\n<span class=\"kw1\">import<\/span> <span class=\"kw3\">os<\/span><br \/>\n<span class=\"kw1\">import<\/span> scrapy<br \/>\n<span
class=\"kw1\">from<\/span> scrapy <span class=\"kw1\">import<\/span> Request<br \/>\n<br \/>\n<span class=\"kw1\">class<\/span> IssueSpider<span class=\"br0\">&#40;<\/span>scrapy.<span class=\"me1\">spiders<\/span>.<span class=\"me1\">Spider<\/span><span class=\"br0\">&#41;<\/span>:<br \/>\n<br \/>\n&nbsp; &nbsp; name <span class=\"sy0\">=<\/span> <span class=\"st0\">&quot;issue&quot;<\/span> <span class=\"co1\"># spider name<\/span><br \/>\n&nbsp; &nbsp; allowed_domains <span class=\"sy0\">=<\/span> <span class=\"br0\">&#91;<\/span><span class=\"st0\">&quot;github.com&quot;<\/span><span class=\"br0\">&#93;<\/span> <span class=\"co1\"># domains the spider is allowed to crawl<\/span><br \/>\n&nbsp; &nbsp; num <span class=\"sy0\">=<\/span> <span class=\"nu0\">1<\/span> <span class=\"co1\"># page number, starting from the first page<\/span><br \/>\n&nbsp; &nbsp; handle_httpstatus_list <span class=\"sy0\">=<\/span> <span class=\"br0\">&#91;<\/span><span class=\"nu0\">404<\/span><span class=\"sy0\">,<\/span> <span class=\"nu0\">403<\/span><span class=\"sy0\">,<\/span> <span class=\"nu0\">401<\/span><span class=\"br0\">&#93;<\/span> <span class=\"co1\"># the spider does not stop when a response has one of these status codes<\/span><br \/>\n&nbsp; &nbsp; output_file <span class=\"sy0\">=<\/span> <span class=\"kw2\">open<\/span><span class=\"br0\">&#40;<\/span><span class=\"st0\">'issue.txt'<\/span><span class=\"sy0\">,<\/span> <span class=\"st0\">&quot;a&quot;<\/span><span class=\"br0\">&#41;<\/span> <span class=\"co1\"># output file<\/span><br \/>\n&nbsp; &nbsp; <span class=\"co1\"># token list, partially masked<\/span><br \/>\n&nbsp; &nbsp; token_list <span class=\"sy0\">=<\/span> <span class=\"br0\">&#91;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'293a06ac6ed5a746f7314be5a25f3d**********'<\/span><span class=\"sy0\">,<\/span><br
\/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'66de084042a7d3311544c656ad9273**********'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'a513f61368e16c2da229e38e139a8e**********'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'9055150c8fd031468af71cbb4e12c5**********'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'ba119dc83af804327fa9dad8e07718**********'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'b93e6996a4d76057d16e5e45788fbf**********'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'c9c13e5c14d6876c76919520c9b05d**********'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'3e41cbfc0c8878aec935fba68a0d3c**********'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'402ff55399ca08ca7c886a2031f49f**********'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'7cb6e20a24000968983b79b5de705c**********'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; <span class=\"br0\">&#93;<\/span><br \/>\n&nbsp; &nbsp; token_iter <span class=\"sy0\">=<\/span> <span class=\"kw3\">itertools<\/span>.<span class=\"me1\">cycle<\/span><span class=\"br0\">&#40;<\/span>token_list<span class=\"br0\">&#41;<\/span> <span class=\"co1\"># cycling iterator: after the last token it starts over from the first<\/span><br \/>\n<br \/>\n<br \/>\n&nbsp; &nbsp; <span class=\"kw1\">def<\/span> <span class=\"kw4\">__init__<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">self<\/span><span class=\"br0\">&#41;<\/span>: <span class=\"co1\"># initialization<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; scrapy.<span
class=\"me1\">spiders<\/span>.<span class=\"me1\">Spider<\/span>.<span class=\"kw4\">__init__<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">self<\/span><span class=\"br0\">&#41;<\/span><br \/>\n<br \/>\n&nbsp; &nbsp; <span class=\"kw1\">def<\/span> <span class=\"kw4\">__del__<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">self<\/span><span class=\"br0\">&#41;<\/span>: <span class=\"co1\"># close the file when the spider finishes<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw2\">self<\/span>.<span class=\"me1\">output_file<\/span>.<span class=\"me1\">close<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><br \/>\n<br \/>\n&nbsp; &nbsp; <span class=\"kw1\">def<\/span> start_requests<span class=\"br0\">&#40;<\/span><span class=\"kw2\">self<\/span><span class=\"br0\">&#41;<\/span>:<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; start_urls <span class=\"sy0\">=<\/span> <span class=\"br0\">&#91;<\/span><span class=\"br0\">&#93;<\/span> <span class=\"co1\"># list of initial requests<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; url <span class=\"sy0\">=<\/span> <span class=\"st0\">&quot;https:\/\/api.github.com\/repos\/rails\/rails\/issues?per_page=99&amp;page=&quot;<\/span>+<span class=\"kw2\">str<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">self<\/span>.<span class=\"me1\">num<\/span><span class=\"br0\">&#41;<\/span> <span class=\"co1\"># first URL to crawl<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"co1\"># add the first crawl request<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; start_urls.<span class=\"me1\">append<\/span><span class=\"br0\">&#40;<\/span>scrapy.<span class=\"me1\">FormRequest<\/span><span class=\"br0\">&#40;<\/span>url<span class=\"sy0\">,<\/span> headers<span class=\"sy0\">=<\/span><span class=\"br0\">&#123;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span
class=\"st0\">'User-Agent'<\/span>: <span class=\"st0\">'Mozilla\/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko\/20100101 Firefox\/36.0'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'Accept'<\/span>: <span class=\"st0\">'text\/html,application\/xhtml+xml,application\/xml;q=0.9,*\/*;q=0.8'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'Accept-Language'<\/span>: <span class=\"st0\">'en'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'Authorization'<\/span>: <span class=\"st0\">'token '<\/span> + <span class=\"kw2\">next<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">self<\/span>.<span class=\"me1\">token_iter<\/span><span class=\"br0\">&#41;<\/span><span class=\"sy0\">,<\/span> <span class=\"co1\"># header field that carries the token; next() works on Python 2 and 3<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"br0\">&#125;<\/span><span class=\"sy0\">,<\/span> callback<span class=\"sy0\">=<\/span><span class=\"kw2\">self<\/span>.<span class=\"me1\">parse<\/span><span class=\"br0\">&#41;<\/span><span class=\"br0\">&#41;<\/span> <br \/>\n<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">return<\/span> start_urls<br \/>\n<br \/>\n&nbsp; &nbsp; <span class=\"kw1\">def<\/span> yield_request<span class=\"br0\">&#40;<\/span><span class=\"kw2\">self<\/span><span class=\"br0\">&#41;<\/span>: <span class=\"co1\"># helper that builds the request for the current page<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; url <span class=\"sy0\">=<\/span> <span class=\"st0\">&quot;https:\/\/api.github.com\/repos\/rails\/rails\/issues?per_page=99&amp;page=&quot;<\/span>+<span class=\"kw2\">str<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">self<\/span>.<span class=\"me1\">num<\/span><span class=\"br0\">&#41;<\/span> <span
class=\"co1\"># build the URL<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"co1\"># return the request<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">return<\/span> Request<span class=\"br0\">&#40;<\/span>url<span class=\"sy0\">,<\/span>headers<span class=\"sy0\">=<\/span><span class=\"br0\">&#123;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'User-Agent'<\/span>: <span class=\"st0\">'Mozilla\/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko\/20100101 Firefox\/36.0'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'Accept'<\/span>: <span class=\"st0\">'text\/html,application\/xhtml+xml,application\/xml;q=0.9,*\/*;q=0.8'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'Accept-Language'<\/span>: <span class=\"st0\">'en'<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"st0\">'Authorization'<\/span>: <span class=\"st0\">'token '<\/span> + <span class=\"kw2\">next<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">self<\/span>.<span class=\"me1\">token_iter<\/span><span class=\"br0\">&#41;<\/span><span class=\"sy0\">,<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"br0\">&#125;<\/span><span class=\"sy0\">,<\/span>callback<span class=\"sy0\">=<\/span><span class=\"kw2\">self<\/span>.<span class=\"me1\">parse<\/span><span class=\"br0\">&#41;<\/span><br \/>\n<br \/>\n&nbsp; &nbsp; <span class=\"co1\"># parse callback<\/span><br \/>\n&nbsp; &nbsp; <span class=\"kw1\">def<\/span> parse<span class=\"br0\">&#40;<\/span><span class=\"kw2\">self<\/span><span class=\"sy0\">,<\/span> response<span class=\"br0\">&#41;<\/span>:<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">if<\/span> response.<span class=\"me1\">status<\/span> <span
class=\"kw1\">in<\/span> <span class=\"kw2\">self<\/span>.<span class=\"me1\">handle_httpstatus_list<\/span>: <span class=\"co1\"># the response has one of the status codes in handle_httpstatus_list<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw2\">self<\/span>.<span class=\"me1\">num<\/span> +<span class=\"sy0\">=<\/span> <span class=\"nu0\">1<\/span> <span class=\"co1\"># increment num to skip this page; the current url could be written to a log file here<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">yield<\/span> <span class=\"kw2\">self<\/span>.<span class=\"me1\">yield_request<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span> <span class=\"co1\"># issue a new request<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">return<\/span><br \/>\n<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; json_data <span class=\"sy0\">=<\/span> json.<span class=\"me1\">loads<\/span><span class=\"br0\">&#40;<\/span>response.<span class=\"me1\">text<\/span><span class=\"br0\">&#41;<\/span> <span class=\"co1\"># parse the JSON body (response.text replaces the deprecated body_as_unicode())<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; length <span class=\"sy0\">=<\/span> <span class=\"kw2\">len<\/span><span class=\"br0\">&#40;<\/span>json_data<span class=\"br0\">&#41;<\/span> <span class=\"co1\"># number of issues on this page<\/span><br \/>\n<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">if<\/span> length <span class=\"sy0\">==<\/span> <span class=\"nu0\">99<\/span>:<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw2\">self<\/span>.<span class=\"me1\">num<\/span> <span class=\"sy0\">=<\/span> <span class=\"kw2\">self<\/span>.<span class=\"me1\">num<\/span> + <span class=\"nu0\">1<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">for<\/span>
issue <span class=\"kw1\">in<\/span> json_data:<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; data <span class=\"sy0\">=<\/span> <span class=\"br0\">&#123;<\/span><span class=\"br0\">&#125;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; data<span class=\"br0\">&#91;<\/span><span class=\"st0\">'number'<\/span><span class=\"br0\">&#93;<\/span> <span class=\"sy0\">=<\/span> issue<span class=\"br0\">&#91;<\/span><span class=\"st0\">'number'<\/span><span class=\"br0\">&#93;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; data<span class=\"br0\">&#91;<\/span><span class=\"st0\">'owner'<\/span><span class=\"br0\">&#93;<\/span> <span class=\"sy0\">=<\/span> issue<span class=\"br0\">&#91;<\/span><span class=\"st0\">'user'<\/span><span class=\"br0\">&#93;<\/span><span class=\"br0\">&#91;<\/span><span class=\"st0\">'login'<\/span><span class=\"br0\">&#93;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; data<span class=\"br0\">&#91;<\/span><span class=\"st0\">'title'<\/span><span class=\"br0\">&#93;<\/span> <span class=\"sy0\">=<\/span> issue<span class=\"br0\">&#91;<\/span><span class=\"st0\">'title'<\/span><span class=\"br0\">&#93;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; data<span class=\"br0\">&#91;<\/span><span class=\"st0\">'body'<\/span><span class=\"br0\">&#93;<\/span> <span class=\"sy0\">=<\/span> issue<span class=\"br0\">&#91;<\/span><span class=\"st0\">'body'<\/span><span class=\"br0\">&#93;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw2\">self<\/span>.<span class=\"me1\">output_file<\/span>.<span class=\"me1\">write<\/span><span class=\"br0\">&#40;<\/span>json.<span class=\"me1\">dumps<\/span><span class=\"br0\">&#40;<\/span>data<span class=\"br0\">&#41;<\/span>+<span class=\"st0\">'<span class=\"es0\">\\n<\/span>'<\/span><span 
class=\"br0\">&#41;<\/span> <span class=\"co1\"># write one issue per line, also as JSON<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw2\">self<\/span>.<span class=\"me1\">output_file<\/span>.<span class=\"me1\">flush<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">yield<\/span> <span class=\"kw2\">self<\/span>.<span class=\"me1\">yield_request<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span> <span class=\"co1\"># issue a new request<\/span><br \/>\n<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">elif<\/span> length <span class=\"sy0\">&lt;<\/span> <span class=\"nu0\">99<\/span>: <span class=\"co1\"># fewer than 99 items means this is the last page<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">for<\/span> issue <span class=\"kw1\">in<\/span> json_data:<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; data <span class=\"sy0\">=<\/span> <span class=\"br0\">&#123;<\/span><span class=\"br0\">&#125;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; data<span class=\"br0\">&#91;<\/span><span class=\"st0\">'number'<\/span><span class=\"br0\">&#93;<\/span> <span class=\"sy0\">=<\/span> issue<span class=\"br0\">&#91;<\/span><span class=\"st0\">'number'<\/span><span class=\"br0\">&#93;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; data<span class=\"br0\">&#91;<\/span><span class=\"st0\">'owner'<\/span><span class=\"br0\">&#93;<\/span> <span class=\"sy0\">=<\/span> issue<span class=\"br0\">&#91;<\/span><span class=\"st0\">'user'<\/span><span class=\"br0\">&#93;<\/span><span class=\"br0\">&#91;<\/span><span class=\"st0\">'login'<\/span><span class=\"br0\">&#93;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; data<span class=\"br0\">&#91;<\/span><span
class=\"st0\">'title'<\/span><span class=\"br0\">&#93;<\/span> <span class=\"sy0\">=<\/span> issue<span class=\"br0\">&#91;<\/span><span class=\"st0\">'title'<\/span><span class=\"br0\">&#93;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; data<span class=\"br0\">&#91;<\/span><span class=\"st0\">'body'<\/span><span class=\"br0\">&#93;<\/span> <span class=\"sy0\">=<\/span> issue<span class=\"br0\">&#91;<\/span><span class=\"st0\">'body'<\/span><span class=\"br0\">&#93;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw2\">self<\/span>.<span class=\"me1\">output_file<\/span>.<span class=\"me1\">write<\/span><span class=\"br0\">&#40;<\/span>json.<span class=\"me1\">dumps<\/span><span class=\"br0\">&#40;<\/span>data<span class=\"br0\">&#41;<\/span>+<span class=\"st0\">'<span class=\"es0\">\\n<\/span>'<\/span><span class=\"br0\">&#41;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw2\">self<\/span>.<span class=\"me1\">output_file<\/span>.<span class=\"me1\">flush<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>The GitHub API is a real treasure trove: it exposes all kinds of information about a huge number of projects and developers, and can serve as a data source for research on social coding and empirical software engineering
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,6],"tags":[],"class_list":["post-115","post","type-post","status-publish","format-standard","hentry","category-python","category-machine-learning"],"_links":{"self":[{"href":"https:\/\/ayonel.malash.net\/index.php\/wp-json\/wp\/v2\/posts\/115","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ayonel.malash.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ayonel.malash.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ayonel.malash.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ayonel.malash.net\/index.php\/wp-json\/wp\/v2\/comments?post=115"}],"version-history":[{"count":9,"href":"https:\/\/ayonel.malash.net\/index.php\/wp-json\/wp\/v2\/posts\/115\/revisions"}],"predecessor-version":[{"id":170,"href":"https:\/\/ayonel.malash.net\/index.php\/wp-json\/wp\/v2\/posts\/115\/revisions\/170"}],"wp:attachment":[{"href":"https:\/\/ayonel.malash.net\/index.php\/wp-json\/wp\/v2\/media?parent=115"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ayonel.malash.net\/index.php\/wp-json\/wp\/v2\/categories?post=115"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ayonel.malash.net\/index.php\/wp-json\/wp\/v2\/tags?post=115"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}