{"id":630,"date":"2016-04-03T21:00:38","date_gmt":"2016-04-04T02:00:38","guid":{"rendered":"http:\/\/sunapi386.ca\/wordpress\/?p=630"},"modified":"2016-04-04T16:43:32","modified_gmt":"2016-04-04T21:43:32","slug":"migrating-copying-blog-qq-qzone-wordpress","status":"publish","type":"post","link":"https:\/\/sunapi386.ca\/wordpress\/migrating-copying-blog-qq-qzone-wordpress\/","title":{"rendered":"Migrating copying blog from QQ Qzone to WordPress"},"content":{"rendered":"<p>Recently I made a website for Sophie (smoosophie.com) and scraped her blog from QQ&#8217;s QZone.<br \/>\nQQ has an API that can be used for QZone, but it is hard to develop for. Mainly the API is poorly documented and hard to use for a native English speaker. Although I could read Chinese (slowly), reading professional terms is something I have to work on.<\/p>\n<h1>Workflow screenshots<\/h1>\n<p>Anyhow, I decided to practice javascript and manually scrape it.\u00a0Here are the steps I took:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-631\" src=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.43.17-300x274.png\" alt=\"Screen Shot 2016-04-03 at 18.43.17\" width=\"300\" height=\"274\" srcset=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.43.17-300x274.png 300w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.43.17-768x702.png 768w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.43.17-624x570.png 624w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.43.17.png 973w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p>This is what the QZone page looks like.<\/p>\n<p><a href=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.12.png\" rel=\"attachment 
wp-att-633\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-633\" src=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.12-300x274.png\" alt=\"Screen Shot 2016-04-03 at 18.45.12\" width=\"300\" height=\"274\" srcset=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.12-300x274.png 300w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.12-768x702.png 768w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.12-624x570.png 624w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.12.png 973w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>This is the fetching script running.<\/p>\n<p><a href=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.44.22.png\" rel=\"attachment wp-att-632\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-632\" src=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.44.22-300x274.png\" alt=\"Screen Shot 2016-04-03 at 18.44.22\" width=\"300\" height=\"274\" srcset=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.44.22-300x274.png 300w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.44.22-768x702.png 768w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.44.22-624x570.png 624w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.44.22.png 973w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>After\u00a0running the fetching script, it loads all of the blog links as individual iframes onto the page. 
Because each of these loads is a network request, I manually wait until all of them have finished. The JavaScript console stops scrolling and printing once they have all loaded.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-635\" src=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.31-300x274.png\" alt=\"Screen Shot 2016-04-03 at 18.45.31\" width=\"300\" height=\"274\" srcset=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.31-300x274.png 300w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.31-768x702.png 768w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.31-624x570.png 624w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.31.png 973w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p>Here are all the frames after they have loaded. 
Then I run the scraper JavaScript, which scrapes each iframe for its blog content and appends it to the current page.<\/p>\n<p><a href=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.28.png\" rel=\"attachment wp-att-634\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-634\" src=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.28-300x274.png\" alt=\"Screen Shot 2016-04-03 at 18.45.28\" width=\"300\" height=\"274\" srcset=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.28-300x274.png 300w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.28-768x702.png 768w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.28-624x570.png 624w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.45.28.png 973w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>I manually copy and paste them into CSV files.<\/p>\n<p><a href=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.46.21.png\" rel=\"attachment wp-att-636\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-636\" src=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.46.21-300x124.png\" alt=\"Screen Shot 2016-04-03 at 18.46.21\" width=\"300\" height=\"124\" srcset=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.46.21-300x124.png 300w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.46.21.png 556w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>This is the first line of a CSV entry.<\/p>\n<p><a 
href=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.46.40.png\" rel=\"attachment wp-att-637\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-637\" src=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.46.40-300x38.png\" alt=\"Screen Shot 2016-04-03 at 18.46.40\" width=\"300\" height=\"38\" srcset=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.46.40-300x38.png 300w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.46.40-768x96.png 768w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.46.40-624x78.png 624w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.46.40.png 815w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>Then I wrote a Ruby script to add these CSV entries to a WordPress blog.<\/p>\n<p><a href=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.48.05.png\" rel=\"attachment wp-att-638\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-638\" src=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.48.05-300x198.png\" alt=\"Screen Shot 2016-04-03 at 18.48.05\" width=\"300\" height=\"198\" srcset=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.48.05-300x198.png 300w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.48.05-624x411.png 624w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.48.05.png 727w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>Here&#8217;s what running it looks like.<\/p>\n<p><a 
href=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.48.43.png\" rel=\"attachment wp-att-639\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-639\" src=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.48.43-300x284.png\" alt=\"Screen Shot 2016-04-03 at 18.48.43\" width=\"300\" height=\"284\" srcset=\"https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.48.43-300x284.png 300w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.48.43-768x728.png 768w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.48.43-624x592.png 624w, https:\/\/sunapi386.ca\/wordpress\/wp-content\/uploads\/2016\/04\/Screen-Shot-2016-04-03-at-18.48.43.png 807w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h1>Source code<\/h1>\n<p>Run fetcher.js on the blog page.<\/p>\n<p>&nbsp;<\/p>\n<pre>\/\/ fetcher.js\r\n\/\/ JavaScript to paste into the Chrome console to fetch the posts.\r\n\r\nvar ul = document.createElement(\"UL\") \/\/ Create a &lt;ul&gt; node\r\nul.setAttribute(\"id\", \"myList\")\r\ndocument.body.appendChild(ul);\r\nvar s = \"|^@^|\"; \/\/ field separator\r\nvar iframesArray = [];\r\nvar messagesArray = [];\r\nvar posts = document.getElementById(\"tblog\").contentWindow.document.getElementById(\"listArea\").getElementsByTagName(\"li\")\r\nfor (var i = 0; i &lt; posts.length \/**\/; i++) {\r\n var postTitle = posts[i].getElementsByClassName(\"article\")[0].textContent;\r\n var postLink = posts[i].getElementsByClassName(\"article\")[0].getElementsByTagName(\"a\")[0].href\r\n \/\/ e.g. http:\/\/user.qzone.qq.com\/765591203\/blog\/1400775512\r\n var entryDate = posts[i].getElementsByClassName(\"list_op\")[0].childNodes[0].textContent;\r\n var readCount = 
posts[i].getElementsByClassName(\"list_op\")[0].childNodes[2].textContent;\r\n var message = i +s+ postTitle +s+ entryDate +s+ readCount +s+ postLink +s;\r\n messagesArray.push(message);\r\n var iframe = document.createElement('iframe')\r\n iframe.src = postLink;\r\n document.body.appendChild(iframe)\r\n iframesArray.push(iframe);\r\n}\r\n\r\n<\/pre>\n<p>Once you have judged that every iframe has finished loading, run this to scrape all the iframes and append the results to the current page. From there you can copy and paste them into a CSV file.<\/p>\n<pre>\/\/ scraper.js\r\n\/\/ Execute this when everything seems to have loaded\r\nfor (var i = 0; i &lt; iframesArray.length \/**\/; i++) {\r\n var contWin = iframesArray[i].contentWindow;\r\n var win = contWin.document;\r\n \/\/ Scraping\r\n var blogsection = win.getElementById(\"tblog\").contentWindow.document;\r\n var postTexts = blogsection.getElementById('blogDetailDiv').children;\r\n var postString = \"\";\r\n\r\n \/\/ Guard: sometimes the post container has no child elements.\r\n if (postTexts.length == 0) {\r\n postString = blogsection.getElementById('blogDetailDiv').innerHTML;\r\n }\r\n else if (postTexts[0].tagName == \"BR\") {\r\n var brString = blogsection.getElementById('blogDetailDiv').innerHTML;\r\n postString = brString; \/\/ .replace(\/&lt;br&gt;\/g, \"\\n\")\r\n }\r\n else {\r\n for (var p = 0; p &lt; postTexts.length; p++) {\r\n postString += \"&lt;p&gt;\" + postTexts[p].innerHTML + \"\\n&amp;nbsp;&lt;\/p&gt;\";\r\n }\r\n }\r\n\r\n postString = postString.trim();\r\n if (postString.length == 0) {\r\n postString = \"ERROR parsing!\";\r\n }\r\n\r\n var message = messagesArray[i] + postString;\r\n console.log(message)\r\n var node = document.createElement(\"LI\") \/\/ Create a &lt;li&gt; node\r\n var textnode = document.createTextNode(message) \/\/ Create a text node\r\n node.appendChild(textnode) \/\/ Append the text to &lt;li&gt;\r\n document.getElementById(\"myList\").appendChild(node) \/\/ Append &lt;li&gt; to &lt;ul&gt; with 
id=\"myList\"\r\n \/\/ document.body.removeChild(iframe)\r\n \/\/ win.body.parentNode.removeChild(win.body)\r\n}\r\n\r\n<\/pre>\n<p>Once the CSV files are created, post them to the WordPress blog.<\/p>\n<pre>#poster.rb\r\nrequire 'rubypress'\r\nrequire \"csv\"\r\nrequire 'sanitize'\r\nrequire 'time' # provides Time.parse\r\nwp = Rubypress::Client.new(:host =&gt; \"smoosophie.com\",\r\n :username =&gt; \"sophie\",\r\n :password =&gt; \"1Smoosophie!\",\r\n # :use_ssl =&gt; true,\r\n :retry_timeouts =&gt; true)\r\nDir[\"*.csv\"].each do |name|\r\n puts \"============= #{name} =============\"\r\n # Hack: \uf8ff is used as the quote character so that ordinary quotes in the posts are left alone.\r\n # Options are passed as keyword arguments; newer Ruby versions reject a positional options hash.\r\n csv = CSV.read(name, :col_sep =&gt; \"|^@^|\", :quote_char =&gt; \"\uf8ff\")\r\n csv.each do |row|\r\n id, title, pubdate, readcount, link, text = row\r\n cleantext = Sanitize.clean(text, :elements =&gt; ['br','p', 'a'])\r\n puts(\"#{id}-#{title}\")\r\n # puts(\"#id #{id}, title #{title}, pubdate #{pubdate}, readcount #{readcount}, link #{link}, text #{text}\")\r\n begin\r\n retries ||= 0\r\n puts \"try ##{ retries }\"\r\n sleep(1)\r\n wp.newPost( :blog_id =&gt; \"0\", # 0 unless using WP Multi-Site, then use the blog id\r\n :content =&gt; {\r\n :post_status =&gt; \"publish\",\r\n :post_date =&gt; Time.parse(pubdate),\r\n :post_content =&gt; cleantext,\r\n :post_title =&gt; title,\r\n :post_author =&gt; 1, # 1 if the admin is the only user, otherwise the user's id\r\n :terms_names =&gt; {:category =&gt; ['QQZone'], :post_tag =&gt; ['QQZone'] }\r\n }\r\n )\r\n rescue\r\n # Retry a failed post up to three times in total, then move on to the next row.\r\n retry if (retries += 1) &lt; 3\r\n end\r\n end\r\nend\r\n\r\n<\/pre>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently I made a website for Sophie (smoosophie.com) and scraped her blog from QQ&#8217;s QZone. QQ has an API that can be used for QZone, but it is hard to develop for. Mainly the API is poorly documented and hard to use for a native English speaker. 
Although I could read Chinese (slowly), reading professional &hellip; <a href=\"https:\/\/sunapi386.ca\/wordpress\/migrating-copying-blog-qq-qzone-wordpress\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Migrating copying blog from QQ Qzone to WordPress<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-630","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/posts\/630","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/comments?post=630"}],"version-history":[{"count":3,"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/posts\/630\/revisions"}],"predecessor-version":[{"id":642,"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/posts\/630\/revisions\/642"}],"wp:attachment":[{"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/media?parent=630"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/categories?post=630"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/tags?post=630"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}