After more digging I found this:
Googlebot executes JavaScript if any of the following is true:
A script is loaded via a <script> tag and executed right after being loaded (e.g. via an IIFE),
A function is bound to and executed on the 'DOMContentLoaded' event, or
A function is bound to and executed on the window.onload event.
The Googlebot does NOT execute JavaScript if either:
The code is executed after an AJAX call returns or
The code is executed after a timeout.
Based on the above, anything loaded via AJAX would not be indexed, whereas anything loaded synchronously and executed prior to or at DOMContentLoaded or window.onload would be. I'm not a dev so I may have this wrong, but from the looks of things Disqus uses an XHR object to load comments (so not indexed unless you use their API to load them into the rendered HTML), while other techniques render comments to the DOM before onload, and these are indeed indexed by Google.
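To make that concrete, here's a rough sketch of the patterns described above (illustrative only, I haven't tested any of this myself, and renderComments() is just a stand-in for whatever writes the comments into the page):

// Reportedly executed: script runs immediately as it is parsed (IIFE)
(function () {
  renderComments();
})();

// Reportedly executed: handler bound to DOMContentLoaded (or window.onload)
document.addEventListener('DOMContentLoaded', function () {
  renderComments();
});

// Reportedly NOT executed: content injected from an AJAX callback
var xhr = new XMLHttpRequest();
xhr.open('GET', '/comments.html');
xhr.onload = function () {
  document.getElementById('comments').innerHTML = xhr.responseText;
};
xhr.send();

// Reportedly NOT executed: content injected after a timeout
setTimeout(renderComments, 2000);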
I'd still like to know whether signals are passed across links rendered this way, though. Might need to actually do some testing to find out.
Embedded Link: Nicolai Kamenzky – “Precomposing” a SPA may become the Holy Grail to SEO
This post was first made on the Richard Hearne Google+ profile.
Not exactly the holy grail for SEO if you don't include all other major search engines, especially if you live in the States. Although Bing/Yahoo has a smaller market share, at scale I still wouldn't want to give up 20-30% of my traffic.
Also that doesn't include FB, Twitter, or anything else that needs to read structured data.
Server-side rendering on the initial request is still the proper way to go for good crawlability/SEO.
Comment by Eric Wu — February 3, 2015 @ 4:47 pm
Sorry, I should have given more context here +Eric Wu. My interest is in whether links rendered into the page via JS and indexed by Google can pass PageRank and associated ranking signals.
The reason is that I'm working with a site that loads comments via JS, I suspect on the reasoning that these won't be indexed. The content isn't bad, but some users figured out how to get FOLLOW links in and I'm trying to find justification to get these NOFOLLOWed. It's a very big site, and changes won't be made without such justification.
Any knowledge on this topic? I know you've been working with SPAs.
Comment by Richard Hearne — February 3, 2015 @ 4:51 pm
+Richard Hearne I think you are wrong here: both Disqus and FB comments can be indexed by Googlebot on the page. Just two examples.
Disqus comment on SE-land:
https://www.google.com/search?q="Do+you+think+this+is+a+side+stepping+attempt+to+incorporate+a+reward+for+advertising+on+Google+to+organic+listing+result+rank%3F"
FB comment on TechCrunch:
https://www.google.nl/search?q="Facebook+comment+box+is+awesome+and+if+you+know+how+to+put+some+extra+emoticons+in+your+comment+then+it+will+be+even+more+pleasent+:)"
My guess is that the iframe, which is somehow loaded via JS, is being indexed and "merged" into the particular page, as if it were content on the page.
P.S. The example pages are from 2011, just to make sure they are indexed. Even though these are popular sites, it doesn't seem that the comments are indexed asap…
Comment by Edwin Jonk — February 3, 2015 @ 5:25 pm
Hi Edwin. Disqus can be loaded either via JS client-side, or an API server-side. So if you look at what seroundtable was doing, they were loading the comments server-side into the HTML, but allowing the JS to also load client-side. The HTML comments were for Google, and JS for users. They turned off the server-side stuff after John Mueller told Barry his comments were causing quality issues.
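Very roughly, the server-side part works something like this (a sketch only; the Disqus API endpoint and parameter syntax here are my reading of their public docs, not seroundtable's actual code):

// Node-style sketch: fetch a thread's comments from the Disqus API and
// print them into the HTML we serve, so they exist without any client JS.
var https = require('https');

var postUrl = 'http://example.com/my-post/';  // the post being rendered (placeholder)
var apiUrl = 'https://disqus.com/api/3.0/threads/listPosts.json' +
    '?api_key=YOUR_PUBLIC_KEY' +                   // placeholder key
    '&forum=yourshortname' +                       // placeholder forum shortname
    '&thread=link:' + encodeURIComponent(postUrl); // lookup syntax approximate

https.get(apiUrl, function (res) {
  var body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () {
    var posts = JSON.parse(body).response || [];
    var commentsHtml = posts.map(function (p) {
      return '<div class="comment">' + p.message + '</div>';
    }).join('');
    // ...insert commentsHtml into the page template before sending the response
  });
});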
I haven't looked at Facebook comments, but I assume Google is doing something a bit custom to pull them in, especially if the comments appear in an iframe (as in the case of TechCrunch). Thanks for the input though – more things to think about now.
Comment by Richard Hearne — February 3, 2015 @ 5:38 pm
Quick follow-up on Facebook comments on Techcrunch:
<fb:comments href="http://techcrunch.com/2015/02/02/uber-opening-robotics-research-facility-in-pittsburgh-to-build-self-driving-cars/" num_posts="25" width="100%"></fb:comments>
So they are embedded using XFBML, and it looks like Google must have built a custom way of crawling this and parsing it into the page. If you look at the source of Google's cached text version you'll see where they inject the crawled comments block. Nasty-looking stuff.
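For reference, that tag only becomes an actual iframe once the Facebook JS SDK loads and parses the XFBML on the page; the standard loader looks roughly like this:

<div id="fb-root"></div>
<script>
  // Standard async loader for the Facebook JS SDK; xfbml=1 tells it to scan
  // the page and replace tags like <fb:comments> with rendered iframes.
  (function (d, s, id) {
    var js, fjs = d.getElementsByTagName(s)[0];
    if (d.getElementById(id)) return;
    js = d.createElement(s); js.id = id;
    js.src = "//connect.facebook.net/en_US/sdk.js#xfbml=1&version=v2.0";
    fjs.parentNode.insertBefore(js, fjs);
  }(document, 'script', 'facebook-jssdk'));
</script>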
Comment by Richard Hearne — February 3, 2015 @ 5:45 pm
I understand the HTML snapshot. After some digging I am pretty sure Googlebot can do XHR requests [1].
"we improve both the indexing of a page and its Instant Preview by following the automatic XMLHttpRequest generated as the page renders."
However, Googlebot might only perform one request, and so the content/comments that are loaded "automatically" when you visit the page are not "seen" by Googlebot.
[1] http://googlewebmastercentral.blogspot.nl/2011/11/get-post-and-safely-surfacing-more-of.html
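In other words, the kind of request they claim to follow is one fired automatically while the page renders, not one triggered by a later user action (my reading only, not tested; the element ID below is just an example):

// Fired automatically as the page renders – per [1], Googlebot may follow this
var xhr = new XMLHttpRequest();
xhr.open('GET', '/comments.json');
xhr.onload = function () {
  // insert the returned comments into the DOM
};
xhr.send();

// Fired only after a user action – a crawler rendering the page once would
// presumably never trigger this second request
document.getElementById('load-more').addEventListener('click', function () {
  // another XMLHttpRequest here
});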
Comment by Edwin Jonk — February 3, 2015 @ 5:53 pm
As far as I understand, both Disqus and FB use iframes that are generated by JS. Following my examples, the iframe points:
For Disqus, to
https://disqus.com/embed/comments/?base=default&version=ff15479433461993d0738de53d5f22cf&f=searchengineland&t_i=85686%20http%3A%2F%2Fsearchengineland.com%2F%3Fp%3D85686&t_u=http%3A%2F%2Fsearchengineland.com%2F1-for-the-in-house-search-engine-marketer-85686&t_e=%2B1%20For%20The%20In-House%20Search%20Engine%20Marketer&t_d=%2B1%20For%20The%20In-House%20Search%20Engine%20Marketer&t_t=%2B1%20For%20The%20In-House%20Search%20Engine%20Marketer&s_o=default&l=#2
For FB, to
https://www.facebook.com/plugins/comments.php?api_key=187288694643718&channel_url=http%3A%2F%2Fstatic.ak.facebook.com%2Fconnect%2Fxd_arbiter%2FDU1Ia251o0y.js%3Fversion%3D41%23cb%3Df1f3a7654c%26domain%3Dtechcrunch.com%26origin%3Dhttp%253A%252F%252Ftechcrunch.com%252Ff203a0b4f8%26relation%3Dparent.parent&href=http%3A%2F%2Ftechcrunch.com%2F2011%2F04%2F12%2Ffacebook-comments-now-on-over-50k-sites-get-more-social-with-latest-upgrade%2F&locale=en_US&numposts=25&sdk=joey&width=100%25
Comment by Edwin Jonk — February 3, 2015 @ 5:54 pm
Thanks, nice find. But I suspect the key to this is that it won't do any XHR after the $(document).ready state change. Even that seems to be contradicted by other tests.
On Disqus – if you look at the source code for any seland post you'll see the comments rendered in HTML. FB is actually rendering in an iframe, but the iframe is generated by XFBML as opposed to sitting in the rendered page.
Part of me thinks that what they announced in 2011 may have been a little bit of vapor TBH.
Comment by Richard Hearne — February 3, 2015 @ 6:01 pm
These things are bound to change over time, so I'd be wary of reading too much into this specific implementation :). The idea is not to make your lives harder – really! – but rather to make sure that we can show your site for all of the relevant queries that it deserves, based on the content that users would see.
Comment by John Mueller — February 3, 2015 @ 10:41 pm
+John Mueller sorry, it's my bad for bringing that particular SPA implementation into the discussion. What I'm really interested in is how Google treats JavaScript-rendered content that it indexes.
My use case is that I'm working with a very big blog which renders comments via JS. The comments used NOFOLLOW for links, but I've found that some people found a loophole that allowed them to get FOLLOW links into the comment bodies. I asked +Pierre Far and he said we could add NOFOLLOW to any link we have concerns about, but if I have concrete justification that links rendered by JS may be interpreted the same as HTML-rendered links it will be far easier for me to sell this to stakeholders.
Any help here is much appreciated.
Comment by Richard Hearne — February 4, 2015 @ 4:33 am
If you were wondering what a SPA was/is…..
http://en.wikipedia.org/wiki/Single-page_application
I'm a noob….had to do some googling to catch up here! :P
Comment by Trey Collier — February 4, 2015 @ 8:44 am
+Richard Hearne yes, if we discover links through rendering, we'll treat them (more or less — there's an element of latency in there) the same as other links on the page.
Comment by John Mueller — February 4, 2015 @ 4:36 pm
I hate to ask you John, but the "more or less" throws me. So apart from latency, will the links be treated similarly, i.e. will they pass the same signals as a plain HTML link? If we build our navigation via a JS render, which Googlebot also renders, will it have the same outcome as an HTML-rendered equivalent?
The other big issue I'm having is determining whether things like navigation flyouts (hidden until an event happens) rendered via e.g. Handlebars are getting rendered or not by Googlebot. Since they're hidden from initial view we can't tell from the GWT Fetch and Render tool, and the Fetching data view simply shows the scaffold inside script tags.
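To illustrate what I mean, the pattern is roughly this (purely illustrative, not the actual site's code, and it assumes Handlebars is already loaded on the page):

// A nav flyout rendered client-side with Handlebars and kept hidden until
// a hover/click event reveals it.
var source = '<ul class="flyout" style="display:none">' +
    '{{#each items}}<li><a href="{{url}}">{{label}}</a></li>{{/each}}' +
    '</ul>';
var template = Handlebars.compile(source);

document.getElementById('main-nav').insertAdjacentHTML(
  'beforeend',
  template({ items: [{ url: '/products', label: 'Products' }] })
);
// The question: does Googlebot execute this render and count those links?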
Sorry to ask, but there seems to be a serious lack of testing/discussion around some of this.
Comment by Richard Hearne — February 4, 2015 @ 4:42 pm
This is all still fairly new; we've been working on indexing JS content for a while, and we're getting pretty good at it, but I wouldn't say that all JS content is already 1:1 equivalent to static HTML. There are lots of moving parts, especially on the website side. Where we can recognize the links in the rendered version, we'll try to treat them as equivalents to static HTML links (passing the same signals, anchor text, etc.). I'd like to say that we'll treat them exactly the same all the time, but I can imagine there may be abuse scenarios where we have to do something different.
Comment by John Mueller — February 4, 2015 @ 4:57 pm
Thanks John
That's really very useful.
So now my last question for you. If you visit a website such as http://www.hubspot.com you'll see (on desktop) a navigation menu at the top of the page. All fine and normal. Until you view the source, where you'll only find the scaffold template inside script tags. Google's cache shows the rendered nav, BUT the text-only cache shows a nav contained in a <noscript> element.
In the good old days JS navs were just bad. Those were black and white days. Now how can we tell whether an implementation such as the above is OK from a search perspective? Is there any easy way to determine if Google really is counting these exotic navigation links?
Sorry for extended questioning, but I suspect this will be useful to many people.
Comment by Richard Hearne — February 4, 2015 @ 5:05 pm
I don't think there's currently a way to determine that absolutely (apart from things that are obviously blocked, or directly included in the static HTML).
Comment by John Mueller — February 5, 2015 @ 11:27 am