Paul Ford
2 min read · Aug 9, 2016


Hi, Adewale. I’m not really asking anyone to do anything, nor do I expect Google to do anything in particular. I’m just noting the gap between where the platform is and where it could be.

The viewer is a good start. I tend to see AMP in the context of the DOM: what kind of object model could I build around AMP so that I could work with it at the object level, perhaps in non-JS languages, and treat the parsing and building of AMP content as data? Right now the platform is very new and there isn’t much there. I expect it to grow in time.
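To make the "AMP as data" idea concrete, here is a minimal sketch in Python, using only the standard library. It is not an existing AMP library. The names `AmpElement`, `AmpCollector`, and `parse_amp` are hypothetical, and it assumes that collecting every `amp-*` custom element with its attributes is a reasonable first cut at an object model.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class AmpElement:
    """One amp-* custom element captured as plain data (hypothetical type)."""
    tag: str
    attrs: dict = field(default_factory=dict)


class AmpCollector(HTMLParser):
    """Collects every element whose tag name starts with 'amp-'."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and passes attrs as (name, value) pairs.
        if tag.startswith("amp-"):
            self.elements.append(AmpElement(tag, dict(attrs)))


def parse_amp(html: str) -> list:
    """Return the amp-* elements found in the document, in source order."""
    collector = AmpCollector()
    collector.feed(html)
    collector.close()
    return collector.elements
```

Something like `parse_amp(page_html)` would hand back a flat list of records that could be stored or indexed. A real object model would also need nesting, validation, and the non-AMP content around those elements.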

As Kevin Marks noted, an AMP-focused crawler feels non-trivial. First, I would need to determine the right representation for storing the AMP content and identify a good search engine that could scale. It’s hard to build a good crawler/storage layer, and it takes more effort still to work out how to make use of the AMP data. The lack of convenient tooling makes this harder.

Then I would need to crawl URLs. Even with a list of those URLs, I know of no heuristic that would identify where the AMP content lives inside a given site, as home pages don’t point towards AMP content. So that’s also non-trivial. A whole-web crawl, or even a top-10,000-site crawl, plus the development work above means that a well-instrumented set of services is going to cost a lot of money to build.
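One partial signal does exist at the level of individual pages: by AMP’s discovery convention, the canonical version of an article declares its AMP counterpart with `<link rel="amphtml" href="...">`. That doesn’t solve the home-page problem, since a crawler still has to reach each article first, but it shows what the per-page check might look like. This is a rough sketch in Python, assuming the HTML has already been fetched; `AmpLinkFinder` and `find_amp_url` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class AmpLinkFinder(HTMLParser):
    """Records the href of the first <link rel="amphtml"> in a document."""

    def __init__(self):
        super().__init__()
        self.amp_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.amp_href is not None:
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "amphtml" in rel_tokens and attr_map.get("href"):
            self.amp_href = attr_map["href"]


def find_amp_url(page_url: str, html: str) -> Optional[str]:
    """Return the absolute AMP URL a page advertises, or None if it has none."""
    finder = AmpLinkFinder()
    finder.feed(html)
    finder.close()
    if finder.amp_href is None:
        return None
    # Resolve relative hrefs against the URL the page was fetched from.
    return urljoin(page_url, finder.amp_href)
```

A crawler would run this on every article page it fetches and queue any URL it gets back. Finding those article pages is still the expensive part.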

A ping server is a great idea, but in my experience from the blogging days ping servers aren’t reliable, and they add cost to the AMP implementation. Nice feature, worth supporting, but not reliable.

I think a crawler is what will work best, but it’s a ton of work to bootstrap and there’s no clear path to building a business on top of it, since Google has search advertising on lock and the publishers monetize themselves. There are paths out but they take time and involve a lot of risk.
