Embedding arXiv Papers (and other PDFs) Easily in Astro
Displaying academic papers, especially from sources like arXiv, directly within a webpage can enhance the user experience on technical blogs. I wanted a straightforward way to embed these PDFs into my blog. My primary goal was to create a tool that would allow me to easily showcase papers when I write future posts discussing them – perhaps adding notes, detailing reproduction efforts, or providing analysis.
To achieve this, I developed a reusable Preact component that renders PDFs client-side using just the arXiv ID or a direct URL, ensuring seamless integration into my site’s content flow.
The Goal: Simple, Integrated PDF Embedding
The main objective was to embed PDFs, particularly arXiv papers, as simply as possible within .astro or .mdx files. Crucially, I wanted a solution that felt native to the blog’s design, rather than relying on a standard browser <iframe>.
Why Not Just Use an <iframe>?
While embedding a PDF using an <iframe> pointing to the source URL is possible, it offers limited control over presentation and behavior. My goals required a more integrated approach:
- Minimal Design: I aimed for a viewer that fits aesthetically within my blog’s theme, without the default browser UI chrome that often comes with <iframe> PDF viewers.
- Customization: Using PDF.js directly allows for finer control over rendering, such as applying theme-specific styles (like the dark mode adjustments I implemented) or potentially adding custom overlays or annotations in the future.
- Avoiding Nested Contexts: Iframes create separate browsing contexts, which can sometimes complicate interactions or styling consistency.
By building a component around PDF.js, I could achieve a lightweight, customizable viewer that feels like a natural part of the page.
The Solution: A Client-Side PDF Component
I built a Preact component (PDFViewer.tsx) designed to run only in the browser. This is essential because PDF rendering libraries like pdfjs-dist often rely on browser-specific APIs (like DOMMatrix) unavailable during server-side rendering (SSR) in Astro. Using Astro’s client:only="preact" directive ensures the component and its dependencies load only on the client side.
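To make the SSR problem concrete, here is a minimal sketch of a guard that checks for the browser globals PDF rendering typically needs. The helper name hasPdfRenderingApis and the exact list of globals are assumptions for illustration; neither comes from pdfjs-dist nor from my actual component.

```typescript
// Hypothetical helper (not part of pdfjs-dist): reports whether the globals
// that canvas-based PDF rendering typically relies on exist in a given scope.
type GlobalScope = Record<string, unknown>;

function hasPdfRenderingApis(
  scope: GlobalScope = globalThis as unknown as GlobalScope,
): boolean {
  // DOMMatrix is the API named above as missing during SSR.
  const hasDomMatrix = typeof scope["DOMMatrix"] === "function";
  // `document` is needed to create <canvas> elements (assumed requirement).
  const doc = scope["document"];
  const hasDocument = typeof doc === "object" && doc !== null;
  return hasDomMatrix && hasDocument;
}
```

With client:only="preact" this check is normally redundant, since Astro skips server rendering for the island entirely. A guard like this is mainly useful in shared utility modules that may be imported on both the server and the client.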
Usage Examples
Embedding a PDF is now straightforward.
1. Using an arXiv ID:
To display an arXiv paper, I just need its ID (e.g., the ResNet paper ID 1512.03385):
    import PDFViewer from "@/components/islands/common/PDFViewer.tsx";

    // Renders the PDF for arXiv paper 1512.03385
    <PDFViewer id="1512.03385" client:only="preact" />;
It looks like this:
2. Using a Direct URL:
The component also accepts a direct URL to any PDF file.
    import PDFViewer from "@/components/islands/common/PDFViewer.tsx";

    // Renders the PDF from the specified URL
    <PDFViewer
      url="https://your-bucket.s3.amazonaws.com/path/to/document.pdf"
      client:only="preact"
    />;
Important Note on Direct URLs and CORS: When using the url prop with PDFs hosted on services like AWS S3 or other domains, you might encounter Cross-Origin Resource Sharing (CORS) issues. The browser’s security policy prevents JavaScript from fetching resources from a different origin unless that origin explicitly allows it via CORS headers. You must configure the hosting service (e.g., your S3 bucket) to send the appropriate Access-Control-Allow-Origin headers to permit your website’s domain to fetch the PDF file.
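As a quick way to tell whether a given url will be subject to CORS at all, the sketch below compares the PDF's origin with the page's origin. The function name needsCors is hypothetical and example.com is a placeholder domain; the only real API used is the standard URL class.

```typescript
// Hypothetical helper: decides whether fetching `pdfUrl` from a page served
// at `pageOrigin` is a cross-origin request, i.e. one that needs CORS headers.
function needsCors(pdfUrl: string, pageOrigin: string): boolean {
  // Relative URLs such as "/files/document.pdf" resolve against the page
  // origin, so they are always same-origin.
  const resolved = new URL(pdfUrl, pageOrigin);
  // An origin is scheme + host + port; any difference makes it cross-origin.
  return resolved.origin !== new URL(pageOrigin).origin;
}
```

In the browser, window.location.origin would be the natural value for pageOrigin. A result of true means the host serving the PDF has to send an Access-Control-Allow-Origin header that covers the site; false means no CORS configuration is needed.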
How It Works
- arXiv ID Handling: When an id is provided, the component constructs a local path like /papers/1512.03385. A Netlify redirect rule (in my case) maps this path to the actual https://arxiv.org/pdf/:id.pdf URL. Because the rule uses status = 200, Netlify serves it as a rewrite: the browser keeps requesting my own domain while the PDF is fetched from arXiv on each request. I never host copies of the papers myself, which avoids copyright issues, and component usage stays clean.
    [[redirects]]
      from = "/papers/:id.pdf"
      to = "https://arxiv.org/pdf/:id.pdf"
      status = 200
      force = true

    [[redirects]]
      from = "/papers/:id"
      to = "https://arxiv.org/pdf/:id.pdf"
      status = 200
      force = true
- URL Handling: If a url is provided, it’s passed directly to PDF.js, subject to CORS rules mentioned above.
- Client-Side Rendering (client:only="preact"): This Astro directive prevents server-side rendering. The pdfjs-dist library is dynamically imported within a useEffect hook in the Preact component, ensuring it only runs in the browser.
- Rendering: The component uses pdfjs-dist to load the PDF and render each page onto a <canvas> element. It includes loading/error states and adapts its appearance based on the site’s theme using CSS filters for dark mode.
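The steps above can be sketched roughly as follows. This is not the actual PDFViewer.tsx: the function names resolvePdfSource and renderAllPages, the default scale, and the choice to prefer id over url are all assumptions. The interfaces mirror only the small part of the pdfjs-dist API the sketch touches (numPages, getPage, getViewport, render), so the code has no dependencies of its own.

```typescript
// Structural types covering only the parts of the pdfjs-dist API used here;
// the real library exposes much more.
interface ViewportLike { width: number; height: number }
interface PageLike {
  getViewport(params: { scale: number }): ViewportLike;
  render(params: { canvasContext: unknown; viewport: ViewportLike }): { promise: Promise<void> };
}
interface DocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>;
}
interface CanvasLike {
  width: number;
  height: number;
  getContext(kind: "2d"): unknown;
}

// Props as described in this post: either an arXiv `id` or a direct `url`.
interface PdfSourceProps { id?: string; url?: string }

// Maps props to the URL handed to PDF.js. An `id` becomes the local
// /papers/<id> path that the redirect rule forwards to arXiv.
// Preferring `id` when both are given is an assumption of this sketch.
function resolvePdfSource({ id, url }: PdfSourceProps): string {
  if (id) return `/papers/${id}`;
  if (url) return url;
  throw new Error("PDFViewer needs either an `id` or a `url`");
}

// Renders every page of `pdf` onto its own canvas, in page order.
// `createCanvas` is injected so the caller decides how canvases are made,
// e.g. () => document.createElement("canvas") in the browser.
async function renderAllPages(
  pdf: DocumentLike,
  createCanvas: () => CanvasLike,
  scale = 1.5, // assumed default; tune for the layout
): Promise<CanvasLike[]> {
  const canvases: CanvasLike[] = [];
  // PDF.js page numbers start at 1, not 0.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const viewport = page.getViewport({ scale });
    const canvas = createCanvas();
    // Size the canvas to the scaled page before drawing into it.
    canvas.width = Math.floor(viewport.width);
    canvas.height = Math.floor(viewport.height);
    await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;
    canvases.push(canvas);
  }
  return canvases;
}
```

Inside the component's useEffect, the real document would come from something like `const pdfjs = await import("pdfjs-dist")` followed by `await pdfjs.getDocument(resolvePdfSource(props)).promise`, with the worker script configured through `pdfjs.GlobalWorkerOptions.workerSrc`. Pages are rendered sequentially here for simplicity; a long paper might benefit from rendering pages lazily as they scroll into view.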
Benefits
- Simplicity: Embedding requires just a single component tag.
- Flexibility: Works for arXiv papers and other hosted PDFs (with proper CORS).
- Integration: Provides a minimal viewer that matches the site’s design better than an iframe.
- Performance: Defers heavy PDF rendering to the client.
- Future-Proof: Enables richer interactions for future paper discussion posts.
Conclusion
This custom Preact component gives me a clean and efficient way to embed PDF documents directly into my Astro site. It avoids the limitations of iframes, offers better design integration, and paves the way for future posts where I can reference and discuss academic papers directly within the content. Remember to handle CORS settings appropriately when linking to externally hosted PDFs via direct URLs.