Converting CAD Polylines to GeoJSON: Production-Ready Python Pipeline

Converting CAD polylines to GeoJSON requires extracting vertex arrays from DXF/DWG entities, normalizing coordinate precision, and serializing them into RFC 7946-compliant FeatureCollection objects. The most reliable Python stack combines ezdxf for low-level DXF parsing, shapely for topology validation, and geopandas for CRS transformation and export. For proprietary DWG inputs, pre-convert to DXF with the ODA File Converter before execution; GDAL's DXF driver reads DXF only, and GDAL's DWG support depends on an optional build against the ODA libraries. This pipeline avoids vendor lock-in while preserving spatial integrity across AEC interoperability workflows.

Architecture & Format Constraints

CAD formats store geometry as typed entities with their own attributes rather than as plain coordinate arrays. LWPOLYLINE (lightweight) stores 2D vertices, each with optional start width, end width, and bulge values, plus a single elevation for the whole entity, while legacy POLYLINE entities hold separate VERTEX objects with full 3D locations. Both expose closure through bit 0 of the flags attribute (flags & 1), which must be checked explicitly to distinguish open linework from area boundaries.
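The closure test is a plain bit mask on the entity's flags value. The sketch below shows the check in isolation; the constant and function names are illustrative and not part of ezdxf, and the sample flag values are made up for demonstration.

```python
# Bit 0 (value 1) of the polyline flags attribute marks the entity as closed.
# POLYLINE_CLOSED and is_closed_polyline are hypothetical names for illustration.
POLYLINE_CLOSED = 1


def is_closed_polyline(flags: int) -> bool:
    """Return True when the closed bit is set, ignoring any other flag bits."""
    return bool(flags & POLYLINE_CLOSED)


# Sample values: 0 = open, 1 = closed, 128 = another bit only, 129 = closed plus another bit.
results = {flags: is_closed_polyline(flags) for flags in (0, 1, 128, 129)}
```

Masking instead of comparing with `flags == 1` matters because other bits can be set on the same entity at the same time.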

When scaling this workflow into broader Geometry Mesh Conversion pipelines, maintain strict separation between parsing, validation, and serialization stages. This modular approach prevents memory bloat and enables targeted error recovery when processing multi-megabyte survey exports or BIM-derived site plans.
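One way to keep the three stages separate is to write each as a generator, so records flow through one at a time instead of piling up in intermediate lists. This is a minimal sketch under assumed data shapes: the dictionary layout, the stage function names, and the sample handles are all hypothetical and stand in for real ezdxf entities.

```python
from typing import Iterable, Iterator, List, Tuple


def parse(raw_entities: Iterable[dict]) -> Iterator[dict]:
    """Parsing stage: yield one record per entity without building a list."""
    for raw in raw_entities:
        yield {"handle": raw["handle"], "points": list(raw["points"])}


def validate(records: Iterable[dict], errors: List[Tuple[str, str]]) -> Iterator[dict]:
    """Validation stage: drop records with fewer than two vertices and record why."""
    for rec in records:
        if len(rec["points"]) < 2:
            errors.append((rec["handle"], "fewer than two vertices"))
            continue
        yield rec


def serialize(records: Iterable[dict]) -> Iterator[dict]:
    """Serialization stage: emit GeoJSON-style LineString features."""
    for rec in records:
        yield {
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                "coordinates": [list(p) for p in rec["points"]],
            },
            "properties": {"handle": rec["handle"]},
        }


# Hypothetical input: two entities, one of them malformed.
raw = [
    {"handle": "1A", "points": [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]},
    {"handle": "1B", "points": [(5.0, 5.0, 0.0)]},
]
errors: List[Tuple[str, str]] = []
features = list(serialize(validate(parse(raw), errors)))
```

Because each stage only sees an iterable, a failing stage can be replaced or retried on its own, and the error list gives a per-handle record of what was dropped.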

Core Pipeline Implementation

The foundation of any reliable Python Parsing & Geometry Extraction routine must handle mixed entity types, repair self-intersections, and enforce WGS84 ordering before export. The script below implements a production-grade extraction loop:

import ezdxf
import geopandas as gpd
from shapely.geometry import LineString, Polygon
from shapely.validation import make_valid
import sys

def cad_polylines_to_geojson(
    dxf_path: str,
    output_path: str,
    source_crs: str,
    target_crs: str = "EPSG:4326"
) -> None:
    """Extract CAD polylines, transform CRS, and export as valid GeoJSON."""
    try:
        doc = ezdxf.readfile(dxf_path)
    except Exception as e:
        raise RuntimeError(f"DXF parse failed: {e}") from e

    msp = doc.modelspace()
    geometries = []
    properties = []

    for entity in msp:
        if entity.dxftype() not in ("LWPOLYLINE", "POLYLINE"):
            continue

        try:
            # Extract vertices safely
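            if entity.dxftype() == "LWPOLYLINE":
                # LWPOLYLINE vertices are 2D; the default get_points() tuple is
                # (x, y, start_width, end_width, bulge), so request x/y only and
                # take z from the entity-level elevation.
                z = entity.dxf.get("elevation", 0.0)
                points = [(x, y, z) for x, y in entity.get_points("xy")]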
            else:
                points = [(v.dxf.location.x, v.dxf.location.y, v.dxf.location.z) 
                          for v in entity.vertices]

            if len(points) < 2:
                continue

            # Determine closure state
            is_closed = bool(entity.dxf.get("flags", 0) & 1)

            if is_closed and len(points) >= 3:
                geom = Polygon(points)
            else:
                geom = LineString(points)

            # Repair topology
            if not geom.is_valid:
                geom = make_valid(geom)

            geometries.append(geom)
            properties.append({
                "layer": entity.dxf.layer,
                "cad_type": entity.dxftype(),
                "closed": is_closed,
                "handle": entity.dxf.handle,
                "vertex_count": len(points)
            })
        except Exception as e:
            print(f"Warning: Skipping {entity.dxf.handle}: {e}", file=sys.stderr)

    if not geometries:
        raise ValueError("No valid polylines extracted from DXF.")

    # Build, transform, and export
    gdf = gpd.GeoDataFrame(properties, geometry=geometries, crs=source_crs)
    if source_crs != target_crs:
        gdf = gdf.to_crs(target_crs)

    gdf.to_file(output_path, driver="GeoJSON")
    print(f"Exported {len(gdf)} features to {output_path}")

Execution Breakdown

1. Entity Parsing & Vertex Extraction

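The parser filters the modelspace for LWPOLYLINE and POLYLINE types. Lightweight entities return compact per-vertex tuples, while legacy polylines require iterating through .vertices. Both branches normalize to (x, y, z) triples to preserve survey-grade elevation data. Note that an LWPOLYLINE carries one elevation for the whole entity (entity.dxf.elevation) rather than per-vertex z values, and that the third item of a default get_points() tuple is the start width, not a height. Bulge values, which encode arc segments, are not expanded here, so curved segments are exported as straight chords. Entities with fewer than two vertices are discarded as malformed.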
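The normalization step can be illustrated without a DXF file. The sketch below assumes the five-item (x, y, start_width, end_width, bulge) vertex layout that ezdxf returns by default for LWPOLYLINE points; the helper names and sample values are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

XYZ = Tuple[float, float, float]


def lwpolyline_to_xyz(points: Iterable[Sequence[float]], elevation: float = 0.0) -> List[XYZ]:
    """Keep x and y from each vertex tuple and attach the shared elevation as z."""
    return [(float(p[0]), float(p[1]), float(elevation)) for p in points]


def polyline_to_xyz(locations: Iterable[Sequence[float]]) -> List[XYZ]:
    """Pad 2D or 3D vertex locations to full (x, y, z) triples."""
    out: List[XYZ] = []
    for loc in locations:
        z = loc[2] if len(loc) > 2 else 0.0
        out.append((float(loc[0]), float(loc[1]), float(z)))
    return out


# Hypothetical LWPOLYLINE vertices: the 0.5 values are widths, not heights.
sample = [(10.0, 20.0, 0.5, 0.5, 0.0), (15.0, 20.0, 0.5, 0.5, 0.0)]
xyz = lwpolyline_to_xyz(sample, elevation=102.3)
```

Reading index 2 of the raw tuple as z would have produced a height of 0.5 here, which is why the width and elevation values are kept apart.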

2. Topology Validation

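CAD drawings frequently contain self-intersecting boundaries, duplicate nodes, or degenerate loops flagged as closed. shapely.validation.make_valid() repairs invalid geometry, for example by splitting a self-intersecting ring into separate polygons, and the result may be a MultiPolygon or GeometryCollection rather than the input type. It does not snap or round coordinates, so floating-point drift needs separate handling. Run validation before CRS transformation and export so that problems surface while coordinates are still in the drawing's native units, where they are easier to trace back to the source CAD file.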
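Two of the defects named above, duplicate nodes and zero-area loops, can be screened with plain arithmetic before any geometry library is involved. The sketch below is a pre-check only and does not replace make_valid(); the function names, tolerance, and sample rings are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

XY = Tuple[float, float]


def drop_consecutive_duplicates(ring: Sequence[XY], tol: float = 1e-9) -> List[XY]:
    """Remove vertices that repeat the previous vertex within a tolerance."""
    cleaned: List[XY] = []
    for x, y in ring:
        if cleaned and abs(x - cleaned[-1][0]) <= tol and abs(y - cleaned[-1][1]) <= tol:
            continue
        cleaned.append((x, y))
    return cleaned


def signed_area(ring: Sequence[XY]) -> float:
    """Shoelace formula: positive for counterclockwise rings, zero for degenerate ones."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring implicitly
        total += x1 * y2 - x2 * y1
    return total / 2.0


# A 4 x 3 rectangle with one repeated vertex, and three collinear points.
rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
cleaned = drop_consecutive_duplicates(rectangle)
sliver = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
```

A ring whose signed area is zero cannot form a polygon and is better exported as a line or skipped. Self-intersections are not detected by this check and still need a real topology library.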

3. CRS Transformation

RFC 7946 requires WGS84 longitude/latitude coordinates (commonly referenced as EPSG:4326) in [longitude, latitude, altitude] order, with altitude optional. geopandas.GeoDataFrame.to_crs() delegates the datum and projection math to pyproj and keeps x/y (longitude/latitude) ordering. Always declare the source CRS explicitly; labeling state plane or UTM coordinates as EPSG:4326 skips the transformation and leaves easting/northing values in the hundreds of thousands or millions, far outside the valid longitude and latitude ranges.
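A cheap guard against a wrongly declared source CRS is a range check on the output coordinates. The following sketch assumes positions are (x, y) or (x, y, z) sequences; the function name and the sample coordinates are hypothetical.

```python
from typing import Iterable, Sequence


def looks_like_wgs84(coords: Iterable[Sequence[float]]) -> bool:
    """Return True if every position has longitude in [-180, 180] and latitude in [-90, 90]."""
    for pos in coords:
        lon, lat = pos[0], pos[1]
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            return False
    return True


geographic = [(13.4050, 52.5200, 34.0), (13.4060, 52.5210, 35.0)]
projected = [(392000.0, 5820000.0, 34.0)]  # UTM-style easting/northing in metres
```

The check only catches values that fall outside the valid range. A drawing in local coordinates near the origin would pass it, so it complements an explicit source CRS rather than replacing one.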

4. RFC 7946 Serialization

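The to_file(driver="GeoJSON") method serializes the GeoDataFrame into a FeatureCollection with the required type, features, and properties hierarchy. By default geopandas omits an unnamed integer index from the output. Coordinate precision and strict RFC 7946 behavior are controlled by the GDAL GeoJSON driver's layer creation options (COORDINATE_PRECISION and RFC7946); the 7-decimal-place default applies only when RFC7946=YES is set, so pass those options explicitly if you depend on them. For compliance verification, validate outputs against the RFC 7946 GeoJSON Specification.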
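The type, features, and properties hierarchy can be spot-checked with the standard library alone. This is a minimal structural sketch, not a full RFC 7946 validator; the function name and the sample document are made up for illustration.

```python
import json


def check_feature_collection(doc: dict) -> list:
    """Return a list of structural problems; an empty list means the basic shape is fine."""
    problems = []
    if doc.get("type") != "FeatureCollection":
        problems.append("top-level type must be 'FeatureCollection'")
    features = doc.get("features")
    if not isinstance(features, list):
        problems.append("'features' must be a list")
        return problems
    for i, feat in enumerate(features):
        if feat.get("type") != "Feature":
            problems.append(f"feature {i}: type must be 'Feature'")
        # Both members must be present, although either may be null.
        if "geometry" not in feat:
            problems.append(f"feature {i}: missing 'geometry'")
        if "properties" not in feat:
            problems.append(f"feature {i}: missing 'properties'")
    return problems


sample = json.loads("""
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "geometry": {"type": "LineString", "coordinates": [[13.405, 52.52], [13.406, 52.521]]},
    "properties": {"layer": "ROADS", "handle": "2F"}},
   {"type": "Feature", "geometry": null}
 ]}
""")
problems = check_feature_collection(sample)
```

Here the second feature is reported because it has no properties member. Ring orientation, coordinate ranges, and geometry types are outside the scope of this check.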

Handling Edge Cases & Scale

| Challenge | Mitigation Strategy |
| --- | --- |
| Floating-point drift | Round coordinates to 6–8 decimal places after transformation. In degrees, 7 places is roughly 1 cm at the equator and 8 places roughly 1 mm; CAD tolerances are rarely tighter than 0.001 m. |
| Massive DXF files (>500MB) | Use the ezdxf.addons.iterdxf add-on (iterdxf.opendxf() or iterdxf.modelspace()) to stream entities, or chunk by layer. Avoid loading the entire modelspace into memory. |
| Mixed 2D/3D geometries | GeoJSON supports 3D coordinates natively. Drop Z only if downstream GIS tools explicitly require 2D. |
| Attribute bloat | Filter the properties dict before GeoDataFrame creation. Export only layer, handle, and cad_type to keep payloads lean. |
| Invalid CRS strings | Validate with pyproj.CRS.from_user_input() before transformation. Catch CRSError early. |
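The rounding, Z-dropping, and attribute-filtering mitigations are all small transformations on plain GeoJSON structures. The sketch below assumes coordinates are nested lists as they appear in a GeoJSON geometry; the helper names and sample values are hypothetical.

```python
from typing import Any


def round_coords(coords: Any, ndigits: int = 7, drop_z: bool = False) -> Any:
    """Recursively round a GeoJSON coordinates array; optionally keep only x and y."""
    if coords and isinstance(coords[0], (int, float)):
        # Reached a single position such as [lon, lat] or [lon, lat, alt].
        position = coords[:2] if drop_z else coords
        return [round(float(v), ndigits) for v in position]
    return [round_coords(c, ndigits, drop_z) for c in coords]


def keep_properties(props: dict, allowed=("layer", "handle", "cad_type")) -> dict:
    """Keep only whitelisted attribute keys."""
    return {k: v for k, v in props.items() if k in allowed}


line = [[13.40500000123, 52.52000000987, 34.123456789], [13.406, 52.521, 35.0]]
rounded = round_coords(line, ndigits=7)
flat = round_coords(line, ndigits=7, drop_z=True)
lean = keep_properties({"layer": "ROADS", "closed": True, "handle": "2F"})
```

The same recursive function works for LineString, Polygon, and Multi* coordinate arrays because it descends until it reaches a list of numbers.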

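For infrastructure platform teams integrating this into CI/CD pipelines, wrap the extraction function in a retry loop with exponential backoff to absorb transient failures such as network-share or object-store read errors. DXF files generated by third-party converters occasionally contain truncated entity tables or malformed headers; these fail the same way on every attempt, so route them to a repair step such as ezdxf.recover.readfile() or to a quarantine queue instead of retrying. Logging skipped handles (entity.dxf.handle) enables rapid reconciliation with source CAD files.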
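A retry loop with exponential backoff can be written in a few lines of standard-library Python. This is a sketch under assumptions: the function name, the default delays, and the choice to retry only on OSError are illustrative, and the sleep function is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exception types with doubling delays."""
    attempt = 0
    while True:
        try:
            return func()
        except retry_on:
            if attempt >= retries:
                raise  # out of attempts: surface the last error unchanged
            sleep(base_delay * (2 ** attempt))  # 0.5 s, 1 s, 2 s, ...
            attempt += 1
```

A call might look like `retry_with_backoff(lambda: cad_polylines_to_geojson(path, out, crs))`. Note that the extraction function shown earlier re-raises read failures as RuntimeError, so `retry_on` would have to include that type, at the cost of also retrying files that are simply malformed.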

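When deploying at scale, consider parallelizing by layer or spatial partition. The separate dask-geopandas package extends geopandas to partitioned, larger-than-memory workloads. pyproj thread safety has improved across 3.x releases (this guide assumes ≥3.4), so confirm the guarantees for your installed version before sharing CRS or Transformer objects across threads. Always benchmark transformation overhead against I/O latency; as a rough estimate, CRS conversion can consume 60–80% of total runtime on large datasets, but the share varies with geometry complexity and storage speed.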
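Partitioning by layer can be prototyped with the standard library before reaching for dask-geopandas. In the sketch below the record layout, the function names, and the per-layer work (a simple vertex count) are placeholders; real code would run the transformation step inside process_layer.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def group_by_layer(records: List[dict]) -> Dict[str, List[dict]]:
    """Bucket records by their CAD layer name."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        groups[rec["layer"]].append(rec)
    return dict(groups)


def process_layer(item: Tuple[str, List[dict]]) -> Tuple[str, int]:
    """Placeholder per-layer work: count vertices. Swap in real transformation logic."""
    layer, recs = item
    return layer, sum(len(r["points"]) for r in recs)


def process_in_parallel(records: List[dict], max_workers: int = 4) -> Dict[str, int]:
    """Run process_layer once per layer on a thread pool and collect the results."""
    groups = group_by_layer(records)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(process_layer, groups.items()))


# Hypothetical records standing in for extracted polylines.
records = [
    {"layer": "ROADS", "points": [(0, 0), (1, 1), (2, 2)]},
    {"layer": "PARCELS", "points": [(0, 0), (1, 0), (1, 1)]},
    {"layer": "ROADS", "points": [(5, 5), (6, 6)]},
]
counts = process_in_parallel(records)
```

Threads suit I/O-bound stages; for CPU-bound work in pure Python, a ProcessPoolExecutor with the same interface is usually the better fit, provided the records can be pickled.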

This pipeline delivers deterministic, standards-compliant outputs suitable for web mapping, spatial analysis, and automated compliance checking. By decoupling parsing from transformation and enforcing strict topology validation, teams eliminate format drift and maintain audit-ready geospatial assets across the AEC lifecycle.